Dataset

Dataset init file.

BaseDataset(version=None, hash=None, name=None, description=None, schema=None, additional_metadata=None)

Bases: ABC, Iterable

Base class for all datasets.

Attributes:

Name Type Description
dataset list[MetricInput]

The dataset to evaluate.

version str | None

The version of the dataset.

hash str | None

The hash of the dataset.

name str | None

The name of the dataset.

description str | None

The description of the dataset.

schema type[BaseModel] | None

The schema of the dataset.

additional_metadata dict[str, Any] | None

Additional metadata of the dataset.

Initialize the dataset.

Parameters:

Name Type Description Default
version str | None

The version of the dataset. Defaults to None.

None
hash str | None

The hash of the dataset. Defaults to None.

None
name str | None

The name of the dataset. Defaults to None.

None
description str | None

The description of the dataset. Defaults to None.

None
schema type[BaseModel] | None

The schema of the dataset. Defaults to None.

None
additional_metadata dict[str, Any] | None

Additional metadata of the dataset. Defaults to None.

None

__getitem__(index)

Get the item at the given index.

Parameters:

Name Type Description Default
index int | list[int]

The index of the item to get, or a list of indices to get several items at once.

required

Returns:

Type Description
MetricInput | list[MetricInput]

MetricInput | list[MetricInput]: The item at the given index or a list of items if the index is a list.
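The int-or-list indexing described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `IndexableRows` is a hypothetical class, and `MetricInput` is stood in by a plain `dict`.

```python
from __future__ import annotations

from typing import Any

MetricInput = dict[str, Any]  # stand-in for the library's MetricInput type


class IndexableRows:
    """Hypothetical container mirroring the documented __getitem__ behaviour."""

    def __init__(self, rows: list[MetricInput]) -> None:
        self.dataset = rows

    def __getitem__(self, index: int | list[int]) -> MetricInput | list[MetricInput]:
        # A list of indices yields a list of rows; a single int yields one row.
        if isinstance(index, list):
            return [self.dataset[i] for i in index]
        return self.dataset[index]


rows = IndexableRows([{"query": "a"}, {"query": "b"}, {"query": "c"}])
single = rows[1]          # one row
several = rows[[0, 2]]    # a list of rows
```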

__iter__()

Iterate over the dataset.

Returns:

Type Description
Iterator[MetricInput]

Iterator[MetricInput]: An iterator over the dataset.

__len__()

Get the length of the dataset.

Returns:

Name Type Description
int int

The length of the dataset.

filter(filter_fn)

Filter the dataset.

Parameters:

Name Type Description Default
filter_fn Callable[[MetricInput], bool]

The filter function.

required

load() abstractmethod

Load the dataset.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The loaded dataset.

Raises:

Type Description
NotImplementedError

If the load method is not implemented.

map(map_fn)

Map the dataset.

Parameters:

Name Type Description Default
map_fn Callable[[MetricInput], MetricInput]

The map function.

required

sample(n=3)

Sample n items from the dataset.

Parameters:

Name Type Description Default
n int

The number of items to sample.

3

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The sampled items.

shuffle()

Shuffle the dataset.

validate() abstractmethod

Validate the dataset.

Raises:

Type Description
NotImplementedError

If the validate method is not implemented.
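Because `load()` and `validate()` are abstract, a concrete dataset has to implement both. The sketch below shows that pattern with a hypothetical stand-in base class (`SketchBaseDataset`) rather than the real `BaseDataset`, whose constructor behaviour is not shown on this page; `InMemoryDataset` and the `dict`-based `MetricInput` alias are likewise assumptions made for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Iterable, Iterator
from typing import Any

MetricInput = dict[str, Any]  # stand-in; the real MetricInput type is not shown here


class SketchBaseDataset(ABC, Iterable):
    """Hypothetical stand-in mirroring the documented abstract interface."""

    def __init__(self, name: str | None = None) -> None:
        self.name = name
        self.dataset: list[MetricInput] = []

    @abstractmethod
    def load(self) -> list[MetricInput]:
        """Return the rows of the dataset."""

    @abstractmethod
    def validate(self) -> None:
        """Raise if the rows are not usable."""

    def __len__(self) -> int:
        return len(self.dataset)

    def __iter__(self) -> Iterator[MetricInput]:
        return iter(self.dataset)


class InMemoryDataset(SketchBaseDataset):
    """Concrete subclass: rows are supplied directly by the caller."""

    def __init__(self, rows: list[MetricInput], name: str | None = None) -> None:
        super().__init__(name=name)
        self.dataset = rows
        self.validate()

    def load(self) -> list[MetricInput]:
        return self.dataset

    def validate(self) -> None:
        if not isinstance(self.dataset, list) or not all(
            isinstance(row, dict) for row in self.dataset
        ):
            raise ValueError("dataset must be a list of dict rows")


in_memory = InMemoryDataset([{"query": "q1"}, {"query": "q2"}], name="demo")

# Instantiating the abstract base directly fails because load() and
# validate() are still abstract.
try:
    SketchBaseDataset()
    abstract_blocked = False
except TypeError:
    abstract_blocked = True
```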

DatasetRegistry()

Registry for dataset configurations.

Initialize the dataset registry.

The constructor takes no arguments. It initializes the following internal attributes:

Name Type Description
_configs Dict[DatasetType, DatasetConfig]

The dictionary of dataset configurations.

_prefix_map Dict[str, DatasetType]

The dictionary of dataset prefixes.

_extension_map Dict[str, DatasetType]

The dictionary of dataset extensions.

detect_type(dataset)

Detect the dataset type from the dataset string.

Parameters:

Name Type Description Default
dataset str

The dataset string to detect.

required

Returns:

Type Description
Optional[DatasetType]

Optional[DatasetType]: The detected dataset type, or None if not supported.

get_config(dataset_type)

Get configuration for a dataset type.

Parameters:

Name Type Description Default
dataset_type DatasetType

The dataset type to get the configuration for.

required

Returns:

Type Description
Optional[DatasetConfig]

Optional[DatasetConfig]: The configuration for the dataset type, or None if not found.

register_config(config)

Register a dataset configuration.

Parameters:

Name Type Description Default
config DatasetConfig

The dataset configuration to register.

required
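One way to picture how the three registry methods fit together is the sketch below. It is an assumption-laden illustration, not the library's code: `SketchDatasetType`, `SketchDatasetConfig` (including its `prefixes` and `extensions` fields), the `hf/` prefix, and the prefix-before-extension lookup order are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from pathlib import PurePosixPath


class SketchDatasetType(Enum):
    """Hypothetical dataset types; the real DatasetType members are not shown here."""

    HUGGINGFACE = "huggingface"
    CSV = "csv"


@dataclass(frozen=True)
class SketchDatasetConfig:
    """Hypothetical config: which prefixes and extensions map to a type."""

    dataset_type: SketchDatasetType
    prefixes: tuple[str, ...] = ()
    extensions: tuple[str, ...] = ()


class SketchRegistry:
    def __init__(self) -> None:
        self._configs: dict[SketchDatasetType, SketchDatasetConfig] = {}
        self._prefix_map: dict[str, SketchDatasetType] = {}
        self._extension_map: dict[str, SketchDatasetType] = {}

    def register_config(self, config: SketchDatasetConfig) -> None:
        # Index the config by type, and index its prefixes and extensions.
        self._configs[config.dataset_type] = config
        for prefix in config.prefixes:
            self._prefix_map[prefix] = config.dataset_type
        for extension in config.extensions:
            self._extension_map[extension.lower()] = config.dataset_type

    def get_config(self, dataset_type: SketchDatasetType) -> SketchDatasetConfig | None:
        return self._configs.get(dataset_type)

    def detect_type(self, dataset: str) -> SketchDatasetType | None:
        # Assumed order: prefixes first, then the file extension.
        for prefix, dataset_type in self._prefix_map.items():
            if dataset.startswith(prefix):
                return dataset_type
        suffix = PurePosixPath(dataset).suffix.lower()
        return self._extension_map.get(suffix)


registry = SketchRegistry()
registry.register_config(
    SketchDatasetConfig(SketchDatasetType.HUGGINGFACE, prefixes=("hf/",))
)
registry.register_config(
    SketchDatasetConfig(SketchDatasetType.CSV, extensions=(".csv",))
)
```

With these two configurations registered, a string starting with `hf/` resolves by prefix, a `.csv` path resolves by extension, and anything else yields `None`.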

DictDataset(dataset, name=None)

Bases: BaseDataset

Dict-Based Dataset.

Stores the dataset as a list of dictionaries.

Attributes:

Name Type Description
dataset list[dict]

The dataset to evaluate.

Initialize the DictDataset class.

Parameters:

Name Type Description Default
dataset Dataset

The dataset to use for the evaluation.

required
name str | None

The name of the dataset. Defaults to None.

None

from_csv(path, name=None, **kwargs) classmethod

Load a dataset from a CSV file.

Parameters:

Name Type Description Default
path str

The path to the CSV file.

required
name str | None

The name of the dataset.

None
**kwargs Any

Additional arguments to pass to pandas read_csv.

{}

Returns:

Name Type Description
DictDataset DictDataset

The loaded dataset.

from_jsonl(path, **kwargs) classmethod

Load a dataset from a JSONL file.

Parameters:

Name Type Description Default
path str

The path to the JSONL file.

required
**kwargs Any

Additional arguments to pass to the constructor.

{}

Returns:

Name Type Description
DictDataset DictDataset

The loaded dataset.
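For readers unfamiliar with JSONL, the sketch below shows how such a file maps to a list of dictionaries, one per line. It uses only the standard library; `read_jsonl_rows` is a hypothetical helper and says nothing about how `from_jsonl` is actually implemented, and the skipping of blank lines is an assumption.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl_rows(path: str) -> list[dict]:
    """Parse a JSONL file into a list of dicts, one JSON object per line."""
    rows: list[dict] = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            stripped = line.strip()
            if not stripped:
                continue  # assumption: blank lines are skipped
            record = json.loads(stripped)
            if not isinstance(record, dict):
                raise ValueError(f"Line {line_number} is not a JSON object.")
            rows.append(record)
    return rows


# Write a small sample file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.jsonl"
    sample_path.write_text(
        '{"query": "q1", "expected_response": "a1"}\n'
        "\n"
        '{"query": "q2", "expected_response": "a2"}\n',
        encoding="utf-8",
    )
    jsonl_rows = read_jsonl_rows(str(sample_path))
```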

load()

Load the dataset.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The loaded dataset.

validate()

Validate the dataset.

Raises:

Type Description
ValueError

If the dataset is not a list of MetricInput.

HuggingFaceDataset(dataset)

Bases: BaseDataset

Hugging Face dataset class for the evaluator.

Attributes:

Name Type Description
dataset list[MetricInput]

The dataset to use for the evaluation.

Initialize the HuggingFaceDataset class.

Parameters:

Name Type Description Default
dataset Dataset

The dataset to use for the evaluation.

required

from_hub(path_or_name, split, **kwargs) staticmethod

Create a HuggingFaceDataset from a Hugging Face dataset.

Parameters:

Name Type Description Default
path_or_name str

The path or name of the dataset.

required
split str

The split of the dataset.

required
**kwargs Any

Additional arguments to pass to the load function.

{}

Returns:

Name Type Description
HuggingFaceDataset HuggingFaceDataset

The created dataset.

from_list(dataset) staticmethod

Create a HuggingFaceDataset from a list of MetricInput.

Parameters:

Name Type Description Default
dataset list[MetricInput]

The dataset to create.

required

Returns:

Name Type Description
HuggingFaceDataset HuggingFaceDataset

The created dataset.

load()

Load the dataset.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The loaded dataset.

validate()

Validate the dataset.

Raises:

Type Description
ValueError

If the dataset is not a list of MetricInput.

LangfuseDataset(dataset, langfuse_client, dataset_name=None, expected_output_key='expected_response', mapping=None)

Bases: BaseDataset

Langfuse dataset class for the evaluator.

Attributes:

Name Type Description
dataset list[MetricInput]

The dataset to use for the evaluation.

langfuse_client Langfuse

The Langfuse client instance.

dataset_name str

The name of the dataset in Langfuse.

expected_output_key str | None

The key for expected output. Defaults to "expected_response".

mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

Initialize the LangfuseDataset class.

Parameters:

Name Type Description Default
dataset List[MetricInput]

The dataset to use for the evaluation.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name Optional[str]

The name of the dataset in Langfuse.

None
expected_output_key str | None

The key for expected output. Defaults to "expected_response".

'expected_response'
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None

convert_to_standard_dataset(expected_output_key=None, mapping=None)

Convert the dataset to the standard format, a list of MetricInput items.

Parameters:

Name Type Description Default
expected_output_key str | None

The key for expected output. Defaults to None.

None
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None

Returns:

Type Description
List[MetricInput]

List[MetricInput]: The converted dataset.
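The page does not spell out how `expected_output_key` and `mapping` are applied, so the following is only one plausible reading, sketched with plain dictionaries. It assumes each source item carries an `input` dict and an `expected_output` value, and that `mapping` renames source keys to target keys; `to_standard_rows` is a hypothetical helper, not the library's method.

```python
from __future__ import annotations

from typing import Any


def to_standard_rows(
    items: list[dict[str, Any]],
    expected_output_key: str = "expected_response",
    mapping: dict[str, str] | None = None,
) -> list[dict[str, Any]]:
    """Flatten items into rows, renaming keys and relocating the expected output."""
    key_map = mapping or {}
    rows: list[dict[str, Any]] = []
    for item in items:
        row: dict[str, Any] = {}
        # Assumption: mapping is {source_key: target_key}; unmapped keys pass through.
        for key, value in item.get("input", {}).items():
            row[key_map.get(key, key)] = value
        # Assumption: the expected output is stored under expected_output_key.
        row[expected_output_key] = item.get("expected_output")
        rows.append(row)
    return rows


source_items = [{"input": {"question": "What is 2+2?"}, "expected_output": "4"}]
standard_rows = to_standard_rows(source_items, mapping={"question": "query"})
```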

from_csv(path, langfuse_client, dataset_name=None, dataset_description='', metadata=None, is_append=False, **kwargs) staticmethod

Create a LangfuseDataset from a CSV file.

Parameters:

Name Type Description Default
path str

The path to the CSV file.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str | None

The name to register this dataset under in Langfuse. If None, defaults to the CSV filename without extension. Defaults to None.

None
dataset_description str

The description of the dataset. Defaults to an empty string.

''
metadata dict | None

Optional metadata for the dataset. Defaults to None.

None
is_append bool

If True, append items to the existing dataset. If False, create the dataset only if it does not already exist. Defaults to False.

False
**kwargs Any

Additional arguments to pass to the constructor.

{}

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The created dataset.

from_dict(dataset, langfuse_client, dataset_name, dataset_description='', mapping=None, metadata=None, is_append=False) staticmethod

Create a LangfuseDataset from a list of MetricInput.

Parameters:

Name Type Description Default
dataset List[MetricInput]

The dataset to create.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str

The name of the dataset in Langfuse.

required
dataset_description str

The description of the dataset. Defaults to an empty string.

''
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None
metadata dict | None

Optional metadata for the dataset. Defaults to None.

None
is_append bool

If True, append items to the existing dataset. If False, create the dataset only if it does not already exist. Defaults to False.

False

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The created dataset.

from_gsheets(sheet_id, worksheet_name, client_email, private_key, langfuse_client, dataset_name=None, dataset_description='', mapping=None, metadata=None, is_append=False) async staticmethod

Create a LangfuseDataset from Google Sheets.

Parameters:

Name Type Description Default
sheet_id str

The ID of the Google Sheet.

required
worksheet_name str

The name of the worksheet within the Google Sheet.

required
client_email str

The client email for Google Sheets API.

required
private_key str

Base64-encoded private key for Google Sheets API.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str | None

The name of the dataset in Langfuse.

None
dataset_description str

The description of the dataset. Defaults to an empty string.

''
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None
metadata dict | None

Optional metadata for the dataset. Defaults to None.

None
is_append bool

If True, append items to the existing dataset. If False, create the dataset only if it does not already exist. Defaults to False.

False

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The created dataset.
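Since `private_key` is expected to be Base64-encoded, a caller typically encodes the key text before passing it in. The sketch below shows the round trip with the standard library only; the key material is an obvious placeholder, and `encode_private_key` and `decode_private_key` are hypothetical helper names.

```python
import base64


def encode_private_key(pem_text: str) -> str:
    """Base64-encode key text so it can be passed as a single-line string."""
    return base64.b64encode(pem_text.encode("utf-8")).decode("ascii")


def decode_private_key(encoded_key: str) -> str:
    """Reverse of encode_private_key; rejects non-Base64 input."""
    return base64.b64decode(encoded_key, validate=True).decode("utf-8")


# Placeholder text only; never hard-code a real key.
placeholder_pem = (
    "-----BEGIN PRIVATE KEY-----\nnot-a-real-key\n-----END PRIVATE KEY-----\n"
)
encoded_key = encode_private_key(placeholder_pem)
decoded_key = decode_private_key(encoded_key)
```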

from_jsonl(path, langfuse_client, dataset_name=None, dataset_description='', metadata=None, is_append=False, **kwargs) staticmethod

Create a LangfuseDataset from a JSONL file.

Parameters:

Name Type Description Default
path str

The path to the JSONL file.

required
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str | None

The name of the dataset in Langfuse.

None
dataset_description str

The description of the dataset. Defaults to an empty string.

''
metadata dict | None

Optional metadata for the dataset. Defaults to None.

None
is_append bool

If True, append items to the existing dataset. If False, create the dataset only if it does not already exist. Defaults to False.

False
**kwargs Any

Additional arguments to pass to the constructor.

{}

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The created dataset.

from_langfuse(langfuse_client, dataset_name, mapping=None) staticmethod

Load a dataset from Langfuse.

Parameters:

Name Type Description Default
langfuse_client Langfuse

The Langfuse client instance.

required
dataset_name str

The name of the dataset in Langfuse.

required
mapping dict[str, Any] | None

Optional mapping for field keys. Defaults to None.

None

Returns:

Name Type Description
LangfuseDataset LangfuseDataset

The loaded dataset.

Raises:

Type Description
ValueError

If the dataset is not found or has no data.

load()

Load the dataset.

Returns:

Type Description
List[MetricInput]

List[MetricInput]: The loaded dataset with proper Langfuse structure.

validate()

Validate the dataset.

Raises:

Type Description
ValueError

If the dataset is not a list of MetricInput or if required fields are missing.

SpreadsheetDataset(dataset)

Bases: BaseDataset

Spreadsheet dataset class for the evaluator.

Attributes:

Name Type Description
dataset list[MetricInput]

The dataset to use for the evaluation.

Initialize the SpreadsheetDataset class.

Parameters:

Name Type Description Default
dataset Dataset

The dataset to use for the evaluation.

required

from_gsheets(sheet_id, worksheet_name, client_email, private_key) async staticmethod

Load the dataset from Google Sheets.

Parameters:

Name Type Description Default
sheet_id str

The ID of the Google Sheet.

required
worksheet_name str

The name of the worksheet within the Google Sheet.

required
client_email str

The client email for Google Sheets API.

required
private_key str

Base64-encoded private key for Google Sheets API.

required

Returns:

Name Type Description
SpreadsheetDataset SpreadsheetDataset

The loaded dataset.
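`from_gsheets` is an async static method, so it must be awaited, or driven with `asyncio.run` from synchronous code. The sketch below shows that calling pattern with a hypothetical stand-in class (`SketchSheetDataset.from_rows`), where `asyncio.sleep(0)` takes the place of the network call to Google Sheets.

```python
from __future__ import annotations

import asyncio
from typing import Any


class SketchSheetDataset:
    """Hypothetical stand-in with an async static factory method."""

    def __init__(self, dataset: list[dict[str, Any]]) -> None:
        self.dataset = dataset

    @staticmethod
    async def from_rows(rows: list[dict[str, Any]]) -> "SketchSheetDataset":
        await asyncio.sleep(0)  # stands in for the remote fetch
        return SketchSheetDataset(list(rows))


async def main() -> SketchSheetDataset:
    # Inside async code, await the factory directly.
    return await SketchSheetDataset.from_rows([{"query": "q1"}])


# From synchronous code, run the coroutine to completion.
sheet_dataset = asyncio.run(main())
```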

load()

Load the dataset.

Returns:

Type Description
list[MetricInput]

list[MetricInput]: The loaded dataset.

validate()

Validate the dataset.

Raises:

Type Description
ValueError

If the dataset is not a list of MetricInput.

load_simple_agent_dataset()

Load the simple agent dataset from the local CSV file.

The dataset contains agent interaction data with trajectories, questions, and responses, suitable for both agent trajectory evaluation and generation quality evaluation.

Returns:

Name Type Description
DictDataset DictDataset

The loaded simple agent dataset containing MetricInput dictionaries with keys: 'query', 'generated_response', 'expected_response', 'agent_trajectory', 'expected_agent_trajectory'.

Raises:

Type Description
FileNotFoundError

If the CSV file doesn't exist.

ValueError

If the CSV file is empty or malformed.

load_simple_qa_dataset()

Load the simple QA dataset from the local CSV file.

The dataset contains question-answer pairs with generated responses and contexts, suitable for RAG evaluation and testing.

Returns:

Name Type Description
DictDataset DictDataset

The loaded simple QA dataset containing MetricInput dictionaries with keys: 'query', 'generated_response', 'expected_output', 'retrieved_context'.

Raises:

Type Description
FileNotFoundError

If the CSV file doesn't exist.

ValueError

If the CSV file is empty or malformed.
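The two error cases listed above (missing file, empty file) can be illustrated with a small standard-library reader. This is a sketch under assumptions: `read_csv_rows` is a hypothetical helper built on `csv.DictReader`, not the loader these functions use, and the sample column names are chosen only for the demonstration.

```python
import csv
import tempfile
from pathlib import Path


def read_csv_rows(path: Path) -> list[dict]:
    """Read a CSV into a list of dicts, raising on a missing or empty file."""
    if not path.is_file():
        raise FileNotFoundError(f"CSV file not found: {path}")
    with path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    if not rows:
        raise ValueError(f"CSV file is empty: {path}")
    return rows


with tempfile.TemporaryDirectory() as tmp:
    good_path = Path(tmp) / "qa.csv"
    good_path.write_text(
        "query,generated_response\nWhat is 2+2?,4\n", encoding="utf-8"
    )
    qa_rows = read_csv_rows(good_path)

    empty_path = Path(tmp) / "empty.csv"
    empty_path.write_text("", encoding="utf-8")
    try:
        read_csv_rows(empty_path)
        empty_error = None
    except ValueError as exc:
        empty_error = type(exc).__name__

    try:
        read_csv_rows(Path(tmp) / "missing.csv")
        missing_error = None
    except FileNotFoundError as exc:
        missing_error = type(exc).__name__
```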