Dataset
Dataset init file.
BaseDataset(version=None, hash=None, name=None, description=None, schema=None, additional_metadata=None)
Bases: ABC, Iterable
Base class for all datasets.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to evaluate. |
| `version` | `str \| None` | The version of the dataset. |
| `hash` | `str \| None` | The hash of the dataset. |
| `name` | `str \| None` | The name of the dataset. |
| `description` | `str \| None` | The description of the dataset. |
| `schema` | `type[BaseModel] \| None` | The schema of the dataset. |
| `additional_metadata` | `dict[str, Any] \| None` | Additional metadata of the dataset. |
Initialize the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `version` | `str \| None` | The version of the dataset. Defaults to None. | `None` |
| `hash` | `str \| None` | The hash of the dataset. Defaults to None. | `None` |
| `name` | `str \| None` | The name of the dataset. Defaults to None. | `None` |
| `description` | `str \| None` | The description of the dataset. Defaults to None. | `None` |
| `schema` | `type[BaseModel] \| None` | The schema of the dataset. Defaults to None. | `None` |
| `additional_metadata` | `dict[str, Any] \| None` | Additional metadata of the dataset. Defaults to None. | `None` |
__getitem__(index)
Get the item at the given index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index` | `int \| list[int]` | The index of the item to get, or a list of indices. | required |
Returns:
| Type | Description |
|---|---|
| `MetricInput \| list[MetricInput]` | The item at the given index, or a list of items if the index is a list. |
__iter__()
Iterate over the dataset.
Returns:
| Type | Description |
|---|---|
| `Iterator[MetricInput]` | An iterator over the dataset. |
__len__()
Get the length of the dataset.
Returns:
| Type | Description |
|---|---|
| `int` | The length of the dataset. |
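The three container methods above can be illustrated with a small stand-in. This is a minimal sketch, not the library's implementation: `IndexableSketch` is a hypothetical class, plain dicts stand in for `MetricInput`, and the list-of-indices behaviour is an assumption drawn from the return description of `__getitem__`.

```python
from __future__ import annotations

from typing import Any, Iterator


class IndexableSketch:
    """Hypothetical stand-in that mimics the documented container behaviour."""

    def __init__(self, rows: list[dict[str, Any]]) -> None:
        self.dataset = rows

    def __getitem__(self, index: int | list[int]) -> dict[str, Any] | list[dict[str, Any]]:
        # Assumption: a list of indices selects several rows at once.
        if isinstance(index, list):
            return [self.dataset[i] for i in index]
        return self.dataset[index]

    def __iter__(self) -> Iterator[dict[str, Any]]:
        return iter(self.dataset)

    def __len__(self) -> int:
        return len(self.dataset)


rows = [{"query": "q0"}, {"query": "q1"}, {"query": "q2"}]
ds = IndexableSketch(rows)
single = ds[1]          # one row
several = ds[[0, 2]]    # a list of rows
```

With these three methods in place, `len(ds)`, `for row in ds` and `ds[i]` all work the way they do for a plain list.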
filter(filter_fn)
Filter the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filter_fn` | `Callable[[MetricInput], bool]` | The filter function. | required |
load()
abstractmethod
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | The loaded dataset. |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the load method is not implemented. |
map(map_fn)
Map the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `map_fn` | `Callable[[MetricInput], MetricInput]` | The map function. | required |
sample(n=3)
Sample n items from the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n` | `int` | The number of items to sample. | `3` |
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | The sampled items. |
shuffle()
Shuffle the dataset.
validate()
abstractmethod
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the validate method is not implemented. |
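To show how the abstract `load()` and `validate()` methods fit together with `filter`, `map`, `sample` and `shuffle`, here is a self-contained sketch. `SketchBase` and `InMemoryDataset` are hypothetical names, dicts stand in for `MetricInput`, and the sketch assumes that `filter`, `map` and `shuffle` update the dataset in place, since the reference lists no return value for them.

```python
from __future__ import annotations

import random
from abc import ABC, abstractmethod
from typing import Any, Callable

Row = dict[str, Any]  # stand-in for MetricInput


class SketchBase(ABC):
    """Hypothetical stand-in for the documented base class."""

    def __init__(self, name: str | None = None) -> None:
        self.name = name
        self.dataset: list[Row] = []

    @abstractmethod
    def load(self) -> list[Row]: ...

    @abstractmethod
    def validate(self) -> None: ...

    # Assumption: these helpers mutate self.dataset in place.
    def filter(self, filter_fn: Callable[[Row], bool]) -> None:
        self.dataset = [row for row in self.dataset if filter_fn(row)]

    def map(self, map_fn: Callable[[Row], Row]) -> None:
        self.dataset = [map_fn(row) for row in self.dataset]

    def shuffle(self) -> None:
        random.shuffle(self.dataset)

    def sample(self, n: int = 3) -> list[Row]:
        # Never ask for more rows than exist.
        return random.sample(self.dataset, min(n, len(self.dataset)))


class InMemoryDataset(SketchBase):
    """Concrete subclass that supplies the two abstract methods."""

    def __init__(self, rows: list[Row], name: str | None = None) -> None:
        super().__init__(name=name)
        self._rows = rows
        self.dataset = self.load()
        self.validate()

    def load(self) -> list[Row]:
        return list(self._rows)

    def validate(self) -> None:
        if not all(isinstance(row, dict) for row in self.dataset):
            raise ValueError("every row must be a dict")


ds = InMemoryDataset([{"query": f"q{i}", "score": i} for i in range(5)], name="demo")
ds.filter(lambda row: row["score"] % 2 == 0)             # keeps scores 0, 2, 4
ds.map(lambda row: {**row, "score": row["score"] * 10})  # scores become 0, 20, 40
picked = ds.sample(2)
```

Because the sketch base class derives from `ABC`, a subclass that omits `load()` or `validate()` cannot be instantiated at all.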
DatasetRegistry()
Registry for dataset configurations.
Initialize the dataset registry. The signature takes no arguments; the names below are internal attributes rather than constructor parameters.
Attributes:
| Name | Type | Description |
|---|---|---|
| `_configs` | `Dict[DatasetType, DatasetConfig]` | The dictionary of dataset configurations. |
| `_prefix_map` | `Dict[str, DatasetType]` | The dictionary of dataset prefixes. |
| `_extension_map` | `Dict[str, DatasetType]` | The dictionary of dataset extensions. |
detect_type(dataset)
Detect the dataset type from the dataset string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | The dataset string to detect. | required |
Returns:
| Type | Description |
|---|---|
| `Optional[DatasetType]` | The detected dataset type, or None if not supported. |
get_config(dataset_type)
Get configuration for a dataset type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_type` | `DatasetType` | The dataset type to get the configuration for. | required |
Returns:
| Type | Description |
|---|---|
| `Optional[DatasetConfig]` | The configuration for the dataset type, or None if not found. |
register_config(config)
Register a dataset configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `DatasetConfig` | The dataset configuration to register. | required |
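One way a registry with `_configs`, `_prefix_map` and `_extension_map` could resolve a dataset string is sketched below. Everything here is illustrative: `RegistrySketch`, the `DatasetType` members, the `DatasetConfig` fields, and the prefix-before-extension lookup order are assumptions, not the library's actual definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class DatasetType(Enum):  # hypothetical members
    CSV = "csv"
    JSONL = "jsonl"
    HUGGINGFACE = "huggingface"


@dataclass
class DatasetConfig:  # hypothetical fields
    dataset_type: DatasetType
    prefixes: list[str] = field(default_factory=list)
    extensions: list[str] = field(default_factory=list)


class RegistrySketch:
    def __init__(self) -> None:
        self._configs: dict[DatasetType, DatasetConfig] = {}
        self._prefix_map: dict[str, DatasetType] = {}
        self._extension_map: dict[str, DatasetType] = {}

    def register_config(self, config: DatasetConfig) -> None:
        # Index the config by type, then by each prefix and extension it claims.
        self._configs[config.dataset_type] = config
        for prefix in config.prefixes:
            self._prefix_map[prefix] = config.dataset_type
        for ext in config.extensions:
            self._extension_map[ext] = config.dataset_type

    def get_config(self, dataset_type: DatasetType) -> DatasetConfig | None:
        return self._configs.get(dataset_type)

    def detect_type(self, dataset: str) -> DatasetType | None:
        # Assumption: prefixes are checked before file extensions.
        for prefix, dtype in self._prefix_map.items():
            if dataset.startswith(prefix):
                return dtype
        for ext, dtype in self._extension_map.items():
            if dataset.endswith(ext):
                return dtype
        return None


registry = RegistrySketch()
registry.register_config(DatasetConfig(DatasetType.CSV, extensions=[".csv"]))
registry.register_config(DatasetConfig(DatasetType.HUGGINGFACE, prefixes=["hf/"]))
detected_csv = registry.detect_type("data/eval.csv")
detected_hf = registry.detect_type("hf/squad")
detected_none = registry.detect_type("notes.txt")
```

Returning `None` for an unrecognised string mirrors the `Optional[DatasetType]` return type documented for `detect_type`.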
DictDataset(dataset, name=None)
Bases: BaseDataset
Dict-Based Dataset.
This class is a subclass of the BaseDataset class. It is used to store a dataset in a dictionary format.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[dict]` | The dataset to evaluate. |
Initialize the DictDataset class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset to use for the evaluation. | required |
from_csv(path, name=None, **kwargs)
classmethod
Load a dataset from a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | The path to the CSV file. | required |
| `name` | `str \| None` | The name of the dataset. | `None` |
| `**kwargs` | `Any` | Additional arguments to pass to pandas read_csv. | `{}` |
Returns:
| Type | Description |
|---|---|
| `DictDataset` | The loaded dataset. |
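The shape of the data that a CSV loader produces, one dict per row keyed by the header line, can be sketched with the standard library alone. The documented method forwards `**kwargs` to pandas `read_csv`; this sketch swaps in `csv.DictReader` to stay dependency-free, and the column names are made up for illustration.

```python
import csv
import io

# Inline CSV text stands in for a file on disk.
csv_text = "query,expected_response\nWhat is 2+2?,4\nCapital of France?,Paris\n"

# csv.DictReader yields one dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(csv_text)))
```

Note that `csv.DictReader` keeps every value as a string, whereas pandas infers column types, so numeric columns may differ between the two approaches.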
from_jsonl(path, **kwargs)
classmethod
Load a dataset from a JSONL file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | The path to the JSONL file. | required |
| `**kwargs` | `Any` | Additional arguments to pass to the constructor. | `{}` |
Returns:
| Type | Description |
|---|---|
| `DictDataset` | The loaded dataset. |
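JSONL stores one JSON object per line, so loading it reduces to parsing each non-empty line. The helper below is a hypothetical sketch (`parse_jsonl` is not a library function) of how such a file could be turned into the list of dicts a `DictDataset` holds.

```python
import json

# Inline text stands in for a .jsonl file; note the blank line in the middle.
jsonl_text = (
    '{"query": "What is 2+2?", "expected_response": "4"}\n'
    "\n"
    '{"query": "Capital of France?", "expected_response": "Paris"}\n'
)


def parse_jsonl(text: str) -> list:
    """Parse JSONL text into a list of records, skipping blank lines."""
    records = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Report which line is malformed instead of failing opaquely.
            raise ValueError(f"line {line_number} is not valid JSON") from exc
    return records


records = parse_jsonl(jsonl_text)
```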
load()
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | The loaded dataset. |
validate()
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not a list of MetricInput. |
HuggingFaceDataset(dataset)
Bases: BaseDataset
Hugging Face dataset class for the evaluator.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to use for the evaluation. |
Initialize the HuggingFaceDataset class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset to use for the evaluation. | required |
from_hub(path_or_name, split, **kwargs)
staticmethod
Create a HuggingFaceDataset from a Hugging Face dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path_or_name` | `str` | The path or name of the dataset. | required |
| `split` | `str` | The split of the dataset. | required |
| `**kwargs` | `Any` | Additional arguments to pass to the load function. | `{}` |
Returns:
| Type | Description |
|---|---|
| `HuggingFaceDataset` | The created dataset. |
from_list(dataset)
staticmethod
Create a HuggingFaceDataset from a list of MetricInput.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to create. | required |
Returns:
| Type | Description |
|---|---|
| `HuggingFaceDataset` | The created dataset. |
load()
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | The loaded dataset. |
validate()
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not a list of MetricInput. |
LangfuseDataset(dataset, langfuse_client, dataset_name=None, expected_output_key='expected_response', mapping=None)
Bases: BaseDataset
Langfuse dataset class for the evaluator.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to use for the evaluation. |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. |
| `dataset_name` | `str` | The name of the dataset in Langfuse. |
| `expected_output_key` | `str \| None` | The key for expected output. Defaults to "expected_response". |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. |
Initialize the LangfuseDataset class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `List[MetricInput]` | The dataset to use for the evaluation. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `Optional[str]` | The name of the dataset in Langfuse. | `None` |
| `expected_output_key` | `str \| None` | The key for expected output. Defaults to "expected_response". | `'expected_response'` |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |
convert_to_standard_dataset(expected_output_key=None, mapping=None)
Convert the dataset to standard data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `expected_output_key` | `str \| None` | The key for expected output. Defaults to None. | `None` |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |
Returns:
| Type | Description |
|---|---|
| `List[MetricInput]` | The converted dataset. |
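The reference does not spell out what the conversion does, so the following is only a guess at its general shape. It assumes each source item carries an `input` dict and an `expected_output` value, and that `mapping` maps a target field name to the source key it should be read from. The function name `to_standard_rows` and the sample keys are hypothetical.

```python
from __future__ import annotations

from typing import Any


def to_standard_rows(
    items: list[dict[str, Any]],
    expected_output_key: str = "expected_response",
    mapping: dict[str, str] | None = None,
) -> list[dict[str, Any]]:
    """Flatten items into rows, renaming input keys per `mapping` (assumed semantics)."""
    mapping = mapping or {}
    rows = []
    for item in items:
        source = item["input"]
        row = dict(source)  # start from the raw input fields
        for target_key, source_key in mapping.items():
            if source_key in source:
                row[target_key] = row.pop(source_key)
        # Store the expected output under the configured key.
        row[expected_output_key] = item["expected_output"]
        rows.append(row)
    return rows


items = [{"input": {"question": "What is 2+2?"}, "expected_output": "4"}]
converted = to_standard_rows(items, mapping={"query": "question"})
```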
from_csv(path, langfuse_client, dataset_name=None, dataset_description='', metadata=None, is_append=False, **kwargs)
staticmethod
Create a LangfuseDataset from a CSV file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | The path to the CSV file. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str \| None` | The name to register this dataset under in Langfuse. If None, the CSV filename without its extension is used. | `None` |
| `dataset_description` | `str` | The description of the dataset. Defaults to an empty string. | `''` |
| `metadata` | `dict \| None` | Optional metadata for the dataset. Defaults to None. | `None` |
| `is_append` | `bool` | If True, append items to the existing dataset. If False, only create the dataset if it doesn't exist. | `False` |
| `**kwargs` | `Any` | Additional arguments to pass to the constructor. | `{}` |
Returns:
| Type | Description |
|---|---|
| `LangfuseDataset` | The created dataset. |
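The fallback for `dataset_name` (the CSV filename without its extension) can be expressed with `pathlib`. This is a minimal sketch of that rule; `resolve_dataset_name` is a hypothetical helper, not a library function.

```python
from __future__ import annotations

from pathlib import Path


def resolve_dataset_name(path: str, dataset_name: str | None = None) -> str:
    """Return the explicit name, or fall back to the file name without its suffix."""
    # Path.stem drops the directory and the final suffix only.
    return dataset_name if dataset_name is not None else Path(path).stem


default_name = resolve_dataset_name("data/eval/simple_qa.csv")
explicit_name = resolve_dataset_name("data/eval/simple_qa.csv", "qa-v2")
```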
from_dict(dataset, langfuse_client, dataset_name, dataset_description='', mapping=None, metadata=None, is_append=False)
staticmethod
Create a LangfuseDataset from a list of MetricInput.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `List[MetricInput]` | The dataset to create. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str` | The name of the dataset in Langfuse. | required |
| `dataset_description` | `str` | The description of the dataset. Defaults to an empty string. | `''` |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |
| `metadata` | `dict \| None` | Optional metadata for the dataset. Defaults to None. | `None` |
| `is_append` | `bool` | If True, append items to the existing dataset. If False, only create the dataset if it doesn't exist. | `False` |
Returns:
| Type | Description |
|---|---|
| `LangfuseDataset` | The created dataset. |
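The `is_append` flag describes two behaviours: grow an existing dataset, or leave it alone and only create it when it is missing. The sketch below models that with an in-memory dict in place of Langfuse. `store` and `upsert_dataset` are hypothetical, and the sketch assumes that a missing dataset is created regardless of the flag.

```python
from __future__ import annotations

# Hypothetical stand-in for datasets stored remotely, keyed by dataset name.
store: dict[str, list[dict]] = {}


def upsert_dataset(name: str, rows: list[dict], is_append: bool = False) -> list[dict]:
    if name not in store:
        store[name] = list(rows)   # assumption: the first upload always creates
    elif is_append:
        store[name].extend(rows)   # existing dataset grows
    # Existing dataset with is_append=False: left untouched.
    return store[name]


upsert_dataset("qa", [{"query": "first"}])
after_plain_call = len(upsert_dataset("qa", [{"query": "second"}]))
after_append_call = len(upsert_dataset("qa", [{"query": "second"}], is_append=True))
```

Calling a loader twice with `is_append=False` is therefore idempotent under this reading, while `is_append=True` adds the rows again on every call.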
from_gsheets(sheet_id, worksheet_name, client_email, private_key, langfuse_client, dataset_name=None, dataset_description='', mapping=None, metadata=None, is_append=False)
async
staticmethod
Create a LangfuseDataset from Google Sheets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sheet_id` | `str` | The ID of the Google Sheet. | required |
| `worksheet_name` | `str` | The name of the worksheet within the Google Sheet. | required |
| `client_email` | `str` | The client email for Google Sheets API. | required |
| `private_key` | `str` | Base64-encoded private key for Google Sheets API. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str \| None` | The name of the dataset in Langfuse. | `None` |
| `dataset_description` | `str` | The description of the dataset. Defaults to an empty string. | `''` |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |
| `metadata` | `dict \| None` | Optional metadata for the dataset. Defaults to None. | `None` |
| `is_append` | `bool` | If True, append items to the existing dataset. If False, only create the dataset if it doesn't exist. | `False` |
Returns:
| Type | Description |
|---|---|
| `LangfuseDataset` | The created dataset. |
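Since `private_key` is expected in Base64-encoded form, the caller has to encode the key text before passing it in. A sketch of that round trip with the standard library follows. The key text is a placeholder, and whether the library decodes it exactly this way (UTF-8, standard alphabet) is an assumption.

```python
import base64

# Placeholder text only; a real key would come from a service account file.
pem_text = "-----BEGIN PRIVATE KEY-----\nnot-a-real-key\n-----END PRIVATE KEY-----\n"

# What the caller would pass as `private_key`: the key text, Base64-encoded.
encoded_key = base64.b64encode(pem_text.encode("utf-8")).decode("ascii")

# What a loader would presumably do before authenticating.
decoded_key = base64.b64decode(encoded_key).decode("utf-8")
```

Encoding the key this way keeps its newlines intact when it is stored in a single-line environment variable.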
from_jsonl(path, langfuse_client, dataset_name=None, dataset_description='', metadata=None, is_append=False, **kwargs)
staticmethod
Create a LangfuseDataset from a JSONL file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | The path to the JSONL file. | required |
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str \| None` | The name of the dataset in Langfuse. | `None` |
| `dataset_description` | `str` | The description of the dataset. Defaults to an empty string. | `''` |
| `metadata` | `dict \| None` | Optional metadata for the dataset. Defaults to None. | `None` |
| `is_append` | `bool` | If True, append items to the existing dataset. If False, only create the dataset if it doesn't exist. | `False` |
| `**kwargs` | `Any` | Additional arguments to pass to the constructor. | `{}` |
Returns:
| Type | Description |
|---|---|
| `LangfuseDataset` | The created dataset. |
from_langfuse(langfuse_client, dataset_name, mapping=None)
staticmethod
Load a dataset from Langfuse.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `langfuse_client` | `Langfuse` | The Langfuse client instance. | required |
| `dataset_name` | `str` | The name of the dataset in Langfuse. | required |
| `mapping` | `dict[str, Any] \| None` | Optional mapping for field keys. Defaults to None. | `None` |
Returns:
| Type | Description |
|---|---|
| `LangfuseDataset` | The loaded dataset. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not found or has no data. |
load()
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `List[MetricInput]` | The loaded dataset with proper Langfuse structure. |
validate()
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not a list of MetricInput or if required fields are missing. |
SpreadsheetDataset(dataset)
Bases: BaseDataset
Spreadsheet dataset class for the evaluator.
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset` | `list[MetricInput]` | The dataset to use for the evaluation. |
Initialize the SpreadsheetDataset class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `Dataset` | The dataset to use for the evaluation. | required |
from_gsheets(sheet_id, worksheet_name, client_email, private_key)
async
staticmethod
Load the dataset from Google Sheets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sheet_id` | `str` | The ID of the Google Sheet. | required |
| `worksheet_name` | `str` | The name of the worksheet within the Google Sheet. | required |
| `client_email` | `str` | The client email for Google Sheets API. | required |
| `private_key` | `str` | Base64-encoded private key for Google Sheets API. | required |
Returns:
| Type | Description |
|---|---|
| `SpreadsheetDataset` | The loaded dataset. |
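Because `from_gsheets` is both `async` and a `staticmethod`, it is called on the class and must be awaited. The calling pattern is sketched here with a hypothetical stand-in (`SheetLoaderSketch`) that fakes the network call; the argument values are placeholders.

```python
import asyncio


class SheetLoaderSketch:
    """Hypothetical stand-in with the same async staticmethod shape."""

    @staticmethod
    async def from_gsheets(sheet_id, worksheet_name, client_email, private_key):
        await asyncio.sleep(0)  # stands in for the network round trip
        return [{"query": f"row from {sheet_id}/{worksheet_name}"}]


async def main():
    # Inside a coroutine, the loader is awaited like any other async call.
    return await SheetLoaderSketch.from_gsheets(
        "sheet-id", "Sheet1", "svc@example.com", "encoded-key"
    )


# From synchronous code, asyncio.run drives the coroutine to completion.
rows = asyncio.run(main())
```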
load()
Load the dataset.
Returns:
| Type | Description |
|---|---|
| `list[MetricInput]` | The loaded dataset. |
validate()
Validate the dataset.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not a list of MetricInput. |
load_simple_agent_dataset()
Load the simple agent dataset from the local CSV file.
The dataset contains agent interaction data with trajectories, questions, and responses, suitable for both agent trajectory evaluation and generation quality evaluation.
Returns:
| Name | Type | Description |
|---|---|---|
| `DictDataset` | `DictDataset` | The loaded simple agent dataset containing MetricInput dictionaries with keys: 'query', 'generated_response', 'expected_response', 'agent_trajectory', 'expected_agent_trajectory'. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the CSV file doesn't exist. |
| `ValueError` | If the CSV file is empty or malformed. |
load_simple_qa_dataset()
Load the simple QA dataset from the local CSV file.
The dataset contains question-answer pairs with generated responses and contexts, suitable for RAG evaluation and testing.
Returns:
| Name | Type | Description |
|---|---|---|
| `DictDataset` | `DictDataset` | The loaded simple QA dataset containing MetricInput dictionaries with keys: 'query', 'generated_response', 'expected_output', 'retrieved_context'. |
Raises:
| Type | Description |
|---|---|
| `FileNotFoundError` | If the CSV file doesn't exist. |
| `ValueError` | If the CSV file is empty or malformed. |
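Since the two bundled datasets document the keys each record carries, a quick check that loaded rows contain them can catch a malformed file early. The helper below is a hypothetical sketch operating on plain dicts; the key set is the one listed for the simple QA dataset, and the sample rows are invented.

```python
QA_KEYS = {"query", "generated_response", "expected_output", "retrieved_context"}


def find_incomplete_rows(rows, required_keys):
    """Map row position to the sorted list of keys that row is missing."""
    problems = {}
    for position, row in enumerate(rows):
        missing = required_keys - set(row)
        if missing:
            problems[position] = sorted(missing)
    return problems


sample_rows = [
    {
        "query": "What is 2+2?",
        "generated_response": "4",
        "expected_output": "4",
        "retrieved_context": "Basic arithmetic.",
    },
    {"query": "Capital of France?"},
]
problems = find_incomplete_rows(sample_rows, QA_KEYS)
```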