Dataset adapter
Dataset Adapter Module.
This module detects the type of a dataset from its dataset string and creates the matching dataset instance.
DatasetConfig(type, prefix=None, extensions=None, required_params=None, factory_func=None, name_extractor=None)
dataclass
Configuration for dataset detection and creation.
DatasetRegistry()
Registry for dataset configurations.
Initialize the dataset registry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `_configs` | `Dict[DatasetType, DatasetConfig]` | The dictionary of dataset configurations. | required |
| `_prefix_map` | `Dict[str, DatasetType]` | The dictionary of dataset prefixes. | required |
| `_extension_map` | `Dict[str, DatasetType]` | The dictionary of dataset extensions. | required |
detect_type(dataset)
Detect the dataset type from the dataset string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | The dataset string to detect. | required |
Returns:
| Type | Description |
|---|---|
| `Optional[DatasetType]` | The detected dataset type, or None if not supported. |
get_config(dataset_type)
Get configuration for a dataset type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset_type` | `DatasetType` | The dataset type to get the configuration for. | required |
Returns:
| Type | Description |
|---|---|
| `Optional[DatasetConfig]` | The configuration for the dataset type, or None if not found. |
register_config(config)
Register a dataset configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `DatasetConfig` | The dataset configuration to register. | required |
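The interplay between `register_config`, `detect_type`, and `get_config` can be sketched as follows. This is a minimal stand-in, not the library's implementation: the enum members, the field types of `DatasetConfig`, the `.jsonl` extension, and the prefix-before-extension lookup order are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional


class DatasetType(Enum):
    # Member names are illustrative assumptions, not the library's actual members.
    HUGGINGFACE = "huggingface"
    JSONL = "jsonl"


@dataclass
class DatasetConfig:
    # Field names follow the documented signature; the field types are assumed.
    type: DatasetType
    prefix: Optional[str] = None
    extensions: Optional[List[str]] = None
    required_params: Optional[List[str]] = None
    factory_func: Optional[Callable] = None
    name_extractor: Optional[Callable] = None


class DatasetRegistry:
    def __init__(self) -> None:
        self._configs: Dict[DatasetType, DatasetConfig] = {}
        self._prefix_map: Dict[str, DatasetType] = {}
        self._extension_map: Dict[str, DatasetType] = {}

    def register_config(self, config: DatasetConfig) -> None:
        # Index the config by type, and by prefix and extensions when present.
        self._configs[config.type] = config
        if config.prefix:
            self._prefix_map[config.prefix] = config.type
        for ext in config.extensions or []:
            self._extension_map[ext] = config.type

    def detect_type(self, dataset: str) -> Optional[DatasetType]:
        # Assumed order: try prefixes first, then file extensions.
        for prefix, dataset_type in self._prefix_map.items():
            if dataset.startswith(prefix):
                return dataset_type
        for ext, dataset_type in self._extension_map.items():
            if dataset.endswith(ext):
                return dataset_type
        return None

    def get_config(self, dataset_type: DatasetType) -> Optional[DatasetConfig]:
        return self._configs.get(dataset_type)


registry = DatasetRegistry()
registry.register_config(DatasetConfig(type=DatasetType.HUGGINGFACE, prefix="hf/"))
registry.register_config(DatasetConfig(type=DatasetType.JSONL, extensions=[".jsonl"]))
```

With these two registrations, a string starting with `hf/` resolves through the prefix map, a string ending in `.jsonl` resolves through the extension map, and anything else yields `None`.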
DatasetType
Bases: Enum
Enumeration of supported dataset types.
create_dataset(dataset, dataset_type, **kwargs)
Create a dataset instance based on the detected type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | The dataset string. | required |
| `dataset_type` | `DatasetType` | The detected dataset type. | required |
| `**kwargs` | `Any` | Additional arguments for dataset creation. | `{}` |
Returns:
| Type | Description |
|---|---|
| `Tuple[BaseDataset, str]` | The created dataset and its name. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset type is not supported or required parameters are missing. |
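The two documented failure modes (unsupported type, missing required parameters) can be illustrated with a small stand-in. Everything below is hypothetical: `SketchConfig`, `CONFIGS`, and `create_dataset_sketch` are names invented for this example, dataset types are plain strings rather than `DatasetType` members, and the factory simply returns a dictionary instead of a real `BaseDataset`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class SketchConfig:
    # Hypothetical, reduced version of a dataset configuration.
    required_params: List[str]
    factory_func: Callable[..., Any]
    name_extractor: Callable[[str], str]


# Hypothetical lookup table keyed by a plain string instead of DatasetType.
CONFIGS: Dict[str, SketchConfig] = {
    "langfuse": SketchConfig(
        required_params=["langfuse_client"],
        factory_func=lambda dataset, **kw: {"source": dataset, **kw},
        name_extractor=lambda dataset: dataset.split("/", 1)[1],
    ),
}


def create_dataset_sketch(dataset: str, dataset_type: str, **kwargs: Any) -> Tuple[Any, str]:
    config = CONFIGS.get(dataset_type)
    if config is None:
        raise ValueError(f"Unsupported dataset type: {dataset_type}")
    # Collect every required parameter that the caller did not supply.
    missing = [p for p in config.required_params if p not in kwargs]
    if missing:
        raise ValueError(f"Missing required parameters: {', '.join(missing)}")
    return config.factory_func(dataset, **kwargs), config.name_extractor(dataset)
```

Validating required parameters before calling the factory keeps the error message specific to what the caller forgot, rather than surfacing a less helpful failure from inside the dataset constructor.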
detect_dataset(dataset, **kwargs)
Detect the dataset type and create an instance.
Supported dataset string formats:
- hf/ (Hugging Face dataset)
- langfuse/ (Langfuse dataset). Required parameters: langfuse_client
- gs/ (Google Sheets dataset). Required parameters: sheet_id, client_email, private_key
- JSONL file
- CSV file
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str \| BaseDataset` | The dataset to detect. | required |
| `**kwargs` | `Any` | Additional arguments to pass to the dataset constructor. | `{}` |
Returns:
| Name | Type | Description |
|---|---|---|
| `BaseDataset` | `BaseDataset` | The detected dataset. |
| `str` | `str` | The dataset name. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset is not supported. |
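Because `dataset` may be either a string or an existing `BaseDataset`, the overall flow can be sketched as below. This is a simplified illustration under several assumptions: the `BaseDataset` class here is a stand-in with a hypothetical `name` attribute, an existing instance is assumed to pass through unchanged, the `.csv` extension and the name-derivation rules are guesses, and `**kwargs` is accepted but unused.

```python
from typing import Any, Optional, Tuple, Union


class BaseDataset:
    """Stand-in for the library's BaseDataset; the real class is not shown here."""

    def __init__(self, name: str) -> None:
        self.name = name  # hypothetical attribute used only in this sketch


def detect_type_sketch(dataset: str) -> Optional[str]:
    # Only two of the documented formats are modeled; the ".csv" suffix is assumed.
    if dataset.startswith("hf/"):
        return "hf"
    if dataset.endswith(".csv"):
        return "csv"
    return None


def detect_dataset_sketch(
    dataset: Union[str, BaseDataset], **kwargs: Any
) -> Tuple[BaseDataset, str]:
    if isinstance(dataset, BaseDataset):
        # Assumption: an existing instance is returned as-is with its own name.
        return dataset, dataset.name
    dataset_type = detect_type_sketch(dataset)
    if dataset_type is None:
        raise ValueError(f"Unsupported dataset: {dataset}")
    # Assumed naming rule: strip the prefix, or strip the file extension.
    if dataset_type == "hf":
        name = dataset.split("/", 1)[1]
    else:
        name = dataset.rsplit(".", 1)[0]
    return BaseDataset(name), name
```

Callers unpack the result as `dataset, name = detect_dataset_sketch(...)`, mirroring the documented two-element return value.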
detect_dataset_type(dataset)
Detect the dataset type from the dataset string.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dataset` | `str` | The dataset string to detect. | required |
Returns:
| Type | Description |
|---|---|
| `Optional[DatasetType]` | The detected dataset type, or None if not supported. |
register_dataset_type(config)
Register a new dataset type configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `DatasetConfig` | The configuration for the new dataset type. | required |