Data Loader
Module for data loading and handling.
This module provides a registry for managing different data source loaders and utility functions for accessing them.
Reviewer
- Muhammad Afif Al Hawari (muhammad.a.a.hawari@gdplabs.id)
References
NONE
DataLoaderRegistry
Registry for available data loaders.
get_loader(name)
classmethod
Get an instantiated data loader by name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Name of the data loader | required |
Returns:
| Type | Description |
|---|---|
| `BaseDataLoader` | An instance of the data loader |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the loader name is not recognized |
list_available_loaders()
classmethod
List all registered data loader names.
Returns:
| Type | Description |
|---|---|
| `list[str]` | A list of data loader names |
register(name, loader_class)
classmethod
Register a data loader class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | Identifier for the loader type | required |
| `loader_class` | `Type[BaseDataLoader]` | The data loader class | required |
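The three class methods above describe a conventional class-level registry. The following is a minimal sketch of that pattern, not this module's actual implementation: the `_loaders` dictionary, the stand-in `BaseDataLoader` with its `load()` method, and the `InMemoryLoader` class are all assumptions made for illustration.

```python
from typing import Dict, List, Type


class BaseDataLoader:
    """Stand-in for the module's base class; its real interface is not shown here."""

    def load(self) -> list:
        raise NotImplementedError


class InMemoryLoader(BaseDataLoader):
    """Hypothetical loader used only for this illustration."""

    def load(self) -> list:
        return [{"id": 1}, {"id": 2}]


class DataLoaderRegistry:
    """Sketch of a registry that maps loader names to loader classes."""

    # Assumed storage: a class-level dict shared by all callers.
    _loaders: Dict[str, Type[BaseDataLoader]] = {}

    @classmethod
    def register(cls, name: str, loader_class: Type[BaseDataLoader]) -> None:
        # Store the class itself; instances are created on demand.
        cls._loaders[name] = loader_class

    @classmethod
    def get_loader(cls, name: str) -> BaseDataLoader:
        # Unknown names raise ValueError, matching the documented contract.
        if name not in cls._loaders:
            raise ValueError(f"Unknown data loader: {name!r}")
        return cls._loaders[name]()

    @classmethod
    def list_available_loaders(cls) -> List[str]:
        return list(cls._loaders)


DataLoaderRegistry.register("memory", InMemoryLoader)
loader = DataLoaderRegistry.get_loader("memory")
available = DataLoaderRegistry.list_available_loaders()
```

Registering the class rather than an instance means each `get_loader` call in this sketch returns a fresh object, which is consistent with the description "Get an instantiated data loader by name".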
get_data_loader(source_type)
Get a data loader instance by source type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source_type` | `str` | The type of data source | required |
Returns:
| Type | Description |
|---|---|
| `BaseDataLoader` | A data loader instance |
load_data(experiment_args, cache_timeout=GeneralConstants.CACHE_TIMEOUT)
Load data based on experiment arguments.
This function automatically determines the appropriate data loader to use based on the provided experiment arguments and returns the training, validation, and prompt datasets.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `experiment_args` | `ExperimentConfig` | The experiment configuration arguments | required |
| `cache_timeout` | `Optional[int]` | The time in seconds for which data should be cached. Defaults to `CACHE_TIMEOUT` (5 minutes). Set to `None` or `0` to disable caching. | `CACHE_TIMEOUT` |
Returns:
| Type | Description |
|---|---|
| `tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]` | A tuple containing, in order: the training dataset (required), the validation dataset (optional; an empty DataFrame if not provided), and the prompt dataset (required). |
Raises:
| Type | Description |
|---|---|
| `DataSourceNotProvidedError` | If no valid data source is specified |
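The `cache_timeout` semantics described for `load_data` (a positive number of seconds enables caching, while `None` or `0` disables it) can be illustrated with a small time-based cache. This is only a sketch of the idea under stated assumptions: how `load_data` caches internally is not documented here, and `TimedCache`, `get_or_load`, and `fetch` are hypothetical names introduced for the example.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TimedCache:
    """Hypothetical cache whose entries expire after a per-call timeout."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be demonstrated without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(
        self, key: str, loader: Callable[[], Any], cache_timeout: Optional[int]
    ) -> Any:
        # None or 0 disables caching: always call the loader, store nothing.
        if not cache_timeout:
            return loader()
        now = self._clock()
        entry = self._entries.get(key)
        # Reuse the stored value only while it is younger than the timeout.
        if entry is not None and now - entry[0] < cache_timeout:
            return entry[1]
        value = loader()
        self._entries[key] = (now, value)
        return value


calls = []


def fetch() -> int:
    """Pretend data source; returns how many times it has been called."""
    calls.append(1)
    return len(calls)


now = [0.0]  # fake clock, in seconds
cache = TimedCache(clock=lambda: now[0])

first = cache.get_or_load("train", fetch, cache_timeout=300)      # loads from source
second = cache.get_or_load("train", fetch, cache_timeout=300)     # served from cache
now[0] = 301.0                                                    # just past 5 minutes
third = cache.get_or_load("train", fetch, cache_timeout=300)      # expired, reloads
uncached = cache.get_or_load("train", fetch, cache_timeout=None)  # caching disabled
```

In this sketch the first two calls share one fetch, the third triggers a reload because the 300-second window has passed, and the last call bypasses the cache entirely.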