Dataset

Base class for all datasets.

Authors

Surya Mahadi (made.r.s.mahadi@gdplabs.id)

BaseDataset(version=None, hash=None, name=None, description=None, schema=None, additional_metadata=None)

Bases: ABC, Iterable

Base class for all datasets.

Attributes:

Name Type Description
dataset list[MetricInput]

The dataset to evaluate.

version str | None

The version of the dataset.

hash str | None

The hash of the dataset.

name str | None

The name of the dataset.

description str | None

The description of the dataset.

schema type[BaseModel] | None

The schema of the dataset.

additional_metadata dict[str, Any] | None

Additional metadata of the dataset.

Initialize the dataset.

Parameters:

Name Type Description Default
version str | None

The version of the dataset. Defaults to None.

None
hash str | None

The hash of the dataset. Defaults to None.

None
name str | None

The name of the dataset. Defaults to None.

None
description str | None

The description of the dataset. Defaults to None.

None
schema type[BaseModel] | None

The schema of the dataset. Defaults to None.

None
additional_metadata dict[str, Any] | None

Additional metadata of the dataset. Defaults to None.

None
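As a rough illustration of how the constructor arguments and the two abstract methods fit together, the sketch below defines a stand-in `BaseDataset` that mirrors only the signature documented above, plus a hypothetical `InMemoryDataset` subclass. The stand-in class, the `MetricInput` alias (assumed here to be a plain dict), and the `query` / `expected_response` field names are all assumptions made for the example, not the library's own code.

```python
from abc import ABC, abstractmethod
from typing import Any

# Assumption: MetricInput behaves like a plain dict. The real type comes from the library.
MetricInput = dict[str, Any]


class BaseDataset(ABC):
    """Stand-in mirroring only the documented constructor and abstract methods."""

    def __init__(self, version=None, hash=None, name=None, description=None,
                 schema=None, additional_metadata=None):
        self.dataset: list[MetricInput] = []
        self.version = version
        self.hash = hash
        self.name = name
        self.description = description
        self.schema = schema
        self.additional_metadata = additional_metadata

    @abstractmethod
    def load(self) -> list[MetricInput]: ...

    @abstractmethod
    def validate(self) -> None: ...


class InMemoryDataset(BaseDataset):
    """Hypothetical subclass that serves rows passed in at construction time."""

    def __init__(self, rows: list[MetricInput], **kwargs: Any):
        super().__init__(**kwargs)
        self._rows = rows

    def load(self) -> list[MetricInput]:
        # Copy the rows so later changes to the dataset do not touch the input list.
        self.dataset = list(self._rows)
        return self.dataset

    def validate(self) -> None:
        for row in self.dataset:
            if not isinstance(row, dict):
                raise ValueError("each row must be a dict")


ds = InMemoryDataset(
    [{"query": "2 + 2", "expected_response": "4"}],  # hypothetical field names
    name="arithmetic",
    version="0.1.0",
)
ds.load()
ds.validate()
```

Every metadata argument is optional, so anything not passed (here `hash`, `description`, `schema`, and `additional_metadata`) stays `None`.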

__getitem__(index)

Get the item at the given index.

Parameters:

Name Type Description Default
index int | list[int]

The index of the item to get, or a list of indices.

required

Returns:

Type Description
MetricInput | list[MetricInput]

The item at the given index, or a list of items if the index is a list.

__iter__()

Iterate over the dataset.

Returns:

Type Description
Iterator[MetricInput]

An iterator over the dataset.

__len__()

Get the length of the dataset.

Returns:

Type Description
int

The length of the dataset.
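The three container methods above can be pictured with a small list-backed stand-in. This is a sketch of the documented behaviour under the assumption that `MetricInput` is a dict and that items live in a `dataset` list; `ListBackedDataset` is a hypothetical class, not the library's implementation.

```python
from typing import Any, Iterator

# Assumption: MetricInput behaves like a plain dict.
MetricInput = dict[str, Any]


class ListBackedDataset:
    """Hypothetical stand-in illustrating indexing, iteration, and length."""

    def __init__(self, rows: list[MetricInput]):
        self.dataset = list(rows)

    def __getitem__(self, index):
        # A list of indices yields a list of items; a single int yields one item.
        if isinstance(index, list):
            return [self.dataset[i] for i in index]
        return self.dataset[index]

    def __iter__(self) -> Iterator[MetricInput]:
        return iter(self.dataset)

    def __len__(self) -> int:
        return len(self.dataset)


ds = ListBackedDataset([{"id": 0}, {"id": 1}, {"id": 2}])
first = ds[0]                      # a single item
subset = ds[[0, 2]]                # a list of items
ids = [row["id"] for row in ds]    # iteration visits items in order
```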

filter(filter_fn)

Filter the dataset.

Parameters:

Name Type Description Default
filter_fn Callable[[MetricInput], bool]

The filter function, called with each MetricInput and returning a bool.

required
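The reference does not state whether `filter` mutates the dataset or returns a new one, nor which boolean result keeps an item. The sketch below assumes the common convention: items for which `filter_fn` returns `True` are kept, and the change happens in place. `ListBackedDataset` and the `score` field are hypothetical.

```python
from typing import Any, Callable

# Assumption: MetricInput behaves like a plain dict.
MetricInput = dict[str, Any]


class ListBackedDataset:
    """Hypothetical stand-in, used only to illustrate filter semantics."""

    def __init__(self, rows: list[MetricInput]):
        self.dataset = list(rows)

    def filter(self, filter_fn: Callable[[MetricInput], bool]) -> None:
        # Assumption: keep the items for which filter_fn returns True, in place.
        self.dataset = [row for row in self.dataset if filter_fn(row)]


ds = ListBackedDataset([{"score": 0.2}, {"score": 0.9}, {"score": 0.7}])
ds.filter(lambda row: row["score"] >= 0.5)
```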

load() abstractmethod

Load the dataset.

Returns:

Type Description
list[MetricInput]

The loaded dataset.

Raises:

Type Description
NotImplementedError

If the load method is not implemented.
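Since `load()` is abstract, each subclass decides where its items come from. One possible approach, sketched here with only the standard library, is a helper that reads JSON Lines and that a `load()` implementation could delegate to. The `load_jsonl` helper, the file layout, and the `query` field are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

# Assumption: MetricInput behaves like a plain dict.
MetricInput = dict[str, Any]


def load_jsonl(path: Path) -> list[MetricInput]:
    """Hypothetical helper: read one JSON object per non-empty line."""
    rows: list[MetricInput] = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                rows.append(json.loads(line))
    return rows


# Write a small temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rows.jsonl"
    path.write_text('{"query": "a"}\n\n{"query": "b"}\n', encoding="utf-8")
    rows = load_jsonl(path)
```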

map(map_fn)

Map the dataset.

Parameters:

Name Type Description Default
map_fn Callable[[MetricInput], MetricInput]

The map function, called with each MetricInput and returning a MetricInput.

required
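As with `filter`, no return value is documented for `map`, so the following sketch assumes that each item is replaced in place by the result of `map_fn`. `ListBackedDataset` and the `query` field are hypothetical.

```python
from typing import Any, Callable

# Assumption: MetricInput behaves like a plain dict.
MetricInput = dict[str, Any]


class ListBackedDataset:
    """Hypothetical stand-in, used only to illustrate map semantics."""

    def __init__(self, rows: list[MetricInput]):
        self.dataset = list(rows)

    def map(self, map_fn: Callable[[MetricInput], MetricInput]) -> None:
        # Assumption: every item is replaced by map_fn(item), in place.
        self.dataset = [map_fn(row) for row in self.dataset]


ds = ListBackedDataset([{"query": "  Hello "}, {"query": "World"}])
# Normalise the query text while leaving any other keys untouched.
ds.map(lambda row: {**row, "query": row["query"].strip().lower()})
```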

sample(n=3)

Sample n items from the dataset.

Parameters:

Name Type Description Default
n int

The number of items to sample.

3

Returns:

Type Description
list[MetricInput]

The sampled items.
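The reference does not say how items are drawn. A minimal sketch, assuming sampling without replacement via the standard library's `random.sample`, might look like this; `ListBackedDataset` is a hypothetical stand-in.

```python
import random
from typing import Any

# Assumption: MetricInput behaves like a plain dict.
MetricInput = dict[str, Any]


class ListBackedDataset:
    """Hypothetical stand-in, used only to illustrate sample semantics."""

    def __init__(self, rows: list[MetricInput]):
        self.dataset = list(rows)

    def sample(self, n: int = 3) -> list[MetricInput]:
        # Assumption: draw n distinct items without replacement.
        # random.sample raises ValueError if n exceeds the dataset size.
        return random.sample(self.dataset, n)


random.seed(0)  # fixed seed only so that repeated runs pick the same items
ds = ListBackedDataset([{"id": i} for i in range(10)])
picked = ds.sample()  # uses the documented default of n=3
```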

suffle()

Shuffle the dataset.

validate() abstractmethod

Validate the dataset.

Raises:

Type Description
NotImplementedError

If the validate method is not implemented.
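What `validate()` checks is left to each subclass. One simple possibility is a required-keys check that a `validate()` implementation could delegate to, sketched below. The `validate_rows` helper, the `REQUIRED_KEYS` set, and the choice of `ValueError` are assumptions; a subclass given a `schema` could equally validate each item against that model instead.

```python
from typing import Any

# Assumption: MetricInput behaves like a plain dict.
MetricInput = dict[str, Any]

# Hypothetical field names, chosen only for this example.
REQUIRED_KEYS = {"query", "expected_response"}


def validate_rows(rows: list[MetricInput]) -> None:
    """Hypothetical helper: raise ValueError on the first row missing a required key."""
    for position, row in enumerate(rows):
        missing = REQUIRED_KEYS - set(row)
        if missing:
            raise ValueError(f"row {position} is missing keys: {sorted(missing)}")


# A complete row passes silently.
validate_rows([{"query": "2 + 2", "expected_response": "4"}])

# An incomplete row is rejected.
try:
    validate_rows([{"query": "2 + 2"}])
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```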