Runner

Init file for the Runner module.

BaseRunner(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, **kwargs)

Bases: ABC

Abstract base class for runners.

This class defines the interface that all runners implement.

Attributes:

data (str | BaseDataset): The data to evaluate.
inference_fn (Callable): The inference function to use.
evaluators (list[BaseEvaluator]): The evaluators to use.
experiment_tracker (BaseExperimentTracker | None): The experiment tracker.

Initialize the runner.

Parameters:

data (str | BaseDataset): The data to evaluate. Required.
inference_fn (Callable): The inference function to use. Required.
evaluators (list[BaseEvaluator]): The evaluators to use. Required.
experiment_tracker (BaseExperimentTracker | None): The experiment tracker. Default: None.
batch_size (int): The batch size to use for evaluation. Default: 10.
**kwargs (Any): Additional configuration parameters. Default: {}.

evaluate(dataset) async

Run the evaluator on the dataset.

The dataset is evaluated in batches of the given batch size.

Parameters:

dataset (BaseDataset): The dataset to run the evaluator on. Required.

Returns:

BaseDataset: The dataset with the evaluator results.
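Batched evaluation of the kind described above can be sketched with a small helper. The slicing and per-batch gather below are an assumption about how batching might be organized, not the library's code.

```python
import asyncio


def batched(items: list, batch_size: int) -> list[list]:
    """Split items into consecutive chunks of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


async def evaluate_in_batches(dataset: list, evaluator, batch_size: int = 10) -> list:
    """Evaluate the dataset one batch at a time, concurrently within each batch."""
    results = []
    for batch in batched(dataset, batch_size):
        results.extend(await asyncio.gather(*(evaluator(item) for item in batch)))
    return results
```

Each batch is awaited before the next one starts, so at most batch_size evaluations run concurrently.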

Runner(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, **kwargs)

Bases: BaseRunner

Runner class for evaluating datasets.

Attributes:

data (str | BaseDataset): The data to evaluate.
inference_fn (Callable): The inference function to use.
evaluators (list[BaseEvaluator]): The evaluators to use.
experiment_tracker (ExperimentTrackerAdapter | None): The experiment tracker adapter.
**kwargs (Any): Additional configuration parameters.

Initialize the Runner.

Parameters:

data (str | BaseDataset): The data to evaluate. Required.
inference_fn (Callable): The inference function to use. Required.
evaluators (list[BaseEvaluator]): The evaluators to use. Required.
experiment_tracker (BaseExperimentTracker | None): The experiment tracker (wrapped in an adapter if needed). Default: None.
batch_size (int): The batch size to use for evaluation. Default: 10.
**kwargs (Any): Additional configuration parameters. Default: {}.

evaluate() async

Run the evaluators on the dataset.

The dataset is evaluated in batches of the given batch size. The runner automatically detects and uses the appropriate logging pattern:

- Context-based logging for trackers that support log_context (such as Langfuse)
- Batch logging for all other trackers

Returns:

list[list[EvaluationOutput]]: A list of lists containing evaluation results from all evaluators.
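The tracker dispatch described above can be sketched as a capability check. The stub tracker classes and the log_batch fallback name below are assumptions for illustration; only the log_context attribute check is taken from the documented behavior.

```python
class ContextTracker:
    """Stand-in for a tracker that supports context-based logging (Langfuse-style)."""

    def __init__(self):
        self.mode = None

    def log_context(self, results):
        self.mode = "context"


class PlainTracker:
    """Stand-in for a tracker without log_context; falls back to batch logging."""

    def __init__(self):
        self.mode = None

    def log_batch(self, results):
        self.mode = "batch"


def log_results(tracker, results):
    """Use context-based logging when the tracker supports it, batch logging otherwise."""
    if hasattr(tracker, "log_context"):
        tracker.log_context(results)
    else:
        tracker.log_batch(results)
```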

get_run_results(**kwargs)

Get the results of a run.

Parameters:

run_id (str): The ID of the run. Required.
**kwargs (Any): Additional configuration parameters. Default: {}.

Returns:

list[dict[str, Any]]: The results of the run.
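A backend for get_run_results might key stored results by run ID. The in-memory store below is purely hypothetical, assumed only to show the run_id lookup and the list-of-dicts return shape.

```python
from typing import Any


class InMemoryRunStore:
    """Hypothetical stand-in that stores evaluation results per run."""

    def __init__(self):
        self._runs: dict[str, list[dict[str, Any]]] = {}

    def log(self, run_id: str, result: dict[str, Any]) -> None:
        """Append one result record to the given run."""
        self._runs.setdefault(run_id, []).append(result)

    def get_run_results(self, run_id: str, **kwargs: Any) -> list[dict[str, Any]]:
        """Return all recorded results for run_id, or an empty list if unknown."""
        return self._runs.get(run_id, [])
```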