Runner
Init file for the Runner module.
BaseRunner(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, **kwargs)
Bases: ABC
Abstract base class for runners.
This class defines the interface that all runners implement.
Attributes:
| Name | Type | Description |
|---|---|---|
| data | str \| BaseDataset | The data to evaluate. |
| inference_fn | Callable | The inference function to use. |
| evaluators | list[BaseEvaluator] | The evaluators to use. |
| experiment_tracker | BaseExperimentTracker \| None | The experiment tracker. |
Initialize the runner.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | str \| BaseDataset | The data to evaluate. | required |
| inference_fn | Callable | The inference function to use. | required |
| evaluators | list[BaseEvaluator] | The evaluators to use. | required |
| experiment_tracker | BaseExperimentTracker \| None | The experiment tracker. | None |
| **kwargs | Any | Additional configuration parameters. | {} |
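As a rough illustration of the interface described above, the following sketch shows an abstract runner with the same constructor arguments and an abstract async `evaluate`. The names `SketchBaseRunner`, `EchoRunner`, and the `config` attribute are hypothetical stand-ins, not the library's actual source.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Callable


class SketchBaseRunner(ABC):
    """Hypothetical stand-in that mirrors the documented BaseRunner signature."""

    def __init__(self, data, inference_fn: Callable, evaluators: list,
                 experiment_tracker=None, batch_size: int = 10, **kwargs: Any):
        self.data = data
        self.inference_fn = inference_fn
        self.evaluators = evaluators
        self.experiment_tracker = experiment_tracker
        self.batch_size = batch_size
        # Assumption: extra configuration is simply kept as a dict.
        self.config = kwargs

    @abstractmethod
    async def evaluate(self, dataset):
        """Concrete runners must provide the evaluation loop."""


class EchoRunner(SketchBaseRunner):
    """Toy subclass: applies inference_fn to every row, with no evaluators."""

    async def evaluate(self, dataset):
        return [self.inference_fn(row) for row in dataset]


def can_instantiate_abstract() -> bool:
    # Instantiating a class with an unimplemented abstract method raises TypeError.
    try:
        SketchBaseRunner(data=[], inference_fn=str, evaluators=[])
    except TypeError:
        return False
    return True


runner = EchoRunner(data=[1, 2], inference_fn=lambda x: x * 2, evaluators=[])
echo_result = asyncio.run(runner.evaluate(runner.data))
```

Because `evaluate` is abstract, only subclasses that implement it can be instantiated, which is what "defines the interface" amounts to in practice.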
evaluate(dataset)
async
Run the evaluator on the dataset.
The dataset is evaluated in batches of the given batch size.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset | BaseDataset | The dataset to run the evaluator on. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| BaseDataset | BaseDataset | The dataset with the evaluator results. |
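Batched evaluation of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: `iter_batches`, `score_row`, and `evaluate_in_batches` are hypothetical helpers, and running each batch concurrently with `asyncio.gather` is one plausible choice rather than the library's confirmed behaviour.

```python
import asyncio
from typing import Any, Iterator, Sequence


def iter_batches(rows: Sequence[Any], batch_size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


async def score_row(row: int) -> dict:
    """Hypothetical per-row evaluation; a real evaluator would call a model."""
    await asyncio.sleep(0)
    return {"input": row, "score": row % 2}


async def evaluate_in_batches(rows: Sequence[int], batch_size: int = 10) -> list:
    results = []
    for batch in iter_batches(rows, batch_size):
        # Rows within a batch run concurrently; batches run one after another.
        results.extend(await asyncio.gather(*(score_row(r) for r in batch)))
    return results


batch_lengths = [len(b) for b in iter_batches(list(range(25)), 10)]
batched_results = asyncio.run(evaluate_in_batches(list(range(25)), batch_size=10))
```

With 25 rows and a batch size of 10, the last batch is simply shorter than the others; result order follows input order because `asyncio.gather` preserves it.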
Runner(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, **kwargs)
Bases: BaseRunner
Runner class for evaluating datasets.
Attributes:
| Name | Type | Description |
|---|---|---|
| data | str \| BaseDataset | The data to evaluate. |
| inference_fn | Callable | The inference function to use. |
| evaluators | list[BaseEvaluator] | The evaluators to use. |
| experiment_tracker | ExperimentTrackerAdapter \| None | The experiment tracker adapter. |
| **kwargs | Any | Additional configuration parameters. |
Initialize the Runner.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | str \| BaseDataset | The data to evaluate. | required |
| inference_fn | Callable | The inference function to use. | required |
| evaluators | list[BaseEvaluator] | The evaluators to use. | required |
| experiment_tracker | BaseExperimentTracker \| None | The experiment tracker (will be wrapped in adapter if needed). | None |
| batch_size | int | The batch size to use for evaluation. | 10 |
| **kwargs | Any | Additional configuration parameters. | {} |
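The "wrapped in adapter if needed" behaviour of `experiment_tracker` could look roughly like the sketch below. `SketchTracker`, `SketchAdapter`, and `wrap_tracker` are hypothetical names used only for illustration; the actual `ExperimentTrackerAdapter` may be constructed differently.

```python
class SketchTracker:
    """Hypothetical plain tracker."""

    def __init__(self):
        self.records = []

    def log(self, record):
        self.records.append(record)


class SketchAdapter:
    """Hypothetical adapter that gives every tracker one uniform interface."""

    def __init__(self, tracker):
        self.tracker = tracker

    def log(self, record):
        self.tracker.log(record)


def wrap_tracker(tracker):
    """Wrap a tracker once; leave None and already-wrapped trackers untouched."""
    if tracker is None or isinstance(tracker, SketchAdapter):
        return tracker
    return SketchAdapter(tracker)


plain = SketchTracker()
wrapped = wrap_tracker(plain)
wrapped.log({"score": 1})
```

Making the wrapping idempotent means callers can pass either a raw tracker or an adapter without checking which one they hold.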
evaluate()
async
Run the evaluators on the dataset.
The dataset is evaluated in batches of the given batch size. Automatically detects and uses the appropriate logging pattern:
- Context-based logging for trackers that support log_context (like Langfuse)
- Batch logging for other trackers
Returns:
| Type | Description |
|---|---|
| list[list[EvaluationOutput]] | A list of lists containing evaluation results from all evaluators. |
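The choice between the two logging patterns can be sketched as a simple capability check. Everything here is an assumption for illustration: the detection via `getattr`, the tracker classes, and the `log` and `log_batch` methods are hypothetical; only the `log_context` name comes from the description above.

```python
from contextlib import contextmanager


class ContextTracker:
    """Hypothetical tracker that supports log_context."""

    def __init__(self):
        self.events = []

    @contextmanager
    def log_context(self, name):
        self.events.append(f"open:{name}")
        try:
            yield self
        finally:
            self.events.append(f"close:{name}")

    def log(self, record):
        self.events.append(f"log:{record}")


class BatchTracker:
    """Hypothetical tracker that only accepts whole batches."""

    def __init__(self):
        self.batches = []

    def log_batch(self, records):
        self.batches.append(list(records))


def log_results(tracker, records) -> str:
    """Use context-based logging when available, batch logging otherwise."""
    if callable(getattr(tracker, "log_context", None)):
        with tracker.log_context("evaluation") as ctx:
            for record in records:
                ctx.log(record)
        return "context"
    tracker.log_batch(records)
    return "batch"
```

Checking for the capability rather than the tracker's class keeps the dispatch open to any tracker that later gains a `log_context` method.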
get_run_results(**kwargs)
Get the results of a run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| run_id | str | The ID of the run. | required |
| **kwargs | Any | Additional configuration parameters. | {} |
Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | The results of the run. |