
Evaluate

Evaluate Module.

This module provides a convenience function for evaluating a model.

`evaluate(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, allow_batch_evaluation=False, summary_evaluators=None, **kwargs)` (async)

Evaluate the model.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `data` | `str \| BaseDataset` | The data to evaluate. | *required* |
| `inference_fn` | `Callable` | The inference function to use. | *required* |
| `evaluators` | `list[BaseEvaluator \| BaseMetric]` | The evaluators to use. | *required* |
| `experiment_tracker` | `BaseExperimentTracker \| None` | The experiment tracker to use. | `None` |
| `batch_size` | `int` | The batch size to use for evaluation (runner-level chunking for memory management). | `10` |
| `allow_batch_evaluation` | `bool` | Enable batch-processing mode for LLM API calls. When `True`, the runner passes entire chunks to evaluators for batch processing. | `False` |
| `summary_evaluators` | `list[SummaryEvaluatorCallable] \| None` | Custom summary evaluators that compute batch-level statistics. Each callable receives `(evaluation_results, data)` and returns a dict of summary metrics. | `None` |
| `**kwargs` | `Any` | Additional configuration parameters. | `{}` |
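The `summary_evaluators` contract described above (each callable receives `(evaluation_results, data)` and returns a dict of summary metrics) could be satisfied by a function like the following sketch. The name `mean_score` and the assumption that each evaluation result is a dict with a `"score"` key are illustrative, not part of the library:

```python
# Illustrative summary evaluator. The (evaluation_results, data) -> dict
# contract comes from the parameter description above; the result shape
# (dicts carrying a "score" key) is an assumption for this sketch.
def mean_score(evaluation_results, data):
    # Reduce per-example scores to batch-level statistics.
    scores = [result["score"] for result in evaluation_results]
    return {
        "mean_score": sum(scores) / len(scores),
        "num_examples": len(scores),
    }
```

A callable like this would then be passed as `summary_evaluators=[mean_score]`.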

Returns:

| Name | Type | Description |
|------|------|-------------|
| `EvaluationResult` | `EvaluationResult` | Structured result containing evaluation results and experiment URLs/paths. |
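As a rough mental model of the batching behavior described above (runner-level chunking via `batch_size`, with `allow_batch_evaluation` switching between per-item and whole-chunk evaluator calls), here is a self-contained sketch. It is an illustrative stand-in, not the library's implementation, and it assumes evaluators are plain callables over lists of outputs:

```python
import asyncio

async def evaluate_sketch(data, inference_fn, evaluators,
                          batch_size=10, allow_batch_evaluation=False):
    """Illustrative stand-in for the evaluate() runner loop (assumed shape)."""
    results = []
    # Runner-level chunking: walk the dataset batch_size items at a time
    # to bound memory use.
    for start in range(0, len(data), batch_size):
        chunk = data[start:start + batch_size]
        outputs = [inference_fn(item) for item in chunk]
        for evaluator in evaluators:
            if allow_batch_evaluation:
                # Batch mode: the evaluator receives the entire chunk at once.
                results.append(evaluator(outputs))
            else:
                # Per-item mode: one evaluator call per example.
                results.extend(evaluator([out]) for out in outputs)
    return results
```

With `allow_batch_evaluation=True` each evaluator is invoked once per chunk; with the default `False` it is invoked once per example, which matches the per-item behavior one would expect from non-batched LLM API calls.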