# Evaluate

Evaluate module.

This module provides a convenience function for evaluating a model.
`async evaluate(data, inference_fn, evaluators, experiment_tracker=None, batch_size=10, allow_batch_evaluation=False, **kwargs)`

Evaluate the model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `str \| BaseDataset` | The data to evaluate. | *required* |
| `inference_fn` | `Callable` | The inference function to use. | *required* |
| `evaluators` | `list[BaseEvaluator \| BaseMetric]` | The evaluators to use. | *required* |
| `experiment_tracker` | `BaseExperimentTracker \| None` | The experiment tracker to use. | `None` |
| `batch_size` | `int` | The batch size to use for evaluation (runner-level chunking for memory management). | `10` |
| `allow_batch_evaluation` | `bool` | Enable batch processing mode for LLM API calls. When `True`, the runner passes entire chunks to evaluators for batch processing. | `False` |
| `**kwargs` | `Any` | Additional configuration parameters. | `{}` |
Returns:

| Name | Type | Description |
|---|---|---|
| `EvaluationResult` | `EvaluationResult` | Structured result containing evaluation results and experiment URLs/paths. |
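
A minimal usage sketch. The import paths (`my_eval_package`), the `ExactMatchEvaluator` class, and the dataset/sample shapes are assumptions for illustration; only `evaluate`, the base-class names, and the parameters above come from this reference.

```python
import asyncio

# Hypothetical import locations -- adjust to your package layout; this
# reference does not state where evaluate and the base classes live.
from my_eval_package import evaluate
from my_eval_package.evaluators import ExactMatchEvaluator  # hypothetical BaseEvaluator


def inference_fn(sample):
    """Hypothetical inference callable: map an input sample to a prediction."""
    return {"prediction": sample["input"]}


async def main():
    result = await evaluate(
        data="data/eval_set.jsonl",          # a path/identifier or a BaseDataset instance
        inference_fn=inference_fn,
        evaluators=[ExactMatchEvaluator()],  # any mix of BaseEvaluator / BaseMetric
        experiment_tracker=None,             # or a BaseExperimentTracker for logging
        batch_size=10,                       # runner-level chunking for memory management
        allow_batch_evaluation=False,        # True passes whole chunks to evaluators
    )
    # result is an EvaluationResult holding the evaluation results and
    # experiment URLs/paths (when a tracker is configured).
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```

Because `evaluate` is a coroutine, it must be awaited (here via `asyncio.run`); inside an already-running event loop, `await evaluate(...)` directly.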