Ensemble calculator
Ensemble Calculator for Multiple LLM Judges.
This module provides utilities for aggregating results from multiple LLM judges using ensemble methods and calculating statistical measures of agreement.
EnsembleCalculator(ensemble_method=EnsembleMethod.MEDIAN, weights=None)
Calculates ensemble statistics from multiple LLM judge results.
This class handles the aggregation of evaluation results from multiple LLM judges using various ensemble methods and provides statistical measures of judge agreement.
Attributes:
| Name | Type | Description |
|---|---|---|
| `ensemble_method` | `EnsembleMethod` | The method to use for aggregating scores. |
| `weights` | `Optional[List[float]]` | Optional weights for weighted ensemble methods. |
Initialize the EnsembleCalculator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `ensemble_method` | `EnsembleMethod` | The ensemble method to use. Defaults to EnsembleMethod.MEDIAN. | `MEDIAN` |
| `weights` | `Optional[List[float]]` | Weights for each judge. If None, defaults to equal weights (1.0 for each judge). Defaults to None. | `None` |
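The default-weight behavior described above can be illustrated with a minimal, standalone sketch. It does not import the module: `resolve_weights` and `weighted_mean` are hypothetical helpers, and both the length check and the weighted arithmetic mean are assumptions, since this page does not say how the weights are applied.

```python
from typing import List, Optional


def resolve_weights(weights: Optional[List[float]], n_judges: int) -> List[float]:
    """Hypothetical helper: fall back to equal weights when none are given."""
    if weights is None:
        # Documented default: 1.0 for each judge.
        return [1.0] * n_judges
    if len(weights) != n_judges:
        # Assumption: a length mismatch is treated as an error.
        raise ValueError("expected one weight per judge")
    return list(weights)


def weighted_mean(scores: List[float], weights: List[float]) -> float:
    """Assumed use of the weights: a plain weighted arithmetic mean."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total
```

With equal weights the weighted mean reduces to the ordinary mean, which is why defaulting every weight to 1.0 is a neutral choice.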
calculate_ensemble_result(judge_results)
Calculate ensemble result from multiple judge evaluations.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `judge_results` | `List[Dict[str, Any]]` | List of evaluation results from each judge. Each result should contain either a `'relevancy_rating'` key (GenerationEvaluator) with categorical values or a `'score'` key (QTEvaluator) with numeric values. | required |
Returns:
| Type | Description |
|---|---|
| `Dict[str, Any]` | Ensemble result containing aggregated scores and statistics. |
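As a rough illustration of this kind of aggregation, the standalone sketch below combines numeric `'score'` values by median or rounded average. It is not the library's implementation: the function name `aggregate_scores`, the output keys, and the use of population standard deviation as an agreement measure are all assumptions. Categorical `'relevancy_rating'` values are left out because the mapping to numbers is not given here.

```python
import statistics
from typing import Any, Dict, List


def aggregate_scores(
    judge_results: List[Dict[str, Any]], method: str = "median"
) -> Dict[str, Any]:
    """Hypothetical aggregation over the numeric 'score' key of each result."""
    scores = [float(result["score"]) for result in judge_results]
    if not scores:
        raise ValueError("judge_results must not be empty")

    if method == "median":
        ensemble = statistics.median(scores)
    elif method == "average_rounded":
        # Python's round() rounds exact halves to the nearest even integer.
        ensemble = round(statistics.mean(scores))
    else:
        raise ValueError(f"unknown method: {method}")

    return {
        "ensemble_score": ensemble,          # assumed key name
        "individual_scores": scores,         # assumed key name
        "std_dev": statistics.pstdev(scores),  # assumed agreement measure
    }


results = [{"score": 1}, {"score": 2}, {"score": 4}]
median_result = aggregate_scores(results, method="median")
rounded_result = aggregate_scores(results, method="average_rounded")
```

The median is less sensitive to a single outlying judge than the average, which may be why it is the documented default.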
EnsembleMethod
Bases: StrEnum
Enumeration of ensemble methods for aggregating judge results.
Attributes:
| Name | Type | Description |
|---|---|---|
| `MEDIAN` | `str` | Use median of all judge scores. |
| `AVERAGE_ROUNDED` | `str` | Use rounded average of all judge scores. |
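A string-valued enumeration of this shape might look like the sketch below. The class name `EnsembleMethodSketch` and the member values `"median"` and `"average_rounded"` are assumptions, as the actual values are not listed on this page. The sketch uses the `str` plus `Enum` mixin so it also runs on Python versions older than 3.11, where `enum.StrEnum` is unavailable.

```python
from enum import Enum


class EnsembleMethodSketch(str, Enum):
    """Hypothetical stand-in for EnsembleMethod; member values are assumed."""

    MEDIAN = "median"
    AVERAGE_ROUNDED = "average_rounded"


# Members compare equal to their string values and can be looked up by value.
selected = EnsembleMethodSketch("median")
is_median = selected is EnsembleMethodSketch.MEDIAN
matches_string = EnsembleMethodSketch.AVERAGE_ROUNDED == "average_rounded"
```

Because members behave like strings, a method name read from a configuration file can be converted to a member with a single constructor call.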