Types

Types for the evaluator. This module contains the type aliases and data structures used by the evaluators.

References: None.
Constant variables

EvaluationOutput = Mapping[str, Any]  module-attribute

Evaluation Output Type.

This is the type of the output of the evaluation function. EvaluationOutput is a mapping of dataset field names to dataset field values.

MetricInput = Mapping[str, Any]  module-attribute

MetricValue = float | int | str  module-attribute

Metric Output and Input Types.

These are the input and output types of a metric. MetricOutput is a mapping of metric names to MetricOutputChild; MetricInput is a mapping of dataset field names to dataset field values.
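A minimal sketch of how these aliases are typically used to type a metric function. The `exact_match` function and the field names it reads are illustrative assumptions, not part of this module; the aliases are mirrored locally because the real import path is not shown here.

```python
from collections.abc import Mapping
from typing import Any

# Local mirrors of the documented aliases (assumed; import from the package in real code).
MetricInput = Mapping[str, Any]
MetricValue = float | int | str


def exact_match(inputs: MetricInput) -> MetricValue:
    # Hypothetical metric: compares two dataset fields from the input mapping.
    return float(inputs.get("generated_response") == inputs.get("expected_response"))
```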
AgentData

Bases: TypedDict

Agent data.

Data for agent evaluation, used by agent evaluators such as AgentEvals. An example sample is sketched below.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `agent_trajectory` | `list[dict[str, Any]]` | The agent trajectory. |
| `expected_agent_trajectory` | `list[dict[str, Any]] \| None` | The expected agent trajectory. |
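A minimal sketch of an `AgentData` sample as a plain dictionary (TypedDict instances are ordinary dicts). The message schema inside the trajectory is an illustrative assumption.

```python
# Hypothetical AgentData sample; the trajectory message format is an assumption.
agent_sample = {
    "agent_trajectory": [
        {"role": "assistant", "tool": "search", "args": {"query": "weather in Paris"}},
        {"role": "tool", "content": "Sunny, 22°C"},
    ],
    "expected_agent_trajectory": None,  # optional reference trajectory
}
```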
MetricResult

Bases: BaseModel

Metric Output Pydantic Model.

A structured output for metric results, with a score and an explanation.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `score` | `MetricValue` | The evaluation score. Can be continuous (float), discrete (int), or categorical (str). |
| `explanation` | `str` | A detailed explanation of the evaluation result. |
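A minimal sketch of constructing a metric result. The class below is a local mirror of the documented fields, defined here only because the real import path is not shown on this page.

```python
from pydantic import BaseModel

MetricValue = float | int | str  # mirrors the documented alias


class MetricResult(BaseModel):
    # Local mirror of the documented model, for illustration only.
    score: MetricValue
    explanation: str


result = MetricResult(
    score=0.85,
    explanation="The generated response covers all key facts from the expected response.",
)
print(result.model_dump())  # {'score': 0.85, 'explanation': '...'}
```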
QAData

Bases: TypedDict

QA data.

Data for question-answering evaluation, used by evaluators such as GenerationEvaluator. An example sample is sketched below.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `query` | `str` | The query. |
| `expected_response` | `str` | The expected response. |
| `generated_response` | `str` | The generated response. |
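A minimal sketch of a `QAData` sample; the field values are illustrative.

```python
# Hypothetical QAData sample for a GenerationEvaluator-style check.
qa_sample = {
    "query": "What is the capital of France?",
    "expected_response": "Paris",
    "generated_response": "The capital of France is Paris.",
}
```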
RAGData

Bases: TypedDict

RAG data.

Data for RAG evaluation, used by evaluators such as GenerationEvaluator and RetrievalEvaluator. An example sample is sketched below.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `query` | `str \| None` | The query. |
| `expected_response` | `str \| None` | The expected response. For multiple responses, use `list[str]`. |
| `generated_response` | `str \| None` | The generated response. For multiple responses, use `list[str]`. |
| `retrieved_context` | `str \| list[str] \| None` | The retrieved context. Use `str` if the retrieved context is concatenated from multiple contexts; otherwise, use `list[str]`. |
| `expected_retrieved_context` | `str \| list[str] \| None` | The expected retrieved context. Use `str` if it is concatenated from multiple contexts; otherwise, use `list[str]`. |
| `rubrics` | `dict[str, str] \| None` | Evaluation rubric for the sample. |
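A minimal sketch of a `RAGData` sample. It uses `list[str]` for `retrieved_context` to show the separate-chunks form described above; all values are illustrative.

```python
# Hypothetical RAGData sample; retrieved_context keeps chunks separate via list[str].
rag_sample = {
    "query": "When was the Eiffel Tower built?",
    "expected_response": "It was completed in 1889.",
    "generated_response": "The Eiffel Tower was completed in 1889.",
    "retrieved_context": [
        "The Eiffel Tower was built between 1887 and 1889.",
        "It was designed by Gustave Eiffel's engineering company.",
    ],
    "expected_retrieved_context": None,
    "rubrics": {"correctness": "The response must state the completion year 1889."},
}
```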
RetrievalData

Bases: TypedDict

Retrieval data.

Data for retrieval evaluation, used by evaluators such as ClassicalRetrievalEvaluator. An example sample is sketched below.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `retrieved_chunks` | `dict[str, float]` | The retrieved chunks and their scores. |
| `ground_truth_chunk_ids` | `list[str]` | The ground-truth chunk IDs. |
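A minimal sketch of a `RetrievalData` sample; the chunk IDs and scores are illustrative.

```python
# Hypothetical RetrievalData sample: retrieval scores keyed by chunk ID.
retrieval_sample = {
    "retrieved_chunks": {"chunk-001": 0.92, "chunk-007": 0.81, "chunk-042": 0.35},
    "ground_truth_chunk_ids": ["chunk-001", "chunk-042"],
}
```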
validate_metric_result(parsed_response)

Validate the response.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `parsed_response` | `dict` | The response to validate. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `dict` | `dict` | The validated response. |

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the response is not valid. |
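A minimal sketch of the kind of check this validator performs, based only on the documented contract (a `dict` in, the same `dict` out, `ValueError` on invalid input). The specific fields checked below are assumptions drawn from `MetricResult`, not the actual implementation.

```python
def validate_metric_result(parsed_response: dict) -> dict:
    # Sketch only: require the fields documented on MetricResult (assumed check).
    if not isinstance(parsed_response, dict):
        raise ValueError("parsed_response must be a dict")
    if "score" not in parsed_response or "explanation" not in parsed_response:
        raise ValueError("parsed_response must contain 'score' and 'explanation'")
    if not isinstance(parsed_response["score"], (float, int, str)):
        raise ValueError("'score' must be a float, int, or str")
    return parsed_response
```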