
Types

Types for the evaluator.

This module defines the input, output, and data types used by the evaluator.

Authors

Surya Mahadi (made.r.s.mahadi@gdplabs.id)

References

NONE

Constant variables

EvaluationOutput = Mapping[str, Any] module-attribute

Evaluation Output Type.

This is the type of the output of the evaluation function. EvaluationOutput is a mapping of dataset field names to dataset field values.

MetricInput = Mapping[str, Any] module-attribute

Metric Input Type.

This is the type of the input of a metric. MetricInput is a mapping of dataset field names to dataset field values. (The related MetricOutput is a mapping of metric names to MetricOutputChild.)

MetricValue = float | int | str module-attribute

Metric Value Type.

This is the type of a single metric value: continuous (float), discrete (int), or categorical (str).
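A minimal sketch of how these aliases fit together, assuming Python 3.10+ for the | union syntax. The aliases mirror the definitions above; the sample field names are illustrative, not prescribed by the module:

    from typing import Any, Mapping

    # Aliases as documented above.
    EvaluationOutput = Mapping[str, Any]  # dataset field name -> dataset field value
    MetricInput = Mapping[str, Any]       # dataset field name -> dataset field value
    MetricValue = float | int | str       # continuous, discrete, or categorical

    # Illustrative values that satisfy the aliases.
    sample_input: MetricInput = {"query": "What is RAG?", "generated_response": "..."}
    score: MetricValue = 0.85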

AgentData

Bases: TypedDict

Agent data.

Data for agent evaluation, used by evaluators such as AgentEvals.

Attributes:

    agent_trajectory (list[dict[str, Any]]): The agent trajectory.
    expected_agent_trajectory (list[dict[str, Any]] | None): The expected agent trajectory.
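A hedged sketch of building an AgentData record. The TypedDict below mirrors the documented fields (in practice, import the class from this module); the shape of the trajectory entries is an assumption, since this page does not prescribe their keys:

    from typing import Any, TypedDict

    class AgentData(TypedDict):
        # Mirrors the attributes documented above.
        agent_trajectory: list[dict[str, Any]]
        expected_agent_trajectory: list[dict[str, Any]] | None

    sample: AgentData = {
        # Assumed entry shape for illustration only.
        "agent_trajectory": [{"type": "tool_call", "name": "search", "args": {"q": "..."}}],
        "expected_agent_trajectory": None,  # no reference trajectory available
    }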

MetricResult

Bases: BaseModel

Metric Output Pydantic Model.

A structured output for metric results with score and explanation.

Attributes:

    score (MetricValue): The evaluation score. Can be continuous (float), discrete (int), or categorical (str).
    explanation (str): A detailed explanation of the evaluation result.
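A minimal construction sketch. The class below mirrors the documented model (in practice, import MetricResult from this module); the sample score and explanation are illustrative:

    from pydantic import BaseModel

    MetricValue = float | int | str  # as documented above

    class MetricResult(BaseModel):
        # Mirrors the attributes documented above.
        score: MetricValue
        explanation: str

    result = MetricResult(score=0.9, explanation="The response matches the expected answer.")
    print(result.score, result.explanation)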

QAData

Bases: TypedDict

QA data.

Data for QA evaluation, used by evaluators such as GenerationEvaluator.

Attributes:

    query (str): The query.
    expected_response (str): The expected response.
    generated_response (str): The generated response.
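A hedged construction sketch; the TypedDict mirrors the documented fields, and the sample values are illustrative:

    from typing import TypedDict

    class QAData(TypedDict):
        # Mirrors the attributes documented above.
        query: str
        expected_response: str
        generated_response: str

    sample: QAData = {
        "query": "Who wrote The Hobbit?",
        "expected_response": "J. R. R. Tolkien",
        "generated_response": "The Hobbit was written by J. R. R. Tolkien.",
    }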

RAGData

Bases: TypedDict

RAG data.

Data for RAG evaluation, used by evaluators such as GenerationEvaluator and RetrievalEvaluator.

Attributes:

    query (str | None): The query.
    expected_response (str | None): The expected response. For multiple responses, use list[str].
    generated_response (str | None): The generated response. For multiple responses, use list[str].
    retrieved_context (str | list[str] | None): The retrieved context. Use str if the retrieved context is concatenated from multiple contexts; otherwise, use list[str].
    expected_retrieved_context (str | list[str] | None): The expected retrieved context. Use str if it is concatenated from multiple contexts; otherwise, use list[str].
    rubrics (dict[str, str] | None): Evaluation rubric for the sample.
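A hedged construction sketch; the TypedDict mirrors the documented fields, and the sample values are illustrative:

    from typing import TypedDict

    class RAGData(TypedDict):
        # Mirrors the attributes documented above.
        query: str | None
        expected_response: str | None
        generated_response: str | None
        retrieved_context: str | list[str] | None
        expected_retrieved_context: str | list[str] | None
        rubrics: dict[str, str] | None

    sample: RAGData = {
        "query": "What is the capital of France?",
        "expected_response": "Paris",
        "generated_response": "The capital of France is Paris.",
        "retrieved_context": ["Paris is the capital of France."],  # list[str]: separate contexts
        "expected_retrieved_context": None,
        "rubrics": {"correctness": "The answer must name Paris."},
    }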

RetrievalData

Bases: TypedDict

Retrieval data.

Data for retrieval evaluation, used by evaluators such as ClassicalRetrievalEvaluator.

Attributes:

    retrieved_chunks (dict[str, float]): The retrieved chunks and their scores.
    ground_truth_chunk_ids (list[str]): The ground truth chunk IDs.
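A hedged construction sketch; the chunk IDs and scores are illustrative:

    from typing import TypedDict

    class RetrievalData(TypedDict):
        # Mirrors the attributes documented above.
        retrieved_chunks: dict[str, float]
        ground_truth_chunk_ids: list[str]

    sample: RetrievalData = {
        "retrieved_chunks": {"chunk-1": 0.92, "chunk-7": 0.61},  # chunk ID -> retrieval score
        "ground_truth_chunk_ids": ["chunk-1", "chunk-3"],
    }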

validate_metric_result(parsed_response)

Validate the response.

Parameters:

    parsed_response (dict): The response to validate. Required.

Returns:

    dict: The validated response.

Raises:

    ValueError: If the response is not valid.
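A hedged usage sketch. The import path is hypothetical, and the assumption that a valid response carries the MetricResult fields (score, explanation) is inferred from this page rather than confirmed by it:

    # Hypothetical import path; adjust to wherever this module actually lives.
    from evaluator.types import validate_metric_result

    parsed = {"score": 0.8, "explanation": "Grounded in the retrieved context."}
    validated = validate_metric_result(parsed)  # returns the validated dict

    try:
        validate_metric_result({"unexpected": "shape"})  # assumed to be invalid
    except ValueError as exc:
        print(f"Invalid metric result: {exc}")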