
DeepEval GEval

DeepEval GEval Metric Integration.

Authors

Surya Mahadi (made.r.s.mahadi@gdplabs.id)

References

[1] https://deepeval.com/docs/metrics-llm-evals

[2] https://arxiv.org/abs/2303.16634

DeepEvalGEvalMetric(name, evaluation_params, model=DefaultValues.MODEL, criteria=None, evaluation_steps=None, rubric=None, model_credentials=None, model_config=None, threshold=0.5, additional_context=None)

Bases: DeepEvalMetricFactory, PromptExtractionMixin

DeepEval GEval Metric Integration.

This class wraps DeepEval's GEval class and provides a unified interface to it through the DeepEval library integration.

GEval is a versatile evaluation metric framework that can be used to create custom evaluation metrics.

Available Fields:

- query (str, optional): The query to evaluate the metric. Similar to input in LLMTestCaseParams.
- generated_response (str | list[str], optional): The generated response to evaluate the metric. Similar to actual_output in LLMTestCaseParams. If the generated response is a list, the responses are concatenated into a single string.
- expected_response (str | list[str], optional): The expected response to evaluate the metric. Similar to expected_output in LLMTestCaseParams. If the expected response is a list, the responses are concatenated into a single string.
- expected_retrieved_context (str | list[str], optional): The expected retrieved context to evaluate the metric. Similar to context in LLMTestCaseParams. If the expected retrieved context is a str, it is converted into a single-element list.
- retrieved_context (str | list[str], optional): The list of retrieved contexts to evaluate the metric. Similar to retrieval_context in LLMTestCaseParams. If the retrieved context is a str, it is converted into a single-element list.

Initializes the DeepEvalGEvalMetric class.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | The name of the metric. | *required* |
| `evaluation_params` | `list[LLMTestCaseParams]` | The evaluation parameters. | *required* |
| `model` | `str \| ModelId \| BaseLMInvoker` | The model to use for the metric. Defaults to `"openai/gpt-4.1"`. | `DefaultValues.MODEL` |
| `criteria` | `str \| None` | The criteria to use for the metric. | `None` |
| `evaluation_steps` | `list[str] \| None` | The evaluation steps to use for the metric. | `None` |
| `rubric` | `list[Rubric] \| None` | The rubric to use for the metric. | `None` |
| `model_credentials` | `str \| None` | The model credentials to use for the metric. Required when `model` is a string. | `None` |
| `model_config` | `dict[str, Any] \| None` | The model config to use for the metric. | `None` |
| `threshold` | `float` | The threshold to use for the metric. | `0.5` |
| `additional_context` | `str \| None` | Additional context such as few-shot examples. | `None` |
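A minimal sketch of the constructor arguments, using the parameter names from the signature above. The import path for `DeepEvalGEvalMetric` is not given in this document, so the instantiation itself is left commented out; in practice `evaluation_params` takes `LLMTestCaseParams` members rather than the plain strings shown here:

```python
# Keyword arguments for DeepEvalGEvalMetric (names from the documented
# signature; values are illustrative placeholders).
kwargs = dict(
    name="Answer Correctness",
    evaluation_params=["input", "actual_output"],  # LLMTestCaseParams members in practice
    criteria=(
        "Determine whether the generated response is factually consistent "
        "with the expected response."
    ),
    model="openai/gpt-4.1",            # the documented default model
    model_credentials="YOUR_API_KEY",  # required because model is a string
    threshold=0.5,
)

# metric = DeepEvalGEvalMetric(**kwargs)  # requires the host library
```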

get_full_prompt(data)

Get the full prompt that DeepEval generates for this metric.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `MetricInput` | The metric input. | *required* |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `str` | `str` | The complete prompt (system + user) as a string. |

PromptExtractionMixin

Mixin class that provides get_full_prompt functionality for metrics.

This mixin provides a standard interface for metrics that support prompt extraction. Metrics that inherit from this mixin should implement the get_full_prompt method.

get_full_prompt(data)

Get the full prompt that the metric generates.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `MetricInput` | The metric input. | *required* |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `str` | `str` | The complete prompt as a string. |

Raises:

| Type | Description |
| --- | --- |
| `NotImplementedError` | If the metric doesn't support prompt extraction. |
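The mixin pattern described above can be sketched as follows. This is not the library's actual source; `EchoMetric` and its prompt format are hypothetical, and only the base-class behavior (raising `NotImplementedError` when a subclass does not implement `get_full_prompt`) follows the documented contract:

```python
class PromptExtractionMixin:
    """Base mixin: subclasses that support prompt extraction override
    get_full_prompt; otherwise calling it raises NotImplementedError."""

    def get_full_prompt(self, data):
        raise NotImplementedError(
            f"{type(self).__name__} does not support prompt extraction."
        )


class EchoMetric(PromptExtractionMixin):
    """Hypothetical metric that builds a trivial system + user prompt."""

    def get_full_prompt(self, data):
        # Combine a fixed system prompt with the metric input.
        return f"System: evaluate the response.\nUser: {data}"
```

A metric that inherits the mixin without overriding the method keeps the default raising behavior, which lets callers probe for prompt-extraction support with a `try`/`except NotImplementedError`.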