DeepEval GEval
DeepEval GEval Metric Integration.
DeepEvalGEvalMetric(name=None, evaluation_params=None, model=DefaultValues.MODEL, criteria=None, evaluation_steps=None, rubric=None, model_credentials=None, model_config=None, threshold=0.5, additional_context=None, batch_status_check_interval=DefaultValues.BATCH_STATUS_CHECK_INTERVAL, batch_max_iterations=DefaultValues.BATCH_MAX_ITERATIONS)
Bases: DeepEvalMetricFactory, PromptExtractionMixin
This class wraps DeepEval's GEval class and provides a unified interface for the DeepEval library.
GEval is a versatile evaluation metric framework that can be used to create custom evaluation metrics.
Available Fields
- query (str, optional): The query to evaluate the metric against. Similar to `input` in `LLMTestCaseParams`.
- generated_response (str | list[str], optional): The generated response to evaluate. Similar to `actual_output` in `LLMTestCaseParams`. If the generated response is a list, the responses are concatenated into a single string.
- expected_response (str | list[str], optional): The expected response to evaluate against. Similar to `expected_output` in `LLMTestCaseParams`. If the expected response is a list, the responses are concatenated into a single string.
- expected_retrieved_context (str | list[str], optional): The expected retrieved context to evaluate the metric. Similar to `context` in `LLMTestCaseParams`. If the expected retrieved context is a str, it will be converted to a list with a single element.
- retrieved_context (str | list[str], optional): The list of retrieved contexts to evaluate the metric. Similar to `retrieval_context` in `LLMTestCaseParams`. If the retrieved context is a str, it will be converted to a list with a single element.
Scoring
- 0.0-1.0 (continuous), or boolean, depending on the DeepEval GEval configuration.
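Several of the fields above are normalized before evaluation: list-valued responses are concatenated into a single string, and string-valued contexts are wrapped in a single-element list. A minimal sketch of that normalization, assuming a newline join separator (the separator and function names are illustrative, not the library's actual implementation):

```python
def normalize_response(value):
    """Concatenate list responses into a single string, as described for
    generated_response and expected_response. The newline separator is
    an assumption."""
    if isinstance(value, list):
        return "\n".join(value)
    return value


def normalize_context(value):
    """Wrap a plain string into a single-element list, as described for
    retrieved_context and expected_retrieved_context."""
    if isinstance(value, str):
        return [value]
    return value
```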
Initializes the DeepEvalGEvalMetric class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str \| None | The name of the metric. Defaults to None. Required if not provided via _defaults. | None |
| evaluation_params | list[LLMTestCaseParams] \| None | The evaluation parameters. Defaults to None. Required if not provided via _defaults. | None |
| model | str \| ModelId \| BaseLMInvoker | The model to use for the metric. Defaults to "openai/gpt-4.1". | MODEL |
| criteria | str \| None | The criteria to use for the metric. Defaults to None. | None |
| evaluation_steps | list[str] \| None | The evaluation steps to use for the metric. Defaults to None. | None |
| rubric | list[Rubric] \| None | The rubric to use for the metric. Defaults to None. | None |
| model_credentials | str \| None | The model credentials to use for the metric. Defaults to None. Required when model is a string. | None |
| model_config | dict[str, Any] \| None | The model config to use for the metric. Defaults to None. | None |
| threshold | float | The threshold to use for the metric. Defaults to 0.5. | 0.5 |
| additional_context | str \| None | Additional context such as few-shot examples. Defaults to None. | None |
| batch_status_check_interval | float | Time between batch status checks, in seconds. Defaults to 30.0. | BATCH_STATUS_CHECK_INTERVAL |
| batch_max_iterations | int | Maximum number of status check iterations before timeout. Defaults to 120. | BATCH_MAX_ITERATIONS |
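The batch_status_check_interval and batch_max_iterations parameters together bound how long batch evaluation waits before timing out (with the defaults, 120 checks at 30 seconds each, i.e. up to an hour). A minimal sketch of such a polling loop, assuming a hypothetical poll_status callable; the library's internal loop may differ:

```python
import time


def wait_for_batch(poll_status, interval=30.0, max_iterations=120):
    """Poll poll_status() until it reports completion, sleeping `interval`
    seconds between checks, and give up after `max_iterations` checks.
    The "completed" status string is an assumption for illustration."""
    for _ in range(max_iterations):
        if poll_status() == "completed":
            return True
        time.sleep(interval)
    return False  # timed out after roughly interval * max_iterations seconds
```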
evaluate(data)
async
Evaluate with custom prompt lifecycle support.
Overrides BaseMetric.evaluate() to add custom prompt application and state management. This ensures custom prompts are applied before evaluation and state is restored after.
For batch processing, checks for custom prompts and processes accordingly. Currently processes items individually; batch optimization may be added in the future.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | MetricInput \| list[MetricInput] | Single data item or list of data items to evaluate. | required |
Returns:
| Type | Description |
|---|---|
| MetricOutput \| list[MetricOutput] | Evaluation results with scores namespaced by metric name. |
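"Namespaced by metric name" means each score is keyed under the metric's own name so results from several metrics can be merged without collisions. A hypothetical sketch of that shape; the actual MetricOutput structure may differ:

```python
def namespace_scores(metric_name, score, reason=None):
    # Hypothetical output shape: keying results by metric name lets outputs
    # from multiple metrics be merged into one dict without collisions.
    return {metric_name: {"score": score, "reason": reason}}
```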
get_custom_prompt_base_name()
Get the base name for custom prompt column lookup.
For GEval metrics, removes the 'geval_' prefix to align with CSV column conventions. This fixes Issue #3 by providing polymorphic naming for GEval metrics.
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The base name without the 'geval_' prefix (e.g., "completeness" instead of "geval_completeness"). |
Example
    metric.name = "geval_completeness"
    metric.get_custom_prompt_base_name() -> "completeness"

CSV columns expected:

- fewshot_completeness
- fewshot_completeness_mode
- evaluation_step_completeness
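The prefix stripping and the resulting CSV column names can be sketched as follows (a standalone illustration of the documented behavior, not the library's source; the expected_csv_columns helper is hypothetical):

```python
GEVAL_PREFIX = "geval_"


def get_custom_prompt_base_name(metric_name):
    """Strip the 'geval_' prefix so the name matches CSV column conventions."""
    if metric_name.startswith(GEVAL_PREFIX):
        return metric_name[len(GEVAL_PREFIX):]
    return metric_name


def expected_csv_columns(metric_name):
    """Derive the custom-prompt CSV column names from the base name."""
    base = get_custom_prompt_base_name(metric_name)
    return [f"fewshot_{base}", f"fewshot_{base}_mode", f"evaluation_step_{base}"]
```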
get_full_prompt(data)
Get the full prompt that DeepEval generates for this metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | MetricInput | The metric input. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The complete prompt (system + user) as a string. |
GllmGEvalTemplate
Bases: GEvalTemplate
GEval template variant with reason before score.
generate_evaluation_results(evaluation_steps, test_case_content, parameters, rubric=None, score_range=(1, 3), _additional_context=None, multimodal=False)
staticmethod
Generate evaluation prompt with reason listed before score.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| evaluation_steps | str | Numbered evaluation steps used to judge the response. | required |
| test_case_content | str | Rendered test case content included in the prompt. | required |
| parameters | str | Evaluation parameter names referenced by the evaluator. | required |
| rubric | str \| None | Formatted rubric text to include in the prompt. Defaults to None. | None |
| score_range | tuple[int, int] | Inclusive score range for the evaluator. Defaults to (1, 3). | (1, 3) |
| _additional_context | str \| None | Additional context such as few-shot examples. Defaults to None. | None |
| multimodal | bool | Whether to include multimodal evaluation rules. Defaults to False. | False |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Full evaluation prompt string. |
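The distinguishing feature of this template is that the evaluator is asked to produce the reason before the score. A simplified sketch of such a prompt builder, with the same signature but illustrative wording (this is not DeepEval's actual template text):

```python
def generate_evaluation_results(evaluation_steps, test_case_content, parameters,
                                rubric=None, score_range=(1, 3),
                                _additional_context=None, multimodal=False):
    """Build an evaluation prompt that requests the reason before the score."""
    lo, hi = score_range
    parts = [
        "Evaluation Steps:",
        evaluation_steps,
        "",
        test_case_content,
        "",
        f"Parameters: {parameters}",
    ]
    if rubric:
        parts += ["", "Rubric:", rubric]
    if _additional_context:
        parts += ["", "Additional Context:", _additional_context]
    if multimodal:
        parts += ["", "Apply the evaluation steps to images as well as text."]
    # Reason is requested FIRST, then the score -- the variant this template exists for.
    parts += ["", ('Return JSON with "reason" first, then "score": '
                   f'{{"reason": "...", "score": <integer from {lo} to {hi}>}}')]
    return "\n".join(parts)
```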
MetricDefaults(name='', criteria=None, evaluation_steps=None, rubric=None, evaluation_params=None, additional_context=None)
dataclass
Metric defaults for DeepEval GEval.
PromptExtractionMixin
Mixin class that provides get_full_prompt functionality for metrics.
This mixin provides a standard interface for metrics that support prompt extraction. Metrics that inherit from this mixin should implement the get_full_prompt method.
get_full_prompt(data)
Get the full prompt that the metric generates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | MetricInput | The metric input. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | The complete prompt as a string. |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | If the metric doesn't support prompt extraction. |
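The mixin contract described above can be sketched as follows: the base raises NotImplementedError, and subclasses override get_full_prompt. The EchoMetric subclass and its prompt wording are hypothetical:

```python
class PromptExtractionMixin:
    """Sketch of the mixin: metrics that support prompt extraction override
    get_full_prompt; otherwise the base implementation raises."""

    def get_full_prompt(self, data):
        raise NotImplementedError(
            f"{type(self).__name__} does not support prompt extraction."
        )


class EchoMetric(PromptExtractionMixin):
    # Hypothetical subclass demonstrating the override.
    def get_full_prompt(self, data):
        return f"Evaluate the following input:\n{data}"
```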