DeepEval GEval
DeepEval GEval Metric Integration.
References
[1] DeepEval documentation, LLM-as-a-judge metrics (G-Eval): https://deepeval.com/docs/metrics-llm-evals
[2] Liu et al., "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment": https://arxiv.org/abs/2303.16634
DeepEvalGEvalMetric(name, evaluation_params, model=DefaultValues.MODEL, criteria=None, evaluation_steps=None, rubric=None, model_credentials=None, model_config=None, threshold=0.5, additional_context=None)
Bases: DeepEvalMetricFactory, PromptExtractionMixin
DeepEval GEval Metric Integration.
This class wraps DeepEval's GEval metric and exposes it through a unified interface to the DeepEval library.
GEval is a versatile evaluation framework that can be used to build custom, criteria-driven evaluation metrics.
Available Fields:
- query (str, optional): The query to evaluate. Corresponds to `input` in `LLMTestCaseParams`.
- generated_response (str | list[str], optional): The generated response to evaluate. Corresponds to `actual_output` in `LLMTestCaseParams`. If a list is given, the responses are concatenated into a single string.
- expected_response (str | list[str], optional): The expected response to evaluate against. Corresponds to `expected_output` in `LLMTestCaseParams`. If a list is given, the responses are concatenated into a single string.
- expected_retrieved_context (str | list[str], optional): The expected retrieved context. Corresponds to `context` in `LLMTestCaseParams`. If a str is given, it is converted into a single-element list.
- retrieved_context (str | list[str], optional): The retrieved contexts. Corresponds to `retrieval_context` in `LLMTestCaseParams`. If a str is given, it is converted into a single-element list.
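For orientation, a minimal sketch of a metric input carrying these fields. The plain-dict shape is an assumption; the library's actual `MetricInput` type may be a dataclass or pydantic model:

```python
# Hypothetical metric input using the fields above; the dict shape is an
# assumption and the real MetricInput type in this library may differ.
metric_input = {
    "query": "What is the capital of France?",
    "generated_response": "Paris is the capital of France.",
    "expected_response": "Paris",
    "retrieved_context": ["Paris has been France's capital since 508 AD."],
}
```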
Initializes the DeepEvalGEvalMetric class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the metric. | *required* |
| `evaluation_params` | `list[LLMTestCaseParams]` | The evaluation parameters. | *required* |
| `model` | `str \| ModelId \| BaseLMInvoker` | The model to use for the metric. Defaults to `"openai/gpt-4.1"`. | `MODEL` |
| `criteria` | `str \| None` | The criteria to use for the metric. | `None` |
| `evaluation_steps` | `list[str] \| None` | The evaluation steps to use for the metric. | `None` |
| `rubric` | `list[Rubric] \| None` | The rubric to use for the metric. | `None` |
| `model_credentials` | `str \| None` | The model credentials to use for the metric. Required when `model` is a string. | `None` |
| `model_config` | `dict[str, Any] \| None` | The model config to use for the metric. | `None` |
| `threshold` | `float` | The score threshold for the metric. | `0.5` |
| `additional_context` | `str \| None` | Additional context, such as few-shot examples. | `None` |
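A hedged construction sketch follows. `LLMTestCaseParams` is DeepEval's own enum; the import path for `DeepEvalGEvalMetric` is an assumption and should be adjusted to your package layout:

```python
from deepeval.test_case import LLMTestCaseParams

# Assumed import path for the wrapper; adjust to where it lives in your package.
from my_package.metrics import DeepEvalGEvalMetric

metric = DeepEvalGEvalMetric(
    name="correctness",
    evaluation_params=[
        LLMTestCaseParams.INPUT,
        LLMTestCaseParams.ACTUAL_OUTPUT,
        LLMTestCaseParams.EXPECTED_OUTPUT,
    ],
    # Either `criteria` or explicit `evaluation_steps` can steer the judge.
    criteria="Check whether the actual output is factually consistent "
             "with the expected output.",
    model="openai/gpt-4.1",      # the documented default
    model_credentials="sk-...",  # required because `model` is a string
    threshold=0.5,
)
```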
get_full_prompt(data)
Get the full prompt that DeepEval generates for this metric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `MetricInput` | The metric input. | *required* |
Returns:
| Name | Type | Description |
|---|---|---|
| `str` | `str` | The complete prompt (system + user) as a string. |
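A usage sketch, reusing the `metric` and `metric_input` objects assumed in the earlier sketches:

```python
# Inspect the exact prompt DeepEval will send to the judge model
# before running an evaluation.
prompt = metric.get_full_prompt(metric_input)
print(prompt)  # one string containing both the system and user portions
```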
PromptExtractionMixin
Mixin class that provides get_full_prompt functionality for metrics.
This mixin provides a standard interface for metrics that support prompt extraction. Metrics that inherit from this mixin should implement the get_full_prompt method.
get_full_prompt(data)
Get the full prompt that the metric generates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `MetricInput` | The metric input. | *required* |
Returns:
| Name | Type | Description |
|---|---|---|
| `str` | `str` | The complete prompt as a string. |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If the metric doesn't support prompt extraction. |
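The mixin's contract can be illustrated with a small sketch. The default-raising body is an assumption inferred from the Raises table, and `EchoMetric` is purely hypothetical:

```python
from typing import Any

MetricInput = dict[str, Any]  # stand-in; the library's real MetricInput may differ


class PromptExtractionMixin:
    """Standard interface for metrics that support prompt extraction."""

    def get_full_prompt(self, data: MetricInput) -> str:
        # Assumed default behavior, inferred from the Raises table above.
        raise NotImplementedError(
            f"{type(self).__name__} does not support prompt extraction."
        )


class EchoMetric(PromptExtractionMixin):
    """Hypothetical metric that builds a trivial prompt from the input."""

    def get_full_prompt(self, data: MetricInput) -> str:
        return f"Evaluate this response:\n{data.get('generated_response', '')}"
```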