LM-based retrieval evaluator
This evaluator focuses on retrieval quality for RAG-style pipelines using:

- DeepEval contextual precision
- DeepEval contextual recall
It applies a simple rule-based combiner over precision and recall to derive an overall retrieval rating and a global explanation.
FloatMetricSpec(metric, comparator, threshold)
Bases: Specification
Atomic specification for float-based metrics, such as `precision >= 0.7`.
Attributes:

| Name | Type | Description |
|---|---|---|
| `metric` | `str` | The metric to evaluate. |
| `comparator` | `str` | The comparator to use. |
| `threshold` | `float` | The threshold for the metric. |
Initialize the FloatMetricSpec.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `metric` | `str` | The metric to evaluate. | *required* |
| `comparator` | `str` | The comparator to use. | *required* |
| `threshold` | `float` | The threshold for the metric. | *required* |
__repr__()
Get the representation of the FloatMetricSpec.
Returns:

| Name | Type | Description |
|---|---|---|
| `str` | `str` | The representation of the FloatMetricSpec. |
is_satisfied_by(candidate)
Check if the candidate satisfies the metric specification.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `candidate` | `Mapping[str, float]` | The candidate to check. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | True if the candidate satisfies the metric specification, False otherwise. |
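A minimal sketch of building and composing specs. The import path is a placeholder, the metric key names are assumptions, and the `">="` comparator string follows the `precision >= 0.7` example above:

```python
# Sketch only: placeholder import path; metric names and comparator strings
# are assumptions, not confirmed defaults.
from lm_based_retrieval_evaluator import FloatMetricSpec  # placeholder path

precision_ok = FloatMetricSpec("contextual_precision", ">=", 0.7)
recall_ok = FloatMetricSpec("contextual_recall", ">=", 0.7)

# Sutoppu specifications compose with & (and), | (or), and ~ (not).
both_ok = precision_ok & recall_ok

scores = {"contextual_precision": 0.82, "contextual_recall": 0.64}
print(precision_ok.is_satisfied_by(scores))  # True
print(both_ok.is_satisfied_by(scores))       # False: recall is below 0.7
```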
LMBasedRetrievalEvaluator(metrics=None, enabled_metrics=None, model=DefaultValues.MODEL, model_credentials=None, model_config=None, run_parallel=True, rule_book=None, rule_engine=None)
Bases: BaseEvaluator
Evaluator for LM-based retrieval quality in RAG pipelines.
This evaluator:

- Runs a configurable set of retrieval metrics (by default, DeepEval contextual precision and contextual recall)
- Combines their scores using a simple rule-based scheme to produce:
    - relevancy_rating (good / bad / incomplete)
    - score (aggregated retrieval score)
    - possible_issues (list of textual issues)
Default expected input:

- query (str): The query to evaluate the metric.
- expected_response (str): The expected response to evaluate the metric.
- retrieved_context (str | list[str]): The list of retrieved contexts to evaluate the metric. A single str is converted into a list with one element.
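For illustration, a record matching the default expected input might look like this sketch (field values are invented):

```python
# A record matching the default expected input; values are invented.
sample = {
    "query": "When was the Hubble Space Telescope launched?",
    "expected_response": "Hubble was launched on April 24, 1990.",
    # A plain str is also accepted and is wrapped into a one-element list.
    "retrieved_context": [
        "The Hubble Space Telescope was launched aboard Space Shuttle "
        "Discovery on April 24, 1990."
    ],
}
```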
Attributes:

| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the evaluator. |
| `metrics` | `list[BaseMetric]` | The list of metrics to evaluate. |
| `enabled_metrics` | `Sequence[type[BaseMetric] \| str] \| None` | The list of metrics to enable. |
| `model` | `str \| ModelId \| BaseLMInvoker` | The model to use for the metrics. |
| `model_credentials` | `str \| None` | The model credentials to use for the metrics. |
| `model_config` | `dict[str, Any] \| None` | The model configuration to use for the metrics. |
| `run_parallel` | `bool` | Whether to run the metrics in parallel. |
| `rule_book` | `LMBasedRetrievalRuleBook \| None` | The rule book for evaluation. |
| `rule_engine` | `LMBasedRetrievalRuleEngine \| None` | The rule engine for classification. |
Initialize the LM-based retrieval evaluator.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `metrics` | `Sequence[BaseMetric] \| None` | Optional custom retrieval metric instances. If provided, these are used as the base pool and may override the default metrics by name. Defaults to None. | `None` |
| `enabled_metrics` | `Sequence[type[BaseMetric] \| str] \| None` | Optional subset of metrics to enable from the metric pool. Each entry can be either a metric class or its name. Defaults to None. | `None` |
| `model` | `str \| ModelId \| BaseLMInvoker` | Model for the default DeepEval metrics. Defaults to `DefaultValues.MODEL`. | `DefaultValues.MODEL` |
| `model_credentials` | `str \| None` | Credentials for the model, if the selected model requires them. Defaults to None. | `None` |
| `model_config` | `dict[str, Any] \| None` | Optional model configuration. Defaults to None. | `None` |
| `run_parallel` | `bool` | Whether to run retrieval metrics in parallel. Defaults to True. | `True` |
| `rule_book` | `LMBasedRetrievalRuleBook \| None` | The rule book for evaluation. If not provided, a default one is generated based on enabled metrics. Defaults to None. | `None` |
| `rule_engine` | `LMBasedRetrievalRuleEngine \| None` | The rule engine for classification. If not provided, a new instance is created with the determined rule book. Defaults to None. | `None` |
required_fields
property
Return the union of required fields from all configured metrics.
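A usage sketch, assuming the evaluator is importable from this module and that `BaseEvaluator` exposes an `evaluate(...)` entry point returning the fields described above; the import path, metric names, model identifier format, and result access are all assumptions to adapt:

```python
# Usage sketch; the import path is a placeholder and evaluate(...) is an
# assumed BaseEvaluator entry point (it may be async in your version).
from lm_based_retrieval_evaluator import LMBasedRetrievalEvaluator  # placeholder path

evaluator = LMBasedRetrievalEvaluator(
    enabled_metrics=["contextual_precision", "contextual_recall"],  # assumed names
    model="openai/gpt-4o-mini",  # assumed model identifier format
    model_credentials="<api-key>",
)

result = evaluator.evaluate(sample)  # `sample` as sketched under "Default expected input"
print(result)  # expected to carry relevancy_rating, score, possible_issues
```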
LMBasedRetrievalRuleBook(good, bad)
dataclass
Bundle of good and bad composite specs for LM-based retrieval.
Both are plain FloatMetricSpec objects, so you can build them with `&` / `|` / `~` or any other Sutoppu combinator.
Attributes:

| Name | Type | Description |
|---|---|---|
| `good` | `FloatMetricSpec` | The good rule. |
| `bad` | `FloatMetricSpec` | The bad rule. |
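A sketch of a hand-built rule book, combining `FloatMetricSpec` objects with the Sutoppu operators mentioned above; metric names and thresholds are illustrative, not the library defaults:

```python
# Sketch: compose atomic specs into a custom rule book. Metric names and
# thresholds are illustrative assumptions.
good = (
    FloatMetricSpec("contextual_precision", ">=", 0.8)
    & FloatMetricSpec("contextual_recall", ">=", 0.8)
)
bad = (
    FloatMetricSpec("contextual_precision", "<", 0.5)
    | FloatMetricSpec("contextual_recall", "<", 0.5)
)
rule_book = LMBasedRetrievalRuleBook(good=good, bad=bad)
```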
LMBasedRetrievalRuleEngine(rules)
Bases: BaseRuleEngine[LMBasedRetrievalRuleBook, FloatMetricSpec, str]
Classify metric dictionaries using an injected LMBasedRetrievalRuleBook.
Initialize the LMBasedRetrievalRuleEngine.
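A sketch of classifying a metric dictionary with an injected rule book; the `classify(...)` method name is an assumption about the `BaseRuleEngine` interface:

```python
# Sketch: inject the custom rule book and classify a score dict.
# classify(...) is an assumed BaseRuleEngine method name.
engine = LMBasedRetrievalRuleEngine(rules=rule_book)
rating = engine.classify({"contextual_precision": 0.9, "contextual_recall": 0.85})
print(rating)  # e.g. "good"
```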
RuleConfig(defaults, score_key, comparator, empty_fallback, combine_with_and)
dataclass
Configuration for building a composite rule.
from_default_rule_book(enabled_metrics=None, metric_thresholds=None)
staticmethod
Get the default rule engine.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `enabled_metrics` | `list[str] \| None` | A list of metric names to include. If None, all are included. Defaults to None. | `None` |
| `metric_thresholds` | `dict[str, dict[str, float]] \| None` | Optional per-metric threshold overrides. Keys are metric names and values are dictionaries with optional "good_score" and "bad_score" entries. Defaults to None. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `LMBasedRetrievalRuleEngine` | `LMBasedRetrievalRuleEngine` | The default rule engine. |
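For example, a default engine restricted to two metrics with a tightened good threshold might be built like this (metric names are assumptions):

```python
# Sketch: default engine limited to two metrics, with a stricter "good"
# threshold for precision. Metric names are assumptions.
engine = LMBasedRetrievalRuleEngine.from_default_rule_book(
    enabled_metrics=["contextual_precision", "contextual_recall"],
    metric_thresholds={"contextual_precision": {"good_score": 0.85}},
)
```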
get_adaptive_rule_book(available_metrics, metric_thresholds=None)
staticmethod
Generate a rule book based on the available metrics.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `available_metrics` | `list[str]` | The list of available metrics. | *required* |
| `metric_thresholds` | `dict[str, dict[str, float]] \| None` | Optional per-metric threshold overrides. Keys are metric names and values are dictionaries with optional "good_score" and "bad_score" entries. Defaults to None. Note: this parameter is deprecated; prefer creating a custom rule_book directly. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `LMBasedRetrievalRuleBook` | `LMBasedRetrievalRuleBook` | The adaptive rule book. |
get_default_bad_rule(enabled_metrics=None, metric_thresholds=None)
staticmethod
Get the bad rule.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `enabled_metrics` | `list[str] \| None` | A list of metric names to include. If None, all are included. Defaults to None. | `None` |
| `metric_thresholds` | `dict[str, dict[str, float]] \| None` | Optional per-metric threshold overrides. Keys are metric names and values are dictionaries with optional "bad_score" entries. Defaults to None. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `FloatMetricSpec` | `FloatMetricSpec` | The bad rule. |
get_default_good_rule(enabled_metrics=None, metric_thresholds=None)
staticmethod
Get the good rule.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `enabled_metrics` | `list[str] \| None` | A list of metric names to include. If None, all are included. Defaults to None. | `None` |
| `metric_thresholds` | `dict[str, dict[str, float]] \| None` | Optional per-metric threshold overrides. Keys are metric names and values are dictionaries with optional "good_score" entries. Defaults to None. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `FloatMetricSpec` | `FloatMetricSpec` | The good rule. |
get_default_rule_book(enabled_metrics=None, metric_thresholds=None)
staticmethod
Get the default rule book.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `enabled_metrics` | `list[str] \| None` | A list of metric names to include. If None, all are included. Defaults to None. | `None` |
| `metric_thresholds` | `dict[str, dict[str, float]] \| None` | Optional per-metric threshold overrides. Keys are metric names and values are dictionaries with optional "good_score" and "bad_score" entries. Defaults to None. Note: this parameter is deprecated; prefer creating a custom rule_book directly. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `LMBasedRetrievalRuleBook` | `LMBasedRetrievalRuleBook` | The default rule book. |
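Given the deprecation notes above, composing a custom rule book from the default rules is the preferred route; a sketch (metric names and threshold values are illustrative):

```python
# Sketch: build the good/bad rules with overridden thresholds, then bundle
# them into a custom rule book instead of passing metric_thresholds around.
good = LMBasedRetrievalRuleEngine.get_default_good_rule(
    metric_thresholds={"contextual_precision": {"good_score": 0.9}},
)
bad = LMBasedRetrievalRuleEngine.get_default_bad_rule(
    metric_thresholds={"contextual_precision": {"bad_score": 0.4}},
)
custom_book = LMBasedRetrievalRuleBook(good=good, bad=bad)
engine = LMBasedRetrievalRuleEngine(rules=custom_book)
```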