LangChain OpenEvals
This module contains the LangChainOpenEvalsMetric and LangChainOpenEvalsLLMAsAJudgeMetric classes.
LangChainOpenEvalsLLMAsAJudgeMetric(name, prompt, model, system=None, credentials=None, config=None, schema=None, feedback_key='score', continuous=False, choices=None, use_reasoning=True, few_shot_examples=None)
Bases: LangChainOpenEvalsMetric
LangChain OpenEvals LLM as a Judge Metric.
A metric that uses LangChain and OpenEvals to run an LLM-as-a-judge evaluation.
Available Fields
- query (str | None, optional): The query / inputs to evaluate.
- generated_response (str | list[str] | None, optional): The generated response / outputs to evaluate.
- expected_response (str | list[str] | None, optional): The expected response / reference outputs to evaluate.
- expected_retrieved_context (str | list[str] | None, optional): The expected retrieved context / reference context.
- retrieved_context (str | list[str] | None, optional): The list of retrieved contexts.
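The field descriptions above pair each metric field with an OpenEvals-style name ("query / inputs", "generated response / outputs", "expected response / reference outputs"). A minimal sketch of that pairing follows. The mapping dictionary and the `to_evaluator_kwargs` helper are hypothetical names introduced here for illustration, not part of the library, and the two context fields are left out because their target names are not stated.

```python
from typing import Any

# Assumed pairing, read off the "query / inputs" wording in the field list.
FIELD_TO_OPENEVALS_KWARG = {
    "query": "inputs",
    "generated_response": "outputs",
    "expected_response": "reference_outputs",
}


def to_evaluator_kwargs(item: dict[str, Any]) -> dict[str, Any]:
    """Rename known fields and skip any that are missing or None."""
    return {
        kwarg: item[field]
        for field, kwarg in FIELD_TO_OPENEVALS_KWARG.items()
        if item.get(field) is not None
    }


example = {
    "query": "What is the capital of France?",
    "generated_response": "Paris",
    "expected_response": None,  # optional fields may be absent
}
kwargs = to_evaluator_kwargs(example)
print(kwargs)
```

Skipping `None` values reflects that every field is documented as optional.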
Scoring
- 0.0-1.0 (Continuous): An evaluation score assigned by the LLM judge when continuous=True. With the default continuous=False, the score is a boolean.
Initialize the LangChainOpenEvalsLLMAsAJudgeMetric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the metric. | required |
| `prompt` | `str` | The evaluation prompt. Can be a string template, a LangChain prompt template, or a callable that returns a list of chat messages. | required |
| `model` | `str \| ModelId \| BaseLMInvoker` | The model to use. | required |
| `system` | `str \| None` | Optional system message to prepend to the prompt. | `None` |
| `credentials` | `str \| None` | The credentials to use for the model. Defaults to None. | `None` |
| `config` | `dict[str, Any] \| None` | The config to use for the model. Defaults to None. | `None` |
| `schema` | `ResponseSchema \| None` | The schema to use for the model. Defaults to None. | `None` |
| `feedback_key` | `str` | Key used to store the evaluation result. Defaults to "score". | `'score'` |
| `continuous` | `bool` | If True, the score will be a float between 0 and 1. If False, the score will be a boolean. Defaults to False. | `False` |
| `choices` | `list[float] \| None` | Optional list of specific float values the score must be chosen from. Defaults to None. | `None` |
| `use_reasoning` | `bool` | If True, includes an explanation for the score in the output. Defaults to True. | `True` |
| `few_shot_examples` | `list[FewShotExample] \| None` | Optional list of example evaluations to append to the prompt. Defaults to None. | `None` |
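The `continuous` and `choices` parameters describe three possible score shapes: a boolean, a float between 0 and 1, or one of a fixed set of floats. The sketch below illustrates those constraints only. `check_score` is a hypothetical helper, not a library function, and giving `choices` precedence over `continuous` is an assumption made for the example.

```python
def check_score(score, continuous=False, choices=None):
    """Return True if `score` fits the configured scoring mode."""
    if choices is not None:
        # Assumption: an explicit choice list takes precedence.
        return score in choices
    if continuous:
        # A float in [0, 1]; bool is excluded even though it subclasses int.
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        return is_number and 0.0 <= score <= 1.0
    # Default mode: the score is a boolean.
    return isinstance(score, bool)


print(check_score(True))                             # default boolean mode
print(check_score(0.5, continuous=True))             # continuous mode
print(check_score(0.25, choices=[0.0, 0.5, 1.0]))    # not one of the choices
```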
async evaluate(data)
Evaluate with custom prompt lifecycle support.
Overrides BaseMetric.evaluate() to add custom prompt application and state management. This ensures custom prompts are applied before evaluation and state is restored after.
When given a list, the method checks for custom prompts and processes the items accordingly. Items are currently processed individually; batch optimization may be added in the future.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `MetricInput \| list[MetricInput]` | Single data item or list of data items to evaluate. | required |
Returns:
| Type | Description |
|---|---|
| `MetricOutput \| list[MetricOutput]` | Evaluation results with scores namespaced by metric name. |
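The input and output shapes described for `evaluate` (one item in, one result out; a list in, a list out; items handled one at a time) can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `score_one` is a hypothetical stand-in for the real single-item evaluation, plain dicts stand in for `MetricInput` and `MetricOutput`, and nesting the result under the metric name is only one guess at what "namespaced by metric name" looks like.

```python
import asyncio
from typing import Any


async def score_one(item: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical single-item evaluation: a plain equality check.
    match = item.get("generated_response") == item.get("expected_response")
    return {"score": match}


async def evaluate(data, metric_name="exact_match"):
    """Accept one item or a list and return a result of the same shape."""
    items = data if isinstance(data, list) else [data]
    results = []
    for item in items:  # processed individually, no batching
        raw = await score_one(item)
        # Assumed namespacing: nest the raw result under the metric name.
        results.append({metric_name: raw})
    return results if isinstance(data, list) else results[0]


single = asyncio.run(
    evaluate({"generated_response": "Paris", "expected_response": "Paris"})
)
batch = asyncio.run(
    evaluate(
        [
            {"generated_response": "Paris", "expected_response": "Paris"},
            {"generated_response": "Lyon", "expected_response": "Paris"},
        ]
    )
)
print(single)
print(batch)
```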
LangChainOpenEvalsMetric(name, evaluator)
Bases: BaseMetric
LangChain OpenEvals Metric.
A metric that uses LangChain and OpenEvals.
Available Fields
- query (str | None, optional): The query / inputs to evaluate.
- generated_response (str | list[str] | None, optional): The generated response / outputs to evaluate.
- expected_response (str | list[str] | None, optional): The expected response / reference outputs to evaluate.
- expected_retrieved_context (str | list[str] | None, optional): The expected retrieved context / reference context.
- retrieved_context (str | list[str] | None, optional): The list of retrieved contexts.
Scoring
- 0.0-1.0 (Continuous): The score depends on the specific OpenEvals evaluator used.
Initialize the LangChainOpenEvalsMetric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str` | The name of the metric. | required |
| `evaluator` | `Union[SimpleAsyncEvaluator, Callable[..., Awaitable[Any]]]` | The evaluator to use. | required |
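The `evaluator` type allows any async callable, so a custom evaluator can be a plain coroutine function. A minimal sketch follows, assuming the evaluator receives OpenEvals-style keyword arguments (`outputs`, `reference_outputs`) and returns a dict with `key` and `score` entries. Both the argument names and the return shape are assumptions here, and `exact_match_evaluator` is a hypothetical name.

```python
import asyncio
from typing import Any


async def exact_match_evaluator(
    *, outputs: Any, reference_outputs: Any = None, **kwargs: Any
) -> dict[str, Any]:
    """Score True when the output equals the reference exactly."""
    # Extra keyword arguments (for example `inputs`) are accepted and ignored.
    return {"key": "exact_match", "score": outputs == reference_outputs}


result = asyncio.run(
    exact_match_evaluator(inputs="2 + 2?", outputs="4", reference_outputs="4")
)
print(result)

# Passing it to the metric would then follow the documented signature
# (import path omitted because it is not given in this page):
# metric = LangChainOpenEvalsMetric(name="exact_match", evaluator=exact_match_evaluator)
```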