Langchain openevals

This module contains the LangChainOpenEvalsMetric and LangChainOpenEvalsLLMAsAJudgeMetric classes.

LangChainOpenEvalsLLMAsAJudgeMetric(name, prompt, model, system=None, credentials=None, config=None, schema=None, feedback_key='score', continuous=False, choices=None, use_reasoning=True, few_shot_examples=None)

Bases: LangChainOpenEvalsMetric

LangChain OpenEvals LLM as a Judge Metric.

A metric that uses LangChain and OpenEvals to evaluate outputs with an LLM acting as a judge.

Available Fields
  • query (str | None, optional): The query / inputs to evaluate.
  • generated_response (str | list[str] | None, optional): The generated response / outputs to evaluate.
  • expected_response (str | list[str] | None, optional): The expected response / reference outputs to evaluate.
  • expected_retrieved_context (str | list[str] | None, optional): The expected retrieved context / reference context.
  • retrieved_context (str | list[str] | None, optional): The list of retrieved contexts.
Scoring
  • 0.0-1.0 (Continuous): An evaluation score assigned by the LLM judge.

Initialize the LangChainOpenEvalsLLMAsAJudgeMetric.

Parameters:
  • name (str, required): The name of the metric.
  • prompt (str, required): The evaluation prompt; a string template, LangChain prompt template, or callable that returns a list of chat messages.
  • model (str | ModelId | BaseLMInvoker, required): The model to use.
  • system (str | None, optional): Optional system message to prepend to the prompt. Defaults to None.
  • credentials (str | None, optional): The credentials to use for the model. Defaults to None.
  • config (dict[str, Any] | None, optional): The config to use for the model. Defaults to None.
  • schema (ResponseSchema | None, optional): The schema to use for the model. Defaults to None.
  • feedback_key (str, optional): Key under which the evaluation result is stored. Defaults to "score".
  • continuous (bool, optional): If True, the score is a float between 0 and 1; if False, the score is a boolean. Defaults to False.
  • choices (list[float] | None, optional): Optional list of specific float values the score must be chosen from. Defaults to None.
  • use_reasoning (bool, optional): If True, an explanation for the score is included in the output. Defaults to True.
  • few_shot_examples (list[FewShotExample] | None, optional): Optional list of example evaluations to append to the prompt. Defaults to None.
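A minimal construction sketch follows. The import path, prompt placeholders, and model identifier are assumptions (check your installation for the actual module path and supported model ids); the keyword arguments mirror the parameters listed above.

```python
# Sketch only: the import path and model id are hypothetical, not part of this reference.
from your_package.langchain_openevals import LangChainOpenEvalsLLMAsAJudgeMetric  # hypothetical path

# Placeholder names here follow the OpenEvals prompt convention ({inputs}, {outputs});
# adjust them to whatever your prompt template expects.
CONCISENESS_PROMPT = (
    "Rate how concise the answer is.\n"
    "Question: {inputs}\n"
    "Answer: {outputs}"
)

conciseness_metric = LangChainOpenEvalsLLMAsAJudgeMetric(
    name="conciseness",           # used to namespace the returned scores
    prompt=CONCISENESS_PROMPT,    # string template; a LangChain template or callable also works
    model="openai/gpt-4o-mini",   # hypothetical model id; str | ModelId | BaseLMInvoker accepted
    continuous=True,              # score as a float in [0, 1] instead of a boolean
    use_reasoning=True,           # include the judge's explanation in the output
)
```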

evaluate(data) async

Evaluate with custom prompt lifecycle support.

Overrides BaseMetric.evaluate() to add custom prompt application and state management. This ensures custom prompts are applied before evaluation and that state is restored afterward.

For batch input, it checks for custom prompts and processes each item accordingly. Items are currently processed individually; batch optimization may be added in the future.

Parameters:
  • data (MetricInput | list[MetricInput], required): Single data item or list of data items to evaluate.

Returns:
  • MetricOutput | list[MetricOutput]: Evaluation results with scores namespaced by the metric name.
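A usage sketch for evaluate() follows, reusing the conciseness_metric constructed in the earlier sketch. It assumes the input is a mapping with the fields listed under Available Fields; the exact MetricInput type should be checked against your installation.

```python
import asyncio

async def main() -> None:
    # Hypothetical input shape: a dict keyed by the Available Fields above.
    data = {
        "query": "What is the capital of France?",
        "generated_response": "Paris is the capital of France.",
        "expected_response": "Paris",
    }
    result = await conciseness_metric.evaluate(data)
    # Scores are namespaced by the metric name, e.g. a "conciseness" entry in the result.
    print(result)

asyncio.run(main())
```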

LangChainOpenEvalsMetric(name, evaluator)

Bases: BaseMetric

LangChain OpenEvals Metric.

A metric that uses LangChain and OpenEvals.

Available Fields
  • query (str | None, optional): The query / inputs to evaluate.
  • generated_response (str | list[str] | None, optional): The generated response / outputs to evaluate.
  • expected_response (str | list[str] | None, optional): The expected response / reference outputs to evaluate.
  • expected_retrieved_context (str | list[str] | None, optional): The expected retrieved context / reference context.
  • retrieved_context (str | list[str] | None, optional): The list of retrieved contexts.
Scoring
  • 0.0-1.0 (Continuous): Depends on the specific OpenEvals evaluator.

Initialize the LangChainOpenEvalsMetric.

Parameters:
  • name (str, required): The name of the metric.
  • evaluator (Union[SimpleAsyncEvaluator, Callable[..., Awaitable[Any]]], required): The evaluator to use.
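A minimal sketch of the generic wrapper follows, using a custom async evaluator rather than a prebuilt OpenEvals one. The import path is hypothetical, and the keyword-argument signature (outputs, reference_outputs) follows the OpenEvals evaluator convention; adapt it to whatever your evaluator expects.

```python
# Sketch only: the import path below is hypothetical, not part of this reference.
from your_package.langchain_openevals import LangChainOpenEvalsMetric  # hypothetical path


async def exact_match(*, outputs: str, reference_outputs: str, **kwargs) -> dict:
    """Return an OpenEvals-style result dict containing a key and a score."""
    return {
        "key": "exact_match",
        "score": 1.0 if outputs.strip() == reference_outputs.strip() else 0.0,
    }


exact_match_metric = LangChainOpenEvalsMetric(name="exact_match", evaluator=exact_match)
```

Evaluators created with OpenEvals' async LLM-as-judge factory can be passed the same way, as long as they satisfy the SimpleAsyncEvaluator signature.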