LangChain AgentEvals

This module contains the LangChain AgentEvals metrics.

Authors

Surya Mahadi (made.r.s.mahadi@gdplabs.id)

References

[1] https://github.com/langchain-ai/agentevals

LangChainAgentEvalsLLMAsAJudgeMetric(name, prompt, model, credentials=None, config=None, schema=None, feedback_key='trajectory_accuracy', continuous=False, choices=None, use_reasoning=True, few_shot_examples=None)

Bases: LangChainAgentEvalsMetric

A metric that uses LangChain AgentEvals to evaluate an agent's trajectory with an LLM as a judge.

Available Fields:

- `agent_trajectory` (`list[dict[str, Any]]`): The agent trajectory.
- `expected_agent_trajectory` (`list[dict[str, Any]]`): The expected agent trajectory.
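
Both fields use the OpenAI-style chat message format that the upstream agentevals library expects. A minimal sketch of one trajectory (the tool name and payload are illustrative, not part of this API):

```python
# A tool-calling trajectory as a list of OpenAI-style message dicts,
# following the upstream agentevals README; values are illustrative.
agent_trajectory = [
    {"role": "user", "content": "What is the weather in Jakarta?"},
    {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}}
        ],
    },
    {"role": "tool", "content": "It is 31 degrees and sunny in Jakarta."},
    {"role": "assistant", "content": "It is 31 degrees and sunny in Jakarta."},
]
```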

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | `str` | The name of the metric. |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. |

Initialize the LangChainAgentEvalsLLMAsAJudgeMetric.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | The name of the metric. | *required* |
| `prompt` | `str` | The evaluation prompt: a string template, a LangChain prompt template, or a callable that returns a list of chat messages. Note that the default prompt accepts a rubric in addition to the usual "inputs", "outputs", and "reference_outputs" parameters. | *required* |
| `model` | `str \| ModelId \| BaseLMInvoker` | The model to use. | *required* |
| `credentials` | `str \| None` | The credentials to use for the model. | `None` |
| `config` | `dict[str, Any] \| None` | The config to use for the model. | `None` |
| `schema` | `ResponseSchema \| None` | The schema to use for the model. | `None` |
| `feedback_key` | `str` | The key used to store the evaluation result. | `'trajectory_accuracy'` |
| `continuous` | `bool` | If True, the score is a float between 0 and 1; if False, the score is a boolean. | `False` |
| `choices` | `list[float] \| None` | Optional list of specific float values the score must be chosen from. | `None` |
| `use_reasoning` | `bool` | If True, includes an explanation for the score in the output. | `True` |
| `few_shot_examples` | `list[FewShotExample] \| None` | Optional list of example evaluations to append to the prompt. | `None` |
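
A minimal usage sketch. The wrapper's import path and the async `evaluate` call are assumptions (this reference does not show them; check your package layout and the `BaseMetric` interface), while the `TRAJECTORY_ACCURACY_PROMPT` import follows the upstream agentevals package [1]:

```python
import asyncio

from agentevals.trajectory.llm import TRAJECTORY_ACCURACY_PROMPT

# Hypothetical import path; adjust to where this module lives in your package.
from gllm_evals.metrics.langchain_agentevals import LangChainAgentEvalsLLMAsAJudgeMetric

metric = LangChainAgentEvalsLLMAsAJudgeMetric(
    name="trajectory_accuracy",
    prompt=TRAJECTORY_ACCURACY_PROMPT,  # the upstream default judge prompt
    model="openai/gpt-4o-mini",         # assumed ModelId string format
)

trajectory = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
]

# Assumption: BaseMetric exposes an async evaluate() that consumes the
# available fields listed above; check your BaseMetric interface.
result = asyncio.run(
    metric.evaluate(
        {
            "agent_trajectory": trajectory,
            "expected_agent_trajectory": trajectory,
        }
    )
)
print(result)
```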

LangChainAgentEvalsMetric(name, evaluator)

Bases: BaseMetric

A metric that uses LangChain AgentEvals to evaluate an agent trajectory.

Available Fields:

- `agent_trajectory` (`list[dict[str, Any]]`): The agent trajectory.
- `expected_agent_trajectory` (`list[dict[str, Any]]`): The expected agent trajectory.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | `str` | The name of the metric. |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. |

Initialize the LangChainAgentEvalsMetric.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | The name of the metric. | *required* |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. | *required* |
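
Because the base class accepts any `SimpleAsyncEvaluator`, prebuilt AgentEvals evaluators can plug in directly. A sketch wiring in the upstream trajectory matcher (the factory call follows the agentevals README [1]; the wrapper import path is the same assumption as above):

```python
from agentevals.trajectory.match import create_async_trajectory_match_evaluator

# Hypothetical import path; adjust to your package layout.
from gllm_evals.metrics.langchain_agentevals import LangChainAgentEvalsMetric

# "superset" passes if the actual trajectory's tool calls are a superset of
# the expected ones; other modes include "strict", "unordered", and "subset".
evaluator = create_async_trajectory_match_evaluator(
    trajectory_match_mode="superset",
)

metric = LangChainAgentEvalsMetric(name="trajectory_match", evaluator=evaluator)
```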