LangChain AgentEvals

This module contains the LangChain AgentEvals metrics.

Authors

Surya Mahadi (made.r.s.mahadi@gdplabs.id)

References

[1] https://github.com/langchain-ai/agentevals

LangChainAgentEvalsLLMAsAJudgeMetric(name, prompt, model, credentials=None, config=None, schema=None, feedback_key='trajectory_accuracy', continuous=False, choices=None, use_reasoning=True, few_shot_examples=None)

Bases: LangChainAgentEvalsMetric

A metric that uses LangChain AgentEvals to evaluate an agent's trajectory with an LLM as a judge.

Available Fields:

- `agent_trajectory` (`list[dict[str, Any]]`): The agent trajectory.
- `expected_agent_trajectory` (`list[dict[str, Any]]`): The expected agent trajectory.
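
Both fields use the OpenAI-style chat message format that the upstream agentevals library expects. A minimal sketch of one trajectory (the tool name and payload are illustrative, not part of this API):

```python
# A tool-calling trajectory as a list of OpenAI-style message dicts,
# following the upstream agentevals README; values are illustrative.
agent_trajectory = [
    {"role": "user", "content": "What is the weather in Jakarta?"},
    {
        "role": "assistant",
        "tool_calls": [
            {"function": {"name": "get_weather", "arguments": '{"city": "Jakarta"}'}}
        ],
    },
    {"role": "tool", "content": "It is 31 degrees and sunny in Jakarta."},
    {"role": "assistant", "content": "It is 31 degrees and sunny in Jakarta."},
]
```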

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | `str` | The name of the metric. |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. |

Initialize the LangChainAgentEvalsLLMAsAJudgeMetric.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | The name of the metric. | *required* |
| `prompt` | `str` | The evaluation prompt: a string template, a LangChain prompt template, or a callable that returns a list of chat messages. Note that the default prompt accepts a rubric in addition to the usual "inputs", "outputs", and "reference_outputs" parameters. | *required* |
| `model` | `str \| ModelId \| BaseLMInvoker` | The model to use. | *required* |
| `credentials` | `str \| None` | The credentials to use for the model. | `None` |
| `config` | `dict[str, Any] \| None` | The config to use for the model. | `None` |
| `schema` | `ResponseSchema \| None` | The schema to use for the model. | `None` |
| `feedback_key` | `str` | The key used to store the evaluation result. | `'trajectory_accuracy'` |
| `continuous` | `bool` | If True, the score is a float between 0 and 1; if False, the score is a boolean. | `False` |
| `choices` | `list[float] \| None` | Optional list of specific float values the score must be chosen from. | `None` |
| `use_reasoning` | `bool` | If True, includes an explanation for the score in the output. | `True` |
| `few_shot_examples` | `list[FewShotExample] \| None` | Optional list of example evaluations to append to the prompt. | `None` |
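
A minimal usage sketch. The wrapper's import path and the async `evaluate` call are assumptions (this reference does not show them; check your package layout and the `BaseMetric` interface), while the `TRAJECTORY_ACCURACY_PROMPT` import follows the upstream agentevals package [1]:

```python
import asyncio

from agentevals.trajectory.llm import TRAJECTORY_ACCURACY_PROMPT

# Hypothetical import path; adjust to where this module lives in your package.
from gllm_evals.metrics.langchain_agentevals import LangChainAgentEvalsLLMAsAJudgeMetric

metric = LangChainAgentEvalsLLMAsAJudgeMetric(
    name="trajectory_accuracy",
    prompt=TRAJECTORY_ACCURACY_PROMPT,  # the upstream default judge prompt
    model="openai/gpt-4o-mini",         # assumed ModelId string format
)

trajectory = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "2 + 2 equals 4."},
]

# Assumption: BaseMetric exposes an async evaluate() that consumes the
# available fields listed above; check your BaseMetric interface.
result = asyncio.run(
    metric.evaluate(
        {
            "agent_trajectory": trajectory,
            "expected_agent_trajectory": trajectory,
        }
    )
)
print(result)
```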

LangChainAgentEvalsMetric(name, evaluator)

Bases: BaseMetric

A metric that uses LangChain AgentEvals to evaluate an agent trajectory.

Available Fields:

- `agent_trajectory` (`list[dict[str, Any]]`): The agent trajectory.
- `expected_agent_trajectory` (`list[dict[str, Any]]`): The expected agent trajectory.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `name` | `str` | The name of the metric. |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. |

Initialize the LangChainAgentEvalsMetric.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `name` | `str` | The name of the metric. | *required* |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. | *required* |
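
Because the base class accepts any `SimpleAsyncEvaluator`, prebuilt AgentEvals evaluators can plug in directly. A sketch wiring in the upstream trajectory matcher (the factory call follows the agentevals README [1]; the wrapper import path is the same assumption as above):

```python
from agentevals.trajectory.match import create_async_trajectory_match_evaluator

# Hypothetical import path; adjust to your package layout.
from gllm_evals.metrics.langchain_agentevals import LangChainAgentEvalsMetric

# "superset" passes if the actual trajectory's tool calls are a superset of
# the expected ones; other modes include "strict", "unordered", and "subset".
evaluator = create_async_trajectory_match_evaluator(
    trajectory_match_mode="superset",
)

metric = LangChainAgentEvalsMetric(name="trajectory_match", evaluator=evaluator)
```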