LangChain agent trajectory accuracy
This module contains the LangChain Agent Trajectory Accuracy Metric.
References
[1] https://github.com/langchain-ai/agentevals?tab=readme-ov-file#trajectory-llm-as-judge
LangChainAgentTrajectoryAccuracyMetric(model, prompt=None, model_credentials=None, model_config=None, schema=None, feedback_key='trajectory_accuracy', continuous=False, choices=None, use_reasoning=True, few_shot_examples=None, use_reference=True, batch_status_check_interval=DefaultValues.BATCH_STATUS_CHECK_INTERVAL, batch_max_iterations=DefaultValues.BATCH_MAX_ITERATIONS)
Bases: LangChainAgentEvalsLLMAsAJudgeMetric
A metric that uses LangChain AgentEvals to evaluate the trajectory accuracy of the agent.
Available Fields:
- agent_trajectory (list[dict[str, Any]]): The agent trajectory.
- expected_agent_trajectory (list[dict[str, Any]] | None, optional): The expected agent trajectory.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | `str` | The name of the metric. |
| `evaluator` | `SimpleAsyncEvaluator` | The evaluator to use. |
Initialize the LangChainAgentTrajectoryAccuracyMetric.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `str \| ModelId \| BaseLMInvoker` | The model to use. | required |
| `prompt` | `str \| None` | The prompt to use. Defaults to None. | `None` |
| `model_credentials` | `str \| None` | The credentials to use for the model. Defaults to None. | `None` |
| `model_config` | `dict[str, Any] \| None` | The config to use for the model. Defaults to None. | `None` |
| `schema` | `ResponseSchema \| None` | The schema to use for the model. Defaults to None. | `None` |
| `feedback_key` | `str` | Key used to store the evaluation result. Defaults to "trajectory_accuracy". | `'trajectory_accuracy'` |
| `continuous` | `bool` | If True, the score will be a float between 0 and 1. If False, the score will be a boolean. Defaults to False. | `False` |
| `choices` | `list[float] \| None` | Optional list of specific float values the score must be chosen from. Defaults to None. | `None` |
| `use_reasoning` | `bool` | If True, includes an explanation for the score in the output. Defaults to True. | `True` |
| `few_shot_examples` | `list[FewShotExample] \| None` | Optional list of example evaluations to append to the prompt. Defaults to None. | `None` |
| `use_reference` | `bool` | If True, uses the expected agent trajectory to evaluate the trajectory accuracy. If False, the TRAJECTORY_ACCURACY_CUSTOM_PROMPT is used instead. If a custom prompt is provided, this parameter is ignored. Defaults to True. | `True` |
| `batch_status_check_interval` | `float` | Interval in seconds between batch status checks. Defaults to 30.0. | `BATCH_STATUS_CHECK_INTERVAL` |
| `batch_max_iterations` | `int` | Maximum number of batch status check iterations. Defaults to 120. | `BATCH_MAX_ITERATIONS` |
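The scoring and batch parameters interact in ways that are easier to see in code. The sketch below is illustrative only and is not the library's implementation: `max_batch_wait_seconds` and `is_valid_score` are hypothetical helpers that restate the documented semantics. It assumes that each polling iteration waits one full interval, and that `choices`, when given, takes precedence over `continuous`; neither assumption is confirmed by the documentation above.

```python
from __future__ import annotations


def max_batch_wait_seconds(interval: float = 30.0, max_iterations: int = 120) -> float:
    """Approximate upper bound on polling time (hypothetical helper).

    Assumes one sleep of `interval` seconds per iteration and ignores
    the time spent on the status requests themselves.
    """
    return interval * max_iterations


def _is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_valid_score(
    score: object,
    continuous: bool = False,
    choices: list[float] | None = None,
) -> bool:
    """Check a score against the documented score shapes (hypothetical helper)."""
    if choices is not None:
        # Assumed precedence: an explicit list of choices wins over `continuous`.
        return _is_number(score) and float(score) in choices
    if continuous:
        return _is_number(score) and 0.0 <= float(score) <= 1.0
    return isinstance(score, bool)


# With the documented defaults the polling budget is about one hour.
default_budget = max_batch_wait_seconds()
```

With the documented defaults, the budget works out to 30.0 × 120 = 3600 seconds. Lowering `batch_status_check_interval` without raising `batch_max_iterations` shortens the total time a batch is given to finish.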