Skip to content

States

A collection of preset states for the pipeline.

This module provides both TypedDict and Pydantic BaseModel implementations for pipeline states. TypedDict provides basic type hints while Pydantic BaseModel offers runtime validation, default values, and enhanced type safety.

Authors

Dimitrij Ray (dimitrij.ray@gdplabs.id) Daniel Adi (daniel.adi@gdplabs.id)

References

NONE

RAGState

Bases: TypedDict

A TypedDict representing the state of a Retrieval-Augmented Generation (RAG) pipeline.

This docstring documents the original intention of each of the attributes in the TypedDict. However, in practice, the attributes may be modified or extended to suit the specific requirements of the application. The TypedDict is used to enforce the structure of the state object.

Attributes:

Name Type Description
user_query str

The original query from the user.

queries list[str]

A list of queries generated for retrieval.

retrieval_params dict[str, Any]

Parameters used for the retrieval process.

chunks list

A list of chunks retrieved from the knowledge base.

history list[Any]

The history of the conversation or interaction.

context str

The context information used for generating responses.

response_synthesis_bundle dict[str, Any]

Data used for synthesizing the final response.

response str

The generated response to the user's query.

references str | list[str]

References or sources used in generating the response.

event_emitter EventEmitter

An event emitter instance for logging purposes.

Example
state = {
    "user_query": "What is machine learning?",
    "queries": ["machine learning definition", "ML basics"],
    "retrieval_params": {"top_k": 5, "threshold": 0.8},
    "chunks": [
        {"content": "Machine learning is...", "score": 0.95},
        {"content": "ML algorithms include...", "score": 0.87}
    ],
    "history": [
        {"role": "user", "contents": ["What is machine learning?"]},
        {"role": "assistant", "contents": ["Machine learning is a subset of artificial intelligence..."]}
    ],
    "context": "Retrieved information about ML",
    "response_synthesis_bundle": {"template": "informative"},
    "response": "Machine learning is a subset of artificial intelligence...",
    "references": ["source1.pdf", "article2.html"],
    "event_emitter": EventEmitter()
}

RAGStateModel

Bases: BaseModel

A Pydantic BaseModel representing the state of a Retrieval-Augmented Generation (RAG) pipeline.

This implementation provides runtime validation, default values, and enhanced type safety compared to the TypedDict version. It maintains compatibility with LangGraph while offering improved developer experience through automatic validation and sensible defaults.

Attributes:

Name Type Description
user_query str

The original query from the user.

queries list[str]

A list of queries generated for retrieval. Defaults to empty list.

retrieval_params dict[str, Any]

Parameters used for the retrieval process. Defaults to empty dict.

chunks list[Any]

A list of chunks retrieved from the knowledge base. Defaults to empty list.

history list[Any]

The history of the conversation or interaction. Defaults to empty list.

context str

The context information used for generating responses. Defaults to empty string.

response_synthesis_bundle dict[str, Any]

Data used for synthesizing the final response. Defaults to empty dict.

response str

The generated response to the user's query. Defaults to empty string.

references str | list[str]

References or sources used in generating the response. Defaults to empty list.

event_emitter EventEmitter | None

An event emitter instance for logging purposes. Defaults to None.

Example
# Basic usage with minimal required fields
state = RAGStateModel(user_query="What is machine learning?")

# Full usage with all fields
state = RAGStateModel(
    user_query="What is machine learning?",
    queries=["machine learning definition", "ML basics"],
    retrieval_params={"top_k": 5, "threshold": 0.8},
    chunks=[
        {"content": "Machine learning is...", "score": 0.95},
        {"content": "ML algorithms include...", "score": 0.87}
    ],
    history=[
        {"role": "user", "contents": ["What is machine learning?"]},
        {"role": "assistant", "contents": ["Machine learning is a subset of artificial intelligence..."]}
    ],
    context="Retrieved information about ML",
    response_synthesis_bundle={"template": "informative"},
    response="Machine learning is a subset of artificial intelligence...",
    references=["source1.pdf", "article2.html"],
    event_emitter=EventEmitter()
)

# Convert to dictionary for pipeline processing
state_dict = state.model_dump()

# Use with json_encoders for special types
state_json = state.model_dump_json()
Example with custom JSON encoders
from datetime import datetime
from pydantic import BaseModel, Field

class CustomStateModel(BaseModel):
    timestamp: datetime = Field(default_factory=datetime.now)

    class Config:
        json_encoders = {
            datetime: lambda v: v.isoformat(),
            EventEmitter: lambda v: str(v) if v else None
        }

Config

Pydantic configuration.