Retriever

Module for retriever classes.

BM25Retriever(data_store)

Bases: BaseVectorRetriever

Retrieves documents from the Elasticsearch data store using BM25.

Example
retriever = BM25Retriever(data_store=elasticsearch_data_store)
results = await retriever.retrieve("search query", top_k=10)

Initialize the BM25Retriever.

Parameters:

Name Type Description Default
data_store BaseVectorDataStore

The vector data store with BM25 capabilities.

required

BasicSQLRetriever(sql_data_store, lm_request_processor=None, extract_func=None, max_retries=0, preprocess_query_func=None)

Bases: BaseSQLRetriever

Initializes a new instance of the BasicSQLRetriever class.

This class provides a straightforward implementation of BaseSQLRetriever, using a single data store for document retrieval.

Attributes:

Name Type Description
sql_data_store BaseSQLDataStore

The SQL database data store to be used.

lm_request_processor LMRequestProcessor | None

The LMRequestProcessor instance to be used for modifying a failed query. If not None, will modify the query upon failure.

extract_func Callable[[str | list[str] | dict[str, str | list[str]]], str | list[str]]

A function to extract the modified query from the LM output.

max_retries int

The maximum number of retries for the retrieval process.

preprocess_query_func Callable[[str], str] | None

A function to preprocess the query before execution.

logger Logger

The logger instance to be used for logging.

Initializes the BasicSQLRetriever object.

Parameters:

Name Type Description Default
sql_data_store BaseSQLDataStore

The SQL database data store to be used.

required
lm_request_processor LMRequestProcessor | None

The LMRequestProcessor instance to be used for modifying a failed query. If not None, will modify the query upon failure.

None
extract_func Callable[[str | list[str] | dict[str, str | list[str]]], str | list[str]]

A function to extract the modified query from the LM output. Defaults to None, in which case a default extractor is used.

None
max_retries int

The maximum number of retries for the retrieval process. Defaults to 0.

0
preprocess_query_func Callable[[str], str] | None

A function to preprocess the query before execution. Defaults to None.

None

Raises:

Type Description
ValueError

If lm_request_processor is not provided when max_retries is greater than 0.
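
The two callable parameters are plain Python functions, so they can be illustrated without the library. The sketch below is only an illustration of shapes that match the documented type hints; the function names, the trailing-semicolon cleanup, and the "query" dictionary key are assumptions made for this example, not behavior defined by the library.

```python
from typing import Union

# Mirrors the documented input type of extract_func.
LMOutput = Union[str, list[str], dict[str, Union[str, list[str]]]]


def preprocess_query(query: str) -> str:
    """Hypothetical preprocess_query_func: trim whitespace and a trailing semicolon."""
    return query.strip().rstrip(";").strip()


def extract_query(output: LMOutput, key: str = "query") -> Union[str, list[str]]:
    """Hypothetical extract_func: pull the rewritten query out of an LM output.

    The "query" key is an assumption for this sketch, not a documented field.
    """
    if isinstance(output, dict):
        return output[key]
    # Strings and lists of strings are passed through unchanged.
    return output
```

Functions like these would be passed as preprocess_query_func and extract_func. Per the Raises entry above, setting max_retries greater than 0 also requires an lm_request_processor.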

retrieve(query, event_emitter=None, prompt_kwargs=None, return_query=False) async

Retrieve data based on the query.

This method performs a retrieval operation using the configured data store.

Parameters:

Name Type Description Default
query str

The query string to retrieve documents.

required
event_emitter EventEmitter | None

The event emitter to emit events. Defaults to None.

None
prompt_kwargs dict[str, Any] | None

Additional keyword arguments for the prompt. Defaults to None.

None
return_query bool

If True, returns a tuple of the executed query and the result. If False, returns only the result. Defaults to False.

False

Returns:

Type Description
DataFrame | tuple[str, DataFrame]

The result of the retrieval process as a pandas DataFrame. If return_query is True, returns a tuple of the executed query and the result; otherwise returns only the result.

Raises:

Type Description
ValueError

If the retrieval process fails after the maximum number of retries.
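
The retry behavior described for retrieve can be pictured as a small loop: execute the query, and on failure ask a fixer for a rewritten query until the retries run out. The sketch below is not the library's implementation. The helper names (retrieve_with_retries, fake_execute, fake_fix) are hypothetical, the result is a list of dicts rather than a pandas DataFrame to keep the example dependency-free, and counting one initial attempt plus max_retries retries is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Union

Rows = list[dict]


async def retrieve_with_retries(
    query: str,
    execute: Callable[[str], Awaitable[Rows]],
    fix_query: Callable[[str, Exception], Awaitable[str]],
    max_retries: int = 0,
    return_query: bool = False,
) -> Union[Rows, tuple[str, Rows]]:
    """Run a query, asking fix_query for a rewrite after each failure."""
    attempts = max_retries + 1  # one initial attempt plus the retries
    for attempt in range(attempts):
        try:
            result = await execute(query)
        except Exception as error:
            if attempt == attempts - 1:
                raise ValueError(f"Retrieval failed after {max_retries} retries") from error
            query = await fix_query(query, error)
            continue
        # Mirror return_query: optionally return the query that actually ran.
        return (query, result) if return_query else result
    raise ValueError("max_retries must not be negative")


# Stand-ins for the data store and the LM-based fixer (both hypothetical).
async def fake_execute(query: str) -> Rows:
    if "FORM" in query:
        raise RuntimeError("syntax error near FORM")
    return [{"id": 1}]


async def fake_fix(query: str, error: Exception) -> str:
    return query.replace("FORM", "FROM")


executed, rows = asyncio.run(
    retrieve_with_retries(
        "SELECT id FORM users", fake_execute, fake_fix, max_retries=1, return_query=True
    )
)
```

With return_query=True the caller sees the query that finally succeeded, which can differ from the one originally passed in.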

BasicVectorRetriever(data_store)

Bases: BaseVectorRetriever

Initializes a new instance of the BasicVectorRetriever class.

This class provides a straightforward implementation of BaseVectorRetriever, using a single data store for document retrieval.

Attributes:

Name Type Description
data_store BaseVectorDataStore

The data store used for retrieval operations.

Initializes a new instance of the BasicVectorRetriever class.

Parameters:

Name Type Description Default
data_store BaseVectorDataStore

The data store to be used for retrieval operations.

required

LightRAGRetriever(data_store)

Bases: BaseGraphRAGRetriever

Retriever implementation for LightRAG-based graph RAG.

This class provides a retriever interface for LightRAG, a graph-based Retrieval Augmented Generation system. It handles the conversion between LightRAG's raw output format and structured data objects, allowing for retrieval of document chunks, knowledge graph entities, and relationships based on natural language queries.

The retriever works with LightRAG data stores to perform graph-aware retrieval operations that leverage both semantic similarity and graph structure to find relevant information.

Example
from gllm_datastore.graph_data_store.light_rag_data_store import LightRAGDataStore
from gllm_retrieval.retriever.graph_retriever.light_rag_retriever import LightRAGRetriever
from gllm_retrieval.retriever.graph_retriever.constants import ReturnType

# Create a retriever with an existing data store
retriever = LightRAGRetriever(light_rag_data_store)

# Retrieve as chunks
chunks = await retriever.retrieve("What is machine learning?")

# Retrieve as dictionary with nodes and edges
result_dict = await retriever.retrieve(
    "What is machine learning?",
    return_type=ReturnType.DICT
)

# Retrieve as chunk contents (strings)
chunk_contents = await retriever.retrieve(
    "What is machine learning?",
    return_type=ReturnType.STRINGS
)

# Retrieve raw result from LightRAG
raw_result = await retriever.retrieve(
    "What is machine learning?",
    return_type=ReturnType.STRING
)

# Retrieve as synthesized response
result = await retriever.retrieve(
    "What is machine learning?",
    only_need_context=False
)

Attributes:

Name Type Description
data_store BaseLightRAGDataStore

The LightRAG data store used for retrieval operations.

_logger Logger

Logger instance for this class.

Initialize the LightRAGRetriever.

Parameters:

Name Type Description Default
data_store BaseLightRAGDataStore

The LightRAG data store to use for retrieval operations.

required

retrieve(query, retrieval_params=None, event_emitter=None, return_type=ReturnType.CHUNKS, only_need_context=None) async

Retrieve information from LightRAG based on a natural language query.

This method queries the LightRAG data store and processes the results according to the specified return type. It can return the raw result, structured chunks, strings, or a dictionary containing nodes and edges from the knowledge graph.

Parameters:

Name Type Description Default
query str

The natural language query to retrieve information for.

required
retrieval_params dict[str, Any] | None

Optional dictionary of parameters to pass to the LightRAG query. Defaults to None.

None
event_emitter EventEmitter | None

Optional event emitter for tracking retrieval events. Defaults to None.

None
return_type ReturnType

The type of result to return (CHUNKS, STRINGS, DICT, or STRING). Defaults to ReturnType.CHUNKS.

CHUNKS
only_need_context bool | None

Whether to return only the context (True) or the full synthesized result (False). If None, the value from retrieval_params is used, falling back to True. Defaults to None.

None

Returns:

Type Description
str | list[str] | list[Chunk] | dict[str, Any]

Depending on the only_need_context and return_type parameters:
- only_need_context=False: Synthesized response string using LM invoker.
- only_need_context=True:
    - ReturnType.CHUNKS (default): List of Chunk objects.
    - ReturnType.STRINGS: List of content strings from chunks.
    - ReturnType.DICT: Dictionary representation of LightRAGRetrievalResult.
    - ReturnType.STRING: Raw result string from LightRAG without postprocessing.
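
The interaction between only_need_context and return_type can be sketched as two small functions. Everything below is a stand-in written for illustration: the ReturnType enum values, the Chunk dataclass, the "only_need_context" key inside retrieval_params, and the shape of the DICT result are assumptions, not the library's actual definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ReturnType(Enum):
    """Stand-in for the library's ReturnType; the values are assumptions."""
    CHUNKS = "chunks"
    STRINGS = "strings"
    DICT = "dict"
    STRING = "string"


@dataclass
class Chunk:
    """Minimal stand-in for the library's Chunk type."""
    content: str


def resolve_only_need_context(
    only_need_context: Optional[bool],
    retrieval_params: Optional[dict[str, Any]],
) -> bool:
    """Explicit argument first, then retrieval_params, then True."""
    if only_need_context is not None:
        return only_need_context
    return bool((retrieval_params or {}).get("only_need_context", True))


def shape_result(raw: str, chunks: list[Chunk], return_type: ReturnType) -> Any:
    """Pick the output shape for a context-only retrieval."""
    if return_type is ReturnType.STRING:
        return raw  # raw LightRAG output, no postprocessing
    if return_type is ReturnType.STRINGS:
        return [chunk.content for chunk in chunks]
    if return_type is ReturnType.DICT:
        # Hypothetical shape; the real dictionary mirrors LightRAGRetrievalResult.
        return {"chunks": [chunk.content for chunk in chunks]}
    return chunks
```

The explicit-argument-wins order in resolve_only_need_context follows the parameter description above; the rest is illustrative only.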

LlamaIndexGraphRAGRetriever(data_store, property_graph_retriever=None, llama_index_llm=None, embed_model=None, vector_store=None, **kwargs)

Bases: BaseGraphRAGRetriever

A retriever class for querying a knowledge graph using the LlamaIndex framework.

Attributes:

Name Type Description
_index PropertyGraphIndex

The property graph index to use.

_graph_store LlamaIndexGraphRAGDataStore | PropertyGraphStore

The graph store to use.

_llm BaseLLM | None

The language model to use.

_embed_model BaseEmbedding | None

The embedding model to use.

_property_graph_retriever PGRetriever

The property graph retriever to use.

_logger Logger

The logger to use.

_default_return_type ReturnType

The default return type for retrieve method.

Initializes the LlamaIndexGraphRAGRetriever with the provided components.

Parameters:

Name Type Description Default
data_store LlamaIndexGraphRAGDataStore | PropertyGraphStore

The graph store to use.

required
property_graph_retriever PGRetriever | None

An existing retriever to use.

None
llama_index_llm BaseLLM | None

The language model to use for text-to-Cypher retrieval.

None
embed_model BaseEmbedding | None

The embedding model to use.

None
vector_store BasePydanticVectorStore | None

The vector store to use.

None
**kwargs Any

Additional keyword arguments. Supported kwargs:
- default_return_type (ReturnType): Default return type for the retrieve method. Defaults to "chunks".

{}

Raises:

Type Description
ValueError

If an invalid return type is provided.

retrieve(query, retrieval_params=None, event_emitter=None, **kwargs) async

Retrieves relevant documents for a given query.

Parameters:

Name Type Description Default
query str

The query string to search for.

required
retrieval_params dict[str, Any] | None

Additional retrieval parameters.

None
event_emitter EventEmitter | None

Event emitter for logging.

None
**kwargs Any

Additional keyword arguments. Supported kwargs:
- return_type (ReturnType): Type of return value ("chunks" or "strings"). Defaults to the value set in the constructor.

{}

Returns:

Type Description
str | list[str] | list[Chunk] | dict[str, Any]

The result of the retrieval process, based on return_type:
- "chunks": Returns list[Chunk].
- "strings": Returns list[str].
- On error: Returns an empty list.

Raises:

Type Description
ValueError

If an invalid return type is provided.
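
The way default_return_type (constructor) and return_type (per call) combine can be sketched as follows. This is a minimal illustration under stated assumptions: ReturnTypeResolver is a hypothetical helper, and the ReturnType enum here is a stand-in that carries only the two documented values.

```python
from enum import Enum
from typing import Any


class ReturnType(str, Enum):
    """Stand-in for the library's ReturnType; only the two documented values."""
    CHUNKS = "chunks"
    STRINGS = "strings"


class ReturnTypeResolver:
    """Hypothetical helper mirroring the documented kwargs handling."""

    def __init__(self, **kwargs: Any) -> None:
        # The constructor default falls back to "chunks", as documented.
        self._default_return_type = self._validate(
            kwargs.get("default_return_type", "chunks")
        )

    @staticmethod
    def _validate(value: Any) -> ReturnType:
        try:
            return ReturnType(value)
        except ValueError as error:
            raise ValueError(f"Invalid return type: {value!r}") from error

    def resolve(self, **kwargs: Any) -> ReturnType:
        """Use the per-call return_type if given, else the constructor default."""
        if "return_type" in kwargs:
            return self._validate(kwargs["return_type"])
        return self._default_return_type
```

Validating in both places matches the two Raises entries above: an invalid value is rejected whether it arrives through the constructor or through retrieve.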