Retriever
Defines an abstract base class to create a retriever.
This module provides the BaseRetriever class, which serves as a foundation for implementing retrieval systems in Gen AI applications using the BaseDataStore from gllm-datastore.
BaseRetriever()
Bases: Component, Generic[T]
An abstract base class for retrievers.
This class defines the interface for retriever components, which are responsible for retrieving relevant documents or information based on a given query.
The type parameter T defines the return type for single-query retrieval, allowing subclasses to specify their own return types:
- list[Chunk] for vector/fulltext retrievers
- pd.DataFrame for SQL retrievers
- dict[str, Any] for graph retrievers
- Any custom type for specialized retrievers

The base class provides orchestration for:
- Single vs. batch query handling
- Concurrent batch processing
- Common parameter management
Subclasses must implement the core retrieval logic in _execute_retrieval.
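The single-versus-batch orchestration described above can be sketched with a self-contained stand-in. This is an illustration under stated assumptions, not the library's implementation: SketchRetriever and KeywordRetriever are hypothetical names, the signature of _execute_retrieval is assumed, and the filter is typed as Any rather than FilterClause or QueryFilter.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Generic, TypeVar, Union

T = TypeVar("T")


class SketchRetriever(ABC, Generic[T]):
    """Hypothetical stand-in for BaseRetriever; not the library's code."""

    async def retrieve(
        self, query: Union[str, list[str]], query_filter: Any = None, **kwargs: Any
    ) -> Union[T, list[T]]:
        # Single query: delegate directly and return T.
        if isinstance(query, str):
            return await self._execute_retrieval(query, query_filter, **kwargs)
        # Batch: run one retrieval per query concurrently.
        # asyncio.gather returns results in the same order as the inputs.
        tasks = [self._execute_retrieval(q, query_filter, **kwargs) for q in query]
        return list(await asyncio.gather(*tasks))

    @abstractmethod
    async def _execute_retrieval(
        self, query: str, query_filter: Any = None, **kwargs: Any
    ) -> T:
        """Core retrieval logic (signature assumed for illustration)."""


class KeywordRetriever(SketchRetriever[list[str]]):
    """Toy retriever: returns documents that contain the query as a substring."""

    def __init__(self, documents: list[str]) -> None:
        self.documents = documents

    async def _execute_retrieval(self, query, query_filter=None, **kwargs):
        top_k = kwargs.get("top_k", 5)  # assumed default, for the sketch only
        hits = [d for d in self.documents if query.lower() in d.lower()]
        return hits[:top_k]


retriever = KeywordRetriever(["Red apple", "Green apple", "Blue sky"])
single = asyncio.run(retriever.retrieve("apple"))          # one result of type T
batch = asyncio.run(retriever.retrieve(["sky", "green"]))  # one T per query
```

In this sketch the subclass only supplies the per-query logic, while the stand-in base class decides whether to return T or list[T].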
Initialize the BaseRetriever.
Subclasses should override this to accept their specific dependencies (e.g., data_store, api_client, graph_db, etc.).
retrieve(query, query_filter=None, **kwargs)
abstractmethod
async
retrieve(query: str, query_filter: FilterClause | QueryFilter | None = None, **kwargs: Any) -> T
retrieve(query: list[str], query_filter: FilterClause | QueryFilter | None = None, **kwargs: Any) -> list[T]
Retrieve data based on the query.
This method should be implemented by subclasses to define the specific retrieval logic. The return type is determined by the type parameter T.
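The two signatures above (str returning T, list[str] returning list[T]) correspond to the typing.overload pattern. The following is a minimal sketch of that pattern only; OverloadedRetriever is a hypothetical class, the query_filter parameter is omitted for brevity, and the placeholder body simply raises NotImplementedError.

```python
import asyncio
from typing import Any, Generic, TypeVar, Union, overload

T = TypeVar("T")


class OverloadedRetriever(Generic[T]):
    """Hypothetical class showing only the typing pattern."""

    @overload
    async def retrieve(self, query: str, **kwargs: Any) -> T: ...

    @overload
    async def retrieve(self, query: list[str], **kwargs: Any) -> list[T]: ...

    async def retrieve(
        self, query: Union[str, list[str]], **kwargs: Any
    ) -> Union[T, list[T]]:
        # Placeholder body: a concrete subclass would supply the logic.
        raise NotImplementedError("Subclasses provide the retrieval logic.")
```

With overloads like these, a static type checker can infer T for a single string and list[T] for a list of strings, without callers needing casts.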
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str \| list[str] | The query string or list of query strings to retrieve data. If a list is provided, retrieval is performed for each query and results are returned as a list of T. | required |
| query_filter | FilterClause \| QueryFilter \| None | Filter criteria for the retrieval. Can be a single FilterClause or a composite QueryFilter. Defaults to None. | None |
| **kwargs | Any | Additional parameters for the retrieval process. Common parameters include: 1. top_k (int): Maximum number of documents to retrieve. 2. threshold (float): Minimum score threshold for filtering results. 3. timeout (float): Maximum time in seconds to wait for retrieval. | {} |
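One way a subclass might honor the common parameters listed above (top_k, threshold, timeout) is sketched below. Everything here is an assumption for illustration: search is a hypothetical backend returning (text, score) pairs, execute_retrieval is a free function rather than the real method, and the default values are invented.

```python
import asyncio
from typing import Any


async def search(query: str) -> list[tuple[str, float]]:
    """Hypothetical backend returning (text, score) pairs."""
    await asyncio.sleep(0)  # stands in for real I/O
    return [("doc-a", 0.91), ("doc-b", 0.42), ("doc-c", 0.77), ("doc-d", 0.65)]


async def execute_retrieval(query: str, **kwargs: Any) -> list[tuple[str, float]]:
    top_k = kwargs.get("top_k", 10)           # assumed default
    threshold = kwargs.get("threshold", 0.0)  # assumed default
    timeout = kwargs.get("timeout")           # None means no time limit

    # asyncio.wait_for raises asyncio.TimeoutError if the backend is too slow.
    results = await asyncio.wait_for(search(query), timeout=timeout)

    # Drop results below the score threshold, then keep the best top_k.
    kept = [r for r in results if r[1] >= threshold]
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept[:top_k]


results = asyncio.run(
    execute_retrieval("example", top_k=2, threshold=0.5, timeout=1.0)
)
```

Applying the threshold before truncating to top_k is a design choice in this sketch; whether a given retriever filters before or after truncation depends on its implementation.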
Returns:
| Type | Description |
|---|---|
| T \| list[T] | Retrieved data. Returns T for a single query and list[T] for batch queries. The type depends on the subclass implementation: 1. list[Chunk] for vector-based retrievers. 2. pd.DataFrame for SQL-based retrievers. 3. Other types as defined by specific implementations. |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | If the method is not implemented by the subclass. |