Vector retriever v2
Defines a basic vector retriever for document retrieval.
This module provides a simple retriever implementation that performs document retrieval through a single BaseDataStore with vector capability.
VectorRetriever(data_store)
Bases: DatastoreChunkRetriever
A vector retriever backed by a BaseDataStore with vector capability.
This class provides a straightforward implementation of BaseRetriever that retrieves documents via similarity search against a BaseDataStore with vector capability.
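Similarity search, as used here, means scoring stored vectors against the query vector and keeping the best matches. As a rough illustration only, the sketch below does this by brute force with cosine similarity over an in-memory dict. It is not how VectorRetriever or ElasticsearchDataStore work internally (production vector stores typically use approximate nearest-neighbour indexes, and the similarity metric depends on configuration); ScoredDoc, cosine_similarity, and similarity_search are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    """Hypothetical stand-in for a scored result (not the library's Chunk)."""
    doc_id: str
    score: float


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_search(
    query_vec: list[float],
    index: dict[str, list[float]],
    top_k: int,
) -> list[ScoredDoc]:
    """Brute force: score every stored vector, then keep the top_k best."""
    scored = [
        ScoredDoc(doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in index.items()
    ]
    scored.sort(key=lambda d: d.score, reverse=True)  # highest similarity first
    return scored[:top_k]
```

With index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}, calling similarity_search([1.0, 0.0], index, top_k=2) ranks "a" (score 1.0) ahead of "c" (about 0.707) and drops "b" (score 0.0).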
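Examples:
```python
# Assumes ElasticsearchDataStore, VectorRetriever, and an embedding model
# (embedding_model) are already imported/configured; run inside an async context.
from gllm_datastore.core.filters import filter as F

# Create a data store with vector capability, then wrap it in a retriever.
data_store = ElasticsearchDataStore(...).with_vector(em_invoker=embedding_model)
retriever = VectorRetriever(data_store=data_store)

# Single query: returns a flat list of chunks.
results = await retriever.retrieve("search query", top_k=10)
# results: [Chunk(content="...", score=0.95), Chunk(content="...", score=0.87), ...]

# Single query with a metadata filter and a minimum score threshold.
results = await retriever.retrieve(
    "search query",
    query_filter=F.eq("metadata.category", "AI"),
    top_k=10,
    threshold=0.8,
)
# results: [Chunk(content="...", score=0.92), Chunk(content="...", score=0.85), ...]

# Batch query: returns one list of chunks per query.
batch_results = await retriever.retrieve(["query 1", "query 2"], top_k=10)
# batch_results: [
#     [Chunk(content="...", score=0.95), ...],  # results for "query 1"
#     [Chunk(content="...", score=0.89), ...],  # results for "query 2"
# ]
```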
Attributes:
| Name | Type | Description |
|---|---|---|
| data_store | BaseDataStore | The data store with vector capability used for retrieval operations. |
Initialize the VectorRetriever with a data store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data_store | BaseDataStore | The data store to retrieve from. Must have vector capability registered. | required |
Raises:
| Type | Description |
|---|---|
| TypeError | If data_store is not an instance of BaseDataStore. |
| ValueError | If data_store does not have vector capability registered. |
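The Raises table describes two checks performed when the retriever is constructed. A minimal sketch of that validation order follows, assuming a simple boolean flag stands in for "vector capability registered"; the real capability check is not documented here, and BaseDataStoreStub and VectorRetrieverSketch are hypothetical names.

```python
class BaseDataStoreStub:
    """Hypothetical stand-in for BaseDataStore."""

    def __init__(self, has_vector: bool = False) -> None:
        # Assumption: a plain flag models "vector capability registered".
        self.has_vector = has_vector


class VectorRetrieverSketch:
    """Illustrates the constructor checks listed under Raises."""

    def __init__(self, data_store: object) -> None:
        # First check: the argument must be a data store at all.
        if not isinstance(data_store, BaseDataStoreStub):
            raise TypeError("data_store must be an instance of BaseDataStore")
        # Second check: the data store must support vector search.
        if not data_store.has_vector:
            raise ValueError("data_store must have vector capability registered")
        self.data_store = data_store
```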
async retrieve(query, query_filter=None, top_k=None, threshold=None, **kwargs)
retrieve(query: str, query_filter: FilterClause | QueryFilter | None = None, top_k: int | None = None, threshold: float | None = None, **kwargs: Any) -> list[Chunk]
retrieve(query: list[str], query_filter: FilterClause | QueryFilter | None = None, top_k: int | None = None, threshold: float | None = None, **kwargs: Any) -> list[list[Chunk]]
Retrieve documents based on the query using vector similarity search.
This method performs a retrieval operation using the configured data store's vector capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str \| list[str] | The query string, or list of query strings, to retrieve documents for. If a list is provided, retrieval runs concurrently for each query. | required |
| query_filter | FilterClause \| QueryFilter \| None | Filter criteria for the retrieval: either a single FilterClause or a composite QueryFilter. | None |
| top_k | int \| None | The maximum number of documents to retrieve. If None, the data store's default limit is used. | None |
| threshold | float \| None | The minimum score threshold for filtering results. If None, no score filtering is applied. | None |
| `**kwargs` | Any | Additional parameters passed to the data store's retrieve method. | {} |
Returns:
| Type | Description |
|---|---|
| list[Chunk] \| list[list[Chunk]] | A list of retrieved documents sorted by similarity score. If query is a list of strings, returns a list[list[Chunk]] with one such list per query. |