
Vector retriever v2

Defines a basic vector retriever module for document retrieval.

This module provides a simple implementation of a retriever that uses a single BaseDataStore with vector capability for document retrieval operations.

VectorRetriever(data_store)

Bases: DatastoreChunkRetriever

A vector retriever using BaseDataStore with vector capability.

This class provides a straightforward implementation of the BaseRetriever, using a BaseDataStore with vector capability for document retrieval via similarity search.

Examples:

from gllm_datastore.core.filters import filter as F

# Build a data store with vector capability, then wrap it in a retriever.
data_store = ElasticsearchDataStore(...).with_vector(em_invoker=embedding_model)
retriever = VectorRetriever(data_store=data_store)

# Single query: returns a flat list of chunks.
results = await retriever.retrieve("search query", top_k=10)
# results: [Chunk(content="...", score=0.95), Chunk(content="...", score=0.87), ...]

# Single query with a metadata filter and a minimum score threshold.
results = await retriever.retrieve(
    "search query",
    query_filter=F.eq("metadata.category", "AI"),
    top_k=10,
    threshold=0.8,
)
# results: [Chunk(content="...", score=0.92), Chunk(content="...", score=0.85), ...]

# Batch of queries: returns one list of chunks per query.
batch_results = await retriever.retrieve(["query 1", "query 2"], top_k=10)
# batch_results: [
#     [Chunk(content="...", score=0.95), ...],  # results for "query 1"
#     [Chunk(content="...", score=0.89), ...]   # results for "query 2"
# ]

Attributes:

data_store (BaseDataStore): The data store with vector capability used for retrieval operations.

Initialize the VectorRetriever with a data store.

Parameters:

data_store (BaseDataStore, required): The data store to retrieve from. Must have vector capability registered.

Raises:

TypeError: If data_store is not an instance of BaseDataStore.

ValueError: If data_store does not have vector capability registered.
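
The two documented failure modes can be illustrated with a small sketch. The `BaseDataStore` class below is a simplified stand-in, and its `vector` attribute is an assumption made for illustration; the real class may track capabilities differently. `validate_data_store` and `describe` are hypothetical helpers, not library functions.

```python
class BaseDataStore:
    """Simplified stand-in for the library's BaseDataStore (hypothetical)."""

    def __init__(self) -> None:
        # Hypothetical slot; the real class may store capabilities differently.
        self.vector = None

    def with_vector(self, em_invoker: object) -> "BaseDataStore":
        self.vector = em_invoker
        return self


def validate_data_store(data_store: object) -> BaseDataStore:
    """Apply the two documented checks: type first, then vector capability."""
    if not isinstance(data_store, BaseDataStore):
        raise TypeError("data_store must be an instance of BaseDataStore")
    if data_store.vector is None:
        raise ValueError("data_store has no vector capability registered")
    return data_store


def describe(candidate: object) -> str:
    """Return 'ok', or the name of the exception the checks raise."""
    try:
        validate_data_store(candidate)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


outcomes = [
    describe("not a data store"),                                # wrong type
    describe(BaseDataStore()),                                   # no vector capability
    describe(BaseDataStore().with_vector(em_invoker=object())),  # valid
]
print(outcomes)
```

Callers that build the data store dynamically may want to catch both exceptions around the `VectorRetriever(...)` call, since either indicates a configuration problem rather than a retrieval failure.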

retrieve(query, query_filter=None, top_k=None, threshold=None, **kwargs) async

retrieve(query: str, query_filter: FilterClause | QueryFilter | None = None, top_k: int | None = None, threshold: float | None = None, **kwargs: Any) -> list[Chunk]
retrieve(query: list[str], query_filter: FilterClause | QueryFilter | None = None, top_k: int | None = None, threshold: float | None = None, **kwargs: Any) -> list[list[Chunk]]

Retrieve documents based on the query using vector similarity search.

This method performs a retrieval operation using the configured data store's vector capability.

Parameters:

query (str | list[str], required): The query string, or a list of query strings, to retrieve documents for. If a list is provided, retrieval is performed for each query concurrently.

query_filter (FilterClause | QueryFilter | None, default None): Filter criteria for the retrieval. Can be a single FilterClause or a composite QueryFilter.

top_k (int | None, default None): The maximum number of documents to retrieve. If None, the data store's default limit is used.

threshold (float | None, default None): The minimum score threshold for filtering results. If None, no filtering is applied.

**kwargs (Any, default {}): Additional parameters passed to the data store's retrieve method.

Returns:

list[Chunk] | list[list[Chunk]]: A list of retrieved documents sorted by similarity score. Returns list[list[Chunk]] if query is a list of strings.