Vector retriever

Defines an abstract base class for creating vector retrievers.

This module provides the BaseVectorRetriever class, which serves as a foundation for implementing retrieval systems in Gen AI applications.

Authors

Henry Wicaksono (henry.wicaksono@gdplabs.id)
Resti Febriana (resti.febriana@gdplabs.id)

References

[1] https://python.langchain.com/docs/modules/data_connection/retrievers/

BaseVectorRetriever(data_store)

Bases: Component, ABC

An abstract base class for the retriever used in Gen AI applications.

This class defines the interface for retriever components, which are responsible for retrieving relevant documents or information based on a given query.

Attributes:

data_store (BaseVectorDataStore | list[BaseVectorDataStore]): The data store or list of data stores to be used.

Initializes the BaseVectorRetriever object.

Parameters:

data_store (BaseVectorDataStore | list[BaseVectorDataStore], required): The data store or list of data stores to be used.

retrieve(query, top_k=DEFAULT_TOP_K, retrieval_params=None, event_emitter=None, timeout=None, threshold=None) async

Retrieve documents based on the query.

This method performs the retrieval by calling the _retrieve method. If retrieval fails or times out, it returns an empty list.
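The fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: SketchRetriever and its three subclasses are hypothetical stand-ins, the default of 5 for top_k is an assumed value (the actual DEFAULT_TOP_K is not stated here), and asyncio.wait_for is only one plausible way to enforce the timeout.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod


class SketchRetriever(ABC):
    """Hypothetical stand-in for a retriever base class (not the library's code)."""

    @abstractmethod
    async def _retrieve(self, query: str, top_k: int) -> list[str]:
        """Subclasses implement the actual lookup here."""

    async def retrieve(
        self, query: str, top_k: int = 5, timeout: float | None = None
    ) -> list[str]:
        try:
            # asyncio.wait_for applies no time limit when timeout is None.
            return await asyncio.wait_for(self._retrieve(query, top_k), timeout=timeout)
        except Exception:
            # Covers both a timeout and any error raised inside _retrieve.
            return []


class OkRetriever(SketchRetriever):
    async def _retrieve(self, query: str, top_k: int) -> list[str]:
        return [f"doc-{i}" for i in range(top_k)]


class SlowRetriever(SketchRetriever):
    async def _retrieve(self, query: str, top_k: int) -> list[str]:
        await asyncio.sleep(0.2)  # Longer than the timeout used in the usage note.
        return ["too late"]


class FailingRetriever(SketchRetriever):
    async def _retrieve(self, query: str, top_k: int) -> list[str]:
        raise RuntimeError("data store unavailable")
```

Under these assumptions, asyncio.run(SlowRetriever().retrieve("q", timeout=0.01)) and asyncio.run(FailingRetriever().retrieve("q")) both yield an empty list, so callers should treat an empty result as "nothing found or retrieval failed" rather than relying on an exception.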

Parameters:

query (str, required): The query string to retrieve documents.
top_k (int, default DEFAULT_TOP_K): The maximum number of documents to retrieve.
retrieval_params (dict[str, Any] | None, default None): Additional parameters for the retrieval process. These could include 'max_results', 'sort_order', etc., specific to the retrieval logic.
event_emitter (EventEmitter | None, default None): The event emitter to emit events.
timeout (float | int | None, default None): Maximum time in seconds to wait for retrieval to complete. If None, no timeout is applied.
threshold (float | None, default None): The minimum score threshold for filtering results.

Returns:

list[Chunk]: A list of retrieved documents.
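To make the interplay of top_k and threshold concrete, here is a small sketch of one way the two could be applied to scored results. ScoredChunk and filter_chunks are hypothetical names introduced for illustration; the fields of the real Chunk class are not described in this section. The sketch also assumes that higher scores are better and that the threshold comparison is inclusive, neither of which is confirmed above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ScoredChunk:
    """Hypothetical stand-in for a retrieved chunk with a relevance score."""

    content: str
    score: float


def filter_chunks(
    chunks: list[ScoredChunk], top_k: int, threshold: float | None = None
) -> list[ScoredChunk]:
    """Drop chunks scoring below the threshold, then keep the best top_k."""
    if threshold is not None:
        # Assumption: a chunk scoring exactly at the threshold is kept.
        chunks = [c for c in chunks if c.score >= threshold]
    # Assumption: a higher score means a more relevant chunk.
    ranked = sorted(chunks, key=lambda c: c.score, reverse=True)
    return ranked[:top_k]


sample = [ScoredChunk("a", 0.9), ScoredChunk("b", 0.4), ScoredChunk("c", 0.7)]
kept = filter_chunks(sample, top_k=2, threshold=0.5)
```

With this sample, kept holds chunks "a" and "c" in that order: "b" falls below the threshold, and the remaining two fit within top_k. A threshold of None skips the score filter, so only top_k limits the result.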