Vector

ChromaDB implementation of vector search and CRUD capability.

This module provides a ChromaDB implementation of the VectorCapability protocol using langchain-chroma integration.

Authors

Kadek Denaya (kadek.d.r.diana@gdplabs.id)

References

NONE

ChromaVectorCapability(collection_name, em_invoker, client, num_candidates=DEFAULT_NUM_CANDIDATES)

ChromaDB implementation of VectorCapability protocol.

This class provides document CRUD operations and vector search using ChromaDB.

Attributes:

    collection_name (str): The name of the ChromaDB collection.
    collection (Collection): The ChromaDB collection instance.
    vector_store (Chroma): The langchain Chroma vector store instance.
    num_candidates (int): The maximum number of candidates to consider during search.

Initialize the ChromaDB vector capability.

Parameters:

    collection_name (str): The name of the ChromaDB collection. Required.
    em_invoker (BaseEMInvoker): The embedding model used to perform vectorization. Required.
    client (ClientAPI): The ChromaDB client instance. Required.
    num_candidates (int): The maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES (50).
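Putting the constructor parameters together, initialization looks roughly like the sketch below. The `gllm_datastore` import path and the embedding invoker setup are assumptions (the actual module layout may differ); only the `chromadb` client API is standard.

```python
# Sketch of constructing the capability; import paths are assumptions.
def build_capability(em_invoker):
    import chromadb
    from gllm_datastore.vector import ChromaVectorCapability  # assumed path

    # In-memory client; use chromadb.PersistentClient(path=...) to persist.
    client = chromadb.Client()

    return ChromaVectorCapability(
        collection_name="documents",
        em_invoker=em_invoker,  # any BaseEMInvoker implementation
        client=client,
        num_candidates=50,
    )
```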

em_invoker property

Returns the EM Invoker instance.

Returns:

    BaseEMInvoker: The EM Invoker instance.

clear() async

Clear all records from the datastore.

create(data, **kwargs) async

Add chunks to the vector store with automatic embedding generation.

Parameters:

    data (Chunk | list[Chunk]): Single chunk or list of chunks to add. Required.
    **kwargs (Any): Backend-specific parameters. Defaults to {}.

create_from_vector(chunk_vectors, **kwargs) async

Add pre-computed embeddings directly.

Examples:

await datastore.vector.create_from_vector(chunk_vectors=[
    (Chunk(content="text1", metadata={"source": "source1"}, id="id1"), [0.1, 0.2, 0.3]),
    (Chunk(content="text2", metadata={"source": "source2"}, id="id2"), [0.4, 0.5, 0.6]),
])

Parameters:

    chunk_vectors (list[tuple[Chunk, Vector]]): List of tuples containing chunks and their corresponding vectors. Required.
    **kwargs (Any): Datastore-specific parameters. Defaults to {}.

delete(filters=None, **kwargs) async

Delete records from the datastore.

Examples:

from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await datastore.vector.delete(filters=F.eq("metadata.category", "tech"))

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await datastore.vector.delete(filters=filters)

This will delete all chunks from the vector store that match the filters.

Parameters:

    filters (FilterClause | QueryFilter | None): Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op).
    **kwargs (Any): Datastore-specific parameters. Defaults to {}.

ensure_index() async

Ensure ChromaDB collection exists, creating it if necessary.

This method is idempotent: if the collection already exists, the existing collection is returned. The collection is created automatically during initialization, but this method can be called explicitly to guarantee it exists.

Raises:

    RuntimeError: If collection creation fails.

retrieve(query, filters=None, options=None) async

Semantic search using text query converted to vector.

Examples:

from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await datastore.vector.retrieve(
    query="What is the capital of France?",
    filters=F.eq("metadata.category", "tech")
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await datastore.vector.retrieve(query="What is the capital of France?", filters=filters)

This will retrieve the top 10 chunks by similarity score from the vector store that match the query and the filters. The chunks will be sorted by score in descending order.

Parameters:

    query (str): Text query to embed and search for. Required.
    filters (FilterClause | QueryFilter | None): Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None.
    options (QueryOptions | None): Options to apply to the search. Defaults to None.

Returns:

    list[Chunk]: List of chunks ordered by relevance score.

retrieve_by_vector(vector, filters=None, options=None) async

Direct vector similarity search.

Examples:

from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await datastore.vector.retrieve_by_vector(
    vector=[0.1, 0.2, 0.3],
    filters=F.eq("metadata.category", "tech")
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await datastore.vector.retrieve_by_vector(vector=[0.1, 0.2, 0.3], filters=filters)

This will retrieve the top 10 chunks by similarity score from the vector store that match the vector and the filters. The chunks will be sorted by score in descending order.

Parameters:

    vector (Vector): Query embedding vector. Required.
    filters (FilterClause | QueryFilter | None): Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None.
    options (QueryOptions | None): Options to apply to the search. Defaults to None.

Returns:

    list[Chunk]: List of chunks ordered by similarity score.

update(update_values, filters=None) async

Update existing records in the datastore.

Examples:

from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await datastore.vector.update(
    update_values={"metadata": {"status": "published"}},
    filters=F.eq("metadata.category", "tech"),
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await datastore.vector.update(
    update_values={"metadata": {"status": "published"}},
    filters=filters,
)

This will update the metadata of all chunks that match the filters, setting their status to "published".

Parameters:

    update_values (dict[str, Any]): Values to update. Required.
    filters (FilterClause | QueryFilter | None): Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op).
Note

ChromaDB doesn't support direct update operations. This method requires filters to identify records and will update matching records.