Vector

ChromaDB implementation of vector search and CRUD capability.

This module provides a ChromaDB implementation of the VectorCapability protocol using the langchain-chroma integration.

`ChromaVectorCapability(collection_name, em_invoker, client, num_candidates=DEFAULT_NUM_CANDIDATES, encryption=None)`

ChromaDB implementation of VectorCapability protocol.

This class provides document CRUD operations and vector search using ChromaDB.

Attributes:

Name	Type	Description
`collection_name`	`str`	The name of the ChromaDB collection.
`collection`	`Collection`	The ChromaDB collection instance.
`vector_store`	`Chroma`	The langchain Chroma vector store instance.
`num_candidates`	`int`	The maximum number of candidates to consider during search.

Initialize the ChromaDB vector capability.

Parameters:

Name	Type	Description	Default
`collection_name`	`str`	The name of the ChromaDB collection.	required
`em_invoker`	`BaseEMInvoker`	The embedding model to perform vectorization.	required
`client`	`ClientAPI`	The ChromaDB client instance.	required
`num_candidates`	`int`	Maximum number of candidates to consider during search. Defaults to 50.	`DEFAULT_NUM_CANDIDATES`
`encryption`	`EncryptionCapability \| None`	Encryption capability for field-level encryption. Defaults to None.	`None`

`em_invoker` `property`

Returns the EM Invoker instance.

Returns:

Type	Description
`BaseEMInvoker`	The EM Invoker instance.

`clear()` `async`

Clear all records from the datastore.

`create(data, **kwargs)` `async`

Add chunks to the vector store with automatic embedding generation.

If encryption is enabled, this method automatically encrypts the content and metadata of the chunks according to the encryption configuration. Embeddings are generated from the plaintext before the chunks are encrypted, so they represent the original content rather than the ciphertext.
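Embedding before encrypting matters because a vector computed from ciphertext would carry no semantic signal. The following is a minimal, self-contained sketch of that ordering. It assumes a toy letter-frequency embedder and a toy reversible cipher; `Chunk` here is a stand-in class, and `toy_embed`, `toy_encrypt`, `toy_decrypt`, and `prepare_for_storage` are hypothetical helpers, not part of the library.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical stand-in for the library's Chunk type.
    id: str
    content: str
    metadata: dict = field(default_factory=dict)


def toy_embed(text: str) -> list[float]:
    # Hypothetical embedder: letter-frequency vector over a-z (illustration only).
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def toy_encrypt(text: str, key: int = 3) -> str:
    # Hypothetical reversible cipher (shifts every code point); NOT secure.
    return "".join(chr(ord(ch) + key) for ch in text)


def toy_decrypt(text: str, key: int = 3) -> str:
    return "".join(chr(ord(ch) - key) for ch in text)


def prepare_for_storage(chunks: list[Chunk]) -> list[tuple[Chunk, list[float]]]:
    prepared = []
    for chunk in chunks:
        # 1. Embed the plaintext so the vector reflects the original meaning.
        vector = toy_embed(chunk.content)
        # 2. Only then encrypt the content (metadata encryption omitted for brevity).
        encrypted = Chunk(id=chunk.id, content=toy_encrypt(chunk.content), metadata=chunk.metadata)
        prepared.append((encrypted, vector))
    return prepared
```

Reversing the two steps would make the stored vector describe the ciphertext, which is what the ordering described above avoids.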

Examples:

```python
await datastore.vector.create([
    Chunk(content="text1", metadata={"source": "source1"}, id="id1"),
    Chunk(content="text2", metadata={"source": "source2"}, id="id2"),
])
```

Parameters:

Name	Type	Description	Default
`data`	`Chunk \| list[Chunk]`	Single chunk or list of chunks to add.	required
`**kwargs`	`Any`	Backend-specific parameters.	`{}`

`create_from_vector(chunk_vectors, **kwargs)` `async`

Add chunks together with pre-computed embeddings, without generating embeddings.

If encryption is enabled, this method automatically encrypts the content and metadata of the chunks according to the encryption configuration.

Examples:

```python
await datastore.vector.create_from_vector(chunk_vectors=[
    (Chunk(content="text1", metadata={"source": "source1"}, id="id1"), [0.1, 0.2, 0.3]),
    (Chunk(content="text2", metadata={"source": "source2"}, id="id2"), [0.4, 0.5, 0.6]),
])
```

Parameters:

Name	Type	Description	Default
`chunk_vectors`	`list[tuple[Chunk, Vector]]`	List of tuples containing chunks and their corresponding vectors.	required
`**kwargs`	`Any`	Datastore-specific parameters.	`{}`
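When embeddings come from an external pipeline, it can help to pair and sanity-check them before passing them as `chunk_vectors`. A minimal sketch, assuming a stand-in `Chunk` class and a hypothetical `build_chunk_vectors` helper; the count and dimension checks are a precaution suggested here, not documented behavior of `create_from_vector`.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    # Hypothetical stand-in for the library's Chunk type.
    id: str
    content: str
    metadata: dict = field(default_factory=dict)


def build_chunk_vectors(chunks: list[Chunk], vectors: list[list[float]]) -> list[tuple[Chunk, list[float]]]:
    """Pair chunks with pre-computed vectors, checking counts and dimensions."""
    if len(chunks) != len(vectors):
        raise ValueError(f"got {len(chunks)} chunks but {len(vectors)} vectors")
    dims = {len(v) for v in vectors}
    if len(dims) > 1:
        raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
    # Each tuple has the (Chunk, Vector) shape listed in the parameter table.
    return list(zip(chunks, vectors))
```

The returned list can then be passed as `chunk_vectors=` to `create_from_vector`.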

`delete(filters=None, **kwargs)` `async`

Delete records from the datastore.

Examples:

```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
await datastore.vector.delete(filters=F.eq("metadata.category", "tech"))

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await datastore.vector.delete(filters=filters)
```

This will delete all chunks from the vector store that match the filters.
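To make the filter semantics concrete, here is a minimal in-memory sketch of how an equality clause on a dotted path such as `metadata.category`, and an AND of such clauses, could be evaluated against stored records, including the documented no-op when `filters` is None. The `eq`, `and_`, and `delete_matching` helpers are hypothetical and only mirror the shape of `F.eq` and `F.and_`; they are not the library's implementation.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]
Predicate = Callable[[Record], bool]


def _resolve(record: Record, path: str) -> Any:
    # Walk a dotted path like "metadata.category"; missing keys resolve to None.
    value: Any = record
    for part in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def eq(path: str, expected: Any) -> Predicate:
    # Hypothetical analogue of F.eq: field at `path` must equal `expected`.
    return lambda record: _resolve(record, path) == expected


def and_(*clauses: Predicate) -> Predicate:
    # Hypothetical analogue of F.and_: every clause must match.
    return lambda record: all(clause(record) for clause in clauses)


def delete_matching(records: list[Record], predicate: Optional[Predicate]) -> list[Record]:
    # No filters means no operation, mirroring the documented default.
    if predicate is None:
        return records
    # Keep only the records that do NOT match the filter.
    return [r for r in records if not predicate(r)]
```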

Parameters:

Name	Type	Description	Default
`filters`	`FilterClause \| QueryFilter \| None`	Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op).	`None`
`**kwargs`	`Any`	Datastore-specific parameters.	`{}`

`ensure_index()` `async`

Ensure ChromaDB collection exists, creating it if necessary.

This method is idempotent: if the collection already exists, it returns the existing collection. The collection is created automatically during initialization, but this method can be called explicitly to make sure it exists.
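Idempotent get-or-create means repeated calls hand back the same collection instead of failing or creating a duplicate. A minimal pure-Python sketch of that contract, assuming a hypothetical in-memory `CollectionRegistry` and stand-in `Collection` class. With a real ChromaDB client, the same effect is commonly achieved with `client.get_or_create_collection(name=...)`, though this page does not state which call the capability uses.

```python
class Collection:
    # Hypothetical stand-in for a ChromaDB collection.
    def __init__(self, name: str) -> None:
        self.name = name


class CollectionRegistry:
    """Hypothetical in-memory registry illustrating idempotent get-or-create."""

    def __init__(self) -> None:
        self._collections: dict[str, Collection] = {}
        self.create_calls = 0

    def ensure(self, name: str) -> Collection:
        existing = self._collections.get(name)
        if existing is not None:
            # Already present: return it unchanged rather than recreating it.
            return existing
        # Not present yet: create it exactly once and remember it.
        self.create_calls += 1
        collection = Collection(name)
        self._collections[name] = collection
        return collection
```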

Raises:

Type	Description
`RuntimeError`	If collection creation fails.

`retrieve(query, filters=None, options=None)` `async`

Semantic search using text query converted to vector.

If encryption is enabled, this method automatically decrypts the content and metadata of the returned chunks according to the encryption configuration.

Warning

Filters cannot target encrypted fields. If you try to filter by an encrypted metadata field (e.g., filters=F.eq("metadata.secret", "val")), the filter will fail to match because the filter value is not encrypted but the stored data is. Always use non-encrypted fields in filters when working with encrypted data.
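The mismatch happens because the stored value is ciphertext while the filter value is plaintext, so an equality comparison never succeeds. A minimal sketch, assuming a hypothetical `toy_encrypt` (a reversible character shift, not real encryption) applied to `metadata.secret` while `id` is stored in plaintext; `matches_eq` is likewise a hypothetical helper.

```python
def toy_encrypt(text: str, key: int = 3) -> str:
    # Hypothetical stand-in for field-level encryption; NOT secure.
    return "".join(chr(ord(ch) + key) for ch in text)


# What the datastore holds after writing: `secret` is encrypted, `id` is not.
stored = {"id": "doc1", "metadata": {"secret": toy_encrypt("val")}}


def matches_eq(record: dict, path: str, expected: str) -> bool:
    # Plain equality on a dotted path, as a filter engine would evaluate it.
    value = record
    for part in path.split("."):
        value = value.get(part) if isinstance(value, dict) else None
    return value == expected


plaintext_filter_hit = matches_eq(stored, "metadata.secret", "val")  # False: "val" != "ydo"
id_filter_hit = matches_eq(stored, "id", "doc1")  # True: id is stored as plaintext
```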

Examples:

```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage - using a non-encrypted field
await datastore.vector.retrieve(
    query="What is the capital of France?",
    filters=F.eq("id", "document_id"),
)

# Multiple filters - using non-encrypted fields (assumes metadata.source is not encrypted)
filters = F.and_(F.eq("id", "doc1"), F.eq("metadata.source", "wikipedia"))
await datastore.vector.retrieve(query="What is the capital of France?", filters=filters)
```

This will retrieve the top 10 chunks by similarity score from the vector store that match the query and the filters. The chunks will be sorted by score in descending order.
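The ranking described above can be sketched as: score each stored vector against the query embedding, sort by score in descending order, and keep the top k. This is a minimal pure-Python illustration assuming cosine similarity and k = 10; the metric ChromaDB actually applies depends on the collection's configured distance function, and `top_k_by_cosine` is a hypothetical helper. For `retrieve`, the query text would first be embedded (via `em_invoker`) to obtain the query vector.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 when either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(
    query: list[float],
    items: list[tuple[str, list[float]]],
    k: int = 10,
) -> list[tuple[str, float]]:
    """Return up to k (id, score) pairs, highest similarity first."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```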

Parameters:

Name	Type	Description	Default
`query`	`str`	Text query to embed and search for.	required
`filters`	`FilterClause \| QueryFilter \| None`	Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Cannot use encrypted fields in filters. Defaults to None.	`None`
`options`	`QueryOptions \| None`	Options to apply to the search. Defaults to None.	`None`

Returns:

Type	Description
`list[Chunk]`	List of chunks ordered by relevance score.

`retrieve_by_vector(vector, filters=None, options=None)` `async`

Direct vector similarity search.

If encryption is enabled, this method automatically decrypts the content and metadata of the returned chunks according to the encryption configuration.

Warning

Filters cannot target encrypted fields. If you try to filter by an encrypted metadata field (e.g., filters=F.eq("metadata.secret", "val")), the filter will fail to match because the filter value is not encrypted but the stored data is. Always use non-encrypted fields in filters when working with encrypted data.

Examples:

```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage - using a non-encrypted field
await datastore.vector.retrieve_by_vector(
    vector=[0.1, 0.2, 0.3],
    filters=F.eq("id", "document_id"),
)

# Multiple filters - using non-encrypted fields (assumes metadata.source is not encrypted)
filters = F.and_(F.eq("id", "doc1"), F.eq("metadata.source", "wikipedia"))
await datastore.vector.retrieve_by_vector(vector=[0.1, 0.2, 0.3], filters=filters)
```

This will retrieve the top 10 chunks by similarity score from the vector store that match the vector and the filters. The chunks will be sorted by score in descending order.
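Filters and ranking interact: only records that pass the filter compete for the top positions. A minimal sketch of that filter-then-rank idea, assuming dot-product scoring and a hypothetical `filtered_top_k` helper; ChromaDB's real search runs over an approximate nearest-neighbor index and its configured distance function, so this only illustrates the semantics, not the internals.

```python
def filtered_top_k(query, records, predicate=None, top_k=10):
    """Return ids of the best-scoring records that pass the filter, highest first."""
    # 1. Apply the filter first so non-matching records never compete for a slot.
    pool = [r for r in records if predicate is None or predicate(r)]

    # 2. Score by dot product (an assumption; the real metric depends on the collection).
    def score(record):
        return sum(q * v for q, v in zip(query, record["vector"]))

    # 3. Sort descending and keep at most top_k results.
    return [r["id"] for r in sorted(pool, key=score, reverse=True)[:top_k]]
```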

Parameters:

Name	Type	Description	Default
`vector`	`Vector`	Query embedding vector.	required
`filters`	`FilterClause \| QueryFilter \| None`	Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally. Cannot use encrypted fields in filters. Defaults to None.	`None`
`options`	`QueryOptions \| None`	Options to apply to the search. Defaults to None.	`None`

Returns:

Type	Description
`list[Chunk]`	List of chunks ordered by similarity score.

`update(update_values, filters=None)` `async`

Update existing records in the datastore.

If encryption is enabled, this method automatically encrypts the content and metadata in `update_values` according to the encryption configuration.

Warning

Filters cannot target encrypted fields. While update_values are encrypted before being written, the filters used to identify which documents to update are NOT encrypted. If you try to update documents based on an encrypted metadata field (e.g., filters=F.eq("metadata.secret", "val")), the filter will fail to match because the filter value is not encrypted but the stored data is. Always use non-encrypted fields (like 'id') in filters when working with encrypted data.

Examples:

```python
from gllm_datastore.core.filters import filter as F

# Update content - using non-encrypted field for filter
await datastore.vector.update(
    update_values={"content": "new content"},
    filters=F.eq("id", "document_id"),
)

# Update metadata - using non-encrypted field for filter
await datastore.vector.update(
    update_values={"metadata": {"status": "published"}},
    filters=F.eq("id", "document_id"),
)
```

This will update the chunks that match the filters with the new values.

Parameters:

Name	Type	Description	Default
`update_values`	`dict[str, Any]`	Values to update.	required
`filters`	`FilterClause \| QueryFilter \| None`	Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Cannot use encrypted fields in filters. Defaults to None, in which case no operation is performed (no-op).	`None`

Note

ChromaDB does not support updating records by filter in a single operation. This method therefore requires filters to identify the records to update, and then updates each matching record.
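Because ChromaDB updates records by ID, a filter-based update can be pictured as two steps: resolve the filter to matching IDs, then update those IDs. In ChromaDB's Python client the natural building blocks are `Collection.get(where=...)` and `Collection.update(ids=...)`, though this page does not say whether the capability uses exactly those calls. The sketch below is an in-memory illustration with a hypothetical `update_where` helper; it assumes metadata updates are merged key by key into the existing metadata, which this page does not confirm.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]


def update_where(
    records: dict[str, Record],
    update_values: dict[str, Any],
    predicate: Optional[Callable[[Record], bool]] = None,
) -> list[str]:
    """Apply update_values to every record matching predicate; return updated ids."""
    if predicate is None:
        # No filters means no operation, mirroring the documented default.
        return []
    # Step 1: resolve the filter to concrete record ids.
    ids = [record_id for record_id, record in records.items() if predicate(record)]
    # Step 2: update each matching record by id.
    for record_id in ids:
        record = records[record_id]
        if "content" in update_values:
            record["content"] = update_values["content"]
        if "metadata" in update_values:
            # Assumption: merge new metadata keys into the existing metadata.
            record["metadata"] = {**record.get("metadata", {}), **update_values["metadata"]}
    return ids
```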
