Fulltext

ChromaDB implementation of fulltext search and CRUD capability.

This module provides a ChromaDB implementation of the FulltextCapability protocol using ChromaDB's text search capabilities.

`ChromaFulltextCapability(collection_name, client, num_candidates=DEFAULT_NUM_CANDIDATES, encryption=None)`

ChromaDB implementation of FulltextCapability protocol.

This class provides document CRUD operations and text search using ChromaDB.

Attributes:

Name	Type	Description
`collection_name`	`str`	The name of the ChromaDB collection.
`client`	`ClientAPI`	ChromaDB client instance.
`collection`		ChromaDB collection instance.
`num_candidates`	`int`	Maximum number of candidates to consider during search.

Initialize the ChromaDB fulltext capability.

Parameters:

Name	Type	Description	Default
`collection_name`	`str`	The name of the ChromaDB collection.	required
`client`	`ClientAPI`	ChromaDB client instance.	required
`num_candidates`	`int`	Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES.	`DEFAULT_NUM_CANDIDATES`
`encryption`	`EncryptionCapability \| None`	Encryption capability for field-level encryption. Defaults to None.	`None`

`clear()` `async`

Clear all records from the datastore.

`create(data, **kwargs)` `async`

Create new records in the datastore.

If encryption is enabled, this method automatically encrypts the content and metadata of each chunk according to the encryption configuration.

Parameters:

Name	Type	Description	Default
`data`	`Chunk \| list[Chunk]`	Data to create (single item or collection).	required
`**kwargs`	`Any`	Backend-specific parameters.	`{}`

Raises:

Type	Description
`ValueError`	If data structure is invalid.

`delete(filters=None, options=None)` `async`

Delete records from the datastore.

Parameters:

Name	Type	Description	Default
`filters`	`FilterClause \| QueryFilter \| None`	Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op).	`None`
`options`	`QueryOptions \| None`	Query options for sorting and limiting deletions. Defaults to None.	`None`

`get_size()`

Returns the total number of documents in the collection.

Returns:

Type	Description
`int`	The total number of documents.

`retrieve(filters=None, options=None, **kwargs)` `async`

Read records from the datastore with optional filtering.

Usage Example

```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
results = await fulltext_capability.retrieve(filters=F.eq("metadata.category", "tech"))

# Multiple filters
results = await fulltext_capability.retrieve(
    filters=F.and_(F.eq("metadata.category", "tech"), F.eq("metadata.status", "active"))
)
```

Parameters:

Name	Type	Description	Default
`filters`	`FilterClause \| QueryFilter \| None`	Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None.	`None`
`options`	`QueryOptions \| None`	Query options (sorting, pagination, etc.). Defaults to None.	`None`
`**kwargs`	`Any`	Backend-specific parameters.	`{}`

Returns:

Type	Description
`list[Chunk]`	Query results.

Raises:

Type	Description
`NotImplementedError`	If unsupported operators are used for id or content filters.

`retrieve_fuzzy(query, max_distance=2, filters=None, options=None, **kwargs)` `async`

Find records that fuzzy-match the query within the given edit-distance threshold.

Parameters:

Name	Type	Description	Default
`query`	`str`	Text to fuzzy match against.	required
`max_distance`	`int`	Maximum edit distance for matches. Defaults to 2.	`2`
`filters`	`FilterClause \| QueryFilter \| None`	Optional metadata filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None.	`None`
`options`	`QueryOptions \| None`	Query options (sorting, limit, etc.). Defaults to None.	`None`
`**kwargs`	`Any`	Backend-specific parameters.	`{}`

Returns:

Type	Description
`list[Chunk]`	Matched chunks ordered by distance (ascending), or by `options.order_by` if specified.
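To make `max_distance` concrete, here is a minimal, self-contained sketch of edit-distance matching. It assumes Levenshtein distance over whole strings; how this class actually tokenizes or compares content is not stated here, and `edit_distance` and `fuzzy_filter` are hypothetical helpers, not part of the library.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def fuzzy_filter(query: str, documents: list[str], max_distance: int = 2) -> list[tuple[str, int]]:
    """Keep documents within max_distance of query, closest first.

    Assumption: whole-string comparison, purely for illustration.
    """
    scored = [(doc, edit_distance(query, doc)) for doc in documents]
    matches = [(doc, dist) for doc, dist in scored if dist <= max_distance]
    return sorted(matches, key=lambda pair: pair[1])


matches = fuzzy_filter("chroma", ["chrmoa", "postgres", "chrome", "chroma"])
```

With the default threshold of 2, a swapped pair of adjacent letters still matches, because plain Levenshtein distance counts it as two substitutions.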

`update(update_values, filters=None)` `async`

Update existing records in the datastore.

If encryption is enabled, this method automatically encrypts the content and metadata in `update_values` according to the encryption configuration.

Warning

Filters cannot target encrypted fields. `update_values` are encrypted before being written, but the filters that select which documents to update are not. If you filter on an encrypted metadata field (e.g., `filters=F.eq("metadata.secret", "val")`), nothing will match, because the plaintext filter value is compared against encrypted stored data. Always filter on non-encrypted fields (such as `id`) when working with encrypted data.
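The mismatch can be reproduced with a small stand-in that uses no datastore at all. This is only a sketch: `toy_encrypt` is a hypothetical, insecure placeholder for whatever the configured `EncryptionCapability` does, and `match_eq` is a hypothetical equality filter that compares stored values without decrypting them.

```python
import base64


def toy_encrypt(value: str, key: int = 0x5A) -> str:
    """Hypothetical stand-in for field-level encryption.

    NOT secure; it only makes stored values differ from plaintext.
    """
    scrambled = bytes(byte ^ key for byte in value.encode("utf-8"))
    return base64.b64encode(scrambled).decode("ascii")


# What a store would hold after writes with metadata encryption enabled:
# the "secret" field is ciphertext, the "id" field is left as plaintext.
stored = [
    {"id": "doc-1", "metadata": {"secret": toy_encrypt("val")}},
    {"id": "doc-2", "metadata": {"secret": toy_encrypt("other")}},
]


def match_eq(records: list[dict], field_path: str, value: str) -> list[str]:
    """Return ids of records whose stored field equals value (no decryption)."""
    matched = []
    for record in records:
        current = record
        for part in field_path.split("."):
            current = current.get(part) if isinstance(current, dict) else None
        if current == value:
            matched.append(record["id"])
    return matched


by_secret = match_eq(stored, "metadata.secret", "val")  # plaintext vs ciphertext
by_id = match_eq(stored, "id", "doc-1")                 # unencrypted field
```

Here `by_secret` is empty even though `doc-1` was written with the secret `"val"`, while the filter on `id` finds the record.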

Examples:

Set the content of the chunk with id "unique_id" to "updated_content" and its metadata status to "published".

```python
from gllm_datastore.core.filters import filter as F

await fulltext_capability.update(
    update_values={"content": "updated_content", "metadata": {"status": "published"}},
    filters=F.eq("id", "unique_id"),
)
```

Parameters:

Name	Type	Description	Default
`update_values`	`dict[str, Any]`	Values to update. Supports "content" for updating document content and "metadata" for updating metadata. Other keys are treated as direct metadata updates.	required
`filters`	`FilterClause \| QueryFilter \| None`	Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Cannot use encrypted fields in filters. Defaults to None.	`None`

Note

ChromaDB doesn't support direct update operations. This method will retrieve matching records, update them, and upsert them back to the collection.
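The retrieve, modify, and upsert flow described in this note can be sketched against an in-memory dictionary. This is an illustration under stated assumptions, not the class's actual code: `store`, `apply_update_values`, and `update_by_ids` are hypothetical, records are selected by id rather than by a real filter, and merging (rather than replacing) the `"metadata"` dictionary is an assumption.

```python
from typing import Any

# Hypothetical in-memory stand-in for a collection:
# id -> {"content": ..., "metadata": {...}}
store: dict[str, dict[str, Any]] = {
    "unique_id": {"content": "draft text", "metadata": {"status": "draft", "lang": "en"}},
    "other_id": {"content": "untouched", "metadata": {"status": "draft"}},
}


def apply_update_values(record: dict[str, Any], update_values: dict[str, Any]) -> dict[str, Any]:
    """Return a modified copy of record.

    "content" replaces the document content, "metadata" is merged into
    the existing metadata (assumption), and any other key is written
    as a direct metadata field.
    """
    updated = {"content": record["content"], "metadata": dict(record["metadata"])}
    for key, value in update_values.items():
        if key == "content":
            updated["content"] = value
        elif key == "metadata":
            updated["metadata"].update(value)
        else:
            updated["metadata"][key] = value
    return updated


def update_by_ids(update_values: dict[str, Any], ids: list[str]) -> int:
    """Retrieve matching records, modify them, and upsert them back."""
    matched = {record_id: store[record_id] for record_id in ids if record_id in store}
    rewritten = {
        record_id: apply_update_values(record, update_values)
        for record_id, record in matched.items()
    }
    store.update(rewritten)  # upsert: overwrite the matched ids
    return len(rewritten)


count = update_by_ids(
    {"content": "updated_content", "metadata": {"status": "published"}, "reviewed": True},
    ids=["unique_id", "missing_id"],
)
```

Because the whole record is rewritten, an update of this shape is not atomic with respect to concurrent writers of the same id; whether the real class guards against that is not stated here.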
