
Fulltext

ChromaDB implementation of fulltext search and CRUD capability.

This module provides a ChromaDB implementation of the FulltextCapability protocol using ChromaDB's text search capabilities.

Authors

Kadek Denaya (kadek.d.r.diana@gdplabs.id)

References

NONE

ChromaFulltextCapability(collection_name, client, num_candidates=DEFAULT_NUM_CANDIDATES)

ChromaDB implementation of FulltextCapability protocol.

This class provides document CRUD operations and text search using ChromaDB.

Attributes:

- collection_name (str): The name of the ChromaDB collection.
- client (ClientAPI): ChromaDB client instance.
- collection: ChromaDB collection instance.
- num_candidates (int): Maximum number of candidates to consider during search.

Initialize the ChromaDB fulltext capability.

Parameters:

- collection_name (str, required): The name of the ChromaDB collection.
- client (ClientAPI, required): ChromaDB client instance.
- num_candidates (int, optional): Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES.

clear() async

Clear all records from the datastore.

create(data, **kwargs) async

Create new records in the datastore.

Parameters:

- data (Chunk | list[Chunk], required): Data to create, either a single chunk or a list of chunks.
- **kwargs (Any): Backend-specific parameters.

Raises:

- ValueError: If the data structure is invalid.
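The two accepted input shapes for create() can be illustrated with a small sketch. This is a hypothetical illustration, not the library's code: `Chunk` here is a minimal stand-in dataclass with only `id`, `content`, and `metadata`, and `to_chroma_add_args` is an invented helper that normalizes a single chunk or a list of chunks into the parallel `ids`/`documents`/`metadatas` lists that ChromaDB's `collection.add()` accepts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for gllm_datastore's Chunk (only the fields used here)."""
    id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def to_chroma_add_args(data: Chunk | list[Chunk]) -> dict[str, list]:
    """Normalize one Chunk or a list of Chunks into parallel lists for collection.add()."""
    chunks = [data] if isinstance(data, Chunk) else data
    # Reject anything that is not a Chunk or a list made only of Chunks.
    if not isinstance(chunks, list) or not all(isinstance(c, Chunk) for c in chunks):
        raise ValueError("data must be a Chunk or a list of Chunk objects")
    ids = [c.id for c in chunks]
    # This sketch also rejects duplicate ids within a single call.
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate chunk ids in a single create() call")
    return {
        "ids": ids,
        "documents": [c.content for c in chunks],
        "metadatas": [c.metadata for c in chunks],
    }
```

Accepting both shapes and normalizing to a list up front keeps the rest of the write path identical for single and batch inserts.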

delete(filters=None, options=None) async

Delete records from the datastore.

Parameters:

- filters (FilterClause | QueryFilter | None, optional): Filters that select the records to delete. FilterClause objects are converted to QueryFilter internally. Defaults to None, in which case nothing is deleted (no-op).
- options (QueryOptions | None, optional): Query options for sorting and limiting deletions. Defaults to None.
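The delete semantics above, where omitting filters is a no-op and emptying the datastore is left to clear(), can be illustrated with a small in-memory stand-in. `InMemoryStore`, its predicate-based `filters`, and the `limit` argument (standing in for the limiting part of QueryOptions) are all hypothetical and only mirror the documented behavior.

```python
from __future__ import annotations

from typing import Any, Callable


class InMemoryStore:
    """Hypothetical in-memory stand-in that mimics the documented delete()/clear() semantics."""

    def __init__(self) -> None:
        self.records: dict[str, dict[str, Any]] = {}

    def delete(
        self,
        filters: Callable[[dict[str, Any]], bool] | None = None,
        limit: int | None = None,
    ) -> int:
        """Delete records matching `filters` and return how many were removed."""
        if filters is None:
            # No filter means no-op: nothing is deleted.
            return 0
        matched = [rid for rid, rec in self.records.items() if filters(rec)]
        if limit is not None:
            # Hypothetical stand-in for the limiting part of QueryOptions.
            matched = matched[:limit]
        for rid in matched:
            del self.records[rid]
        return len(matched)

    def clear(self) -> None:
        """Remove every record; the only way to empty the store in this sketch."""
        self.records.clear()
```

Treating a missing filter as "delete nothing" rather than "delete everything" guards against accidental data loss; wiping the store requires the explicit clear() call.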

get_size()

Returns the total number of documents in the collection.

Returns:

- int: The total number of documents in the collection.

retrieve(filters=None, options=None, **kwargs) async

Read records from the datastore with optional filtering.

Usage example:

```python
from gllm_datastore.core.filters import filter as F

# Direct FilterClause usage
results = await fulltext_capability.retrieve(filters=F.eq("metadata.category", "tech"))

# Multiple filters
results = await fulltext_capability.retrieve(
    filters=F.and_(F.eq("metadata.category", "tech"), F.eq("metadata.status", "active"))
)
```

Parameters:

- filters (FilterClause | QueryFilter | None, optional): Query filters to apply. FilterClause objects are converted to QueryFilter internally. Defaults to None.
- options (QueryOptions | None, optional): Query options (sorting, pagination, etc.). Defaults to None.
- **kwargs (Any): Backend-specific parameters.

Returns:

- list[Chunk]: The query results.

Raises:

- NotImplementedError: If an unsupported operator is used in a filter on the id or content field.
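ChromaDB expresses metadata conditions through a `where` dictionary using operators such as `$eq`, `$in`, and `$and`. The sketch below shows one plausible way a metadata filter like `F.eq("metadata.category", "tech")` could map onto that syntax. The `Cond`/`And` classes and `to_chroma_where` are hypothetical stand-ins; the actual FilterClause-to-QueryFilter conversion in gllm_datastore may work differently, and id/content filters are deliberately not handled here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Cond:
    """Hypothetical single condition, e.g. Cond("metadata.category", "eq", "tech")."""
    key: str
    op: str
    value: Any


@dataclass
class And:
    """Hypothetical conjunction of conditions."""
    conditions: list


# Mapping from the sketch's operator names to ChromaDB `where` operators.
_OPS = {"eq": "$eq", "ne": "$ne", "gt": "$gt", "gte": "$gte",
        "lt": "$lt", "lte": "$lte", "in": "$in", "nin": "$nin"}
_PREFIX = "metadata."


def to_chroma_where(f: Cond | And) -> dict[str, Any]:
    """Translate a metadata-only filter tree into ChromaDB's `where` dict syntax."""
    if isinstance(f, And):
        if not f.conditions:
            raise ValueError("And() needs at least one condition")
        parts = [to_chroma_where(c) for c in f.conditions]
        # A lone condition is returned unwrapped instead of as a one-element $and.
        return parts[0] if len(parts) == 1 else {"$and": parts}
    if not f.key.startswith(_PREFIX):
        # id/content filters need separate handling (not covered by this sketch).
        raise NotImplementedError(f"non-metadata filter not handled here: {f.key!r}")
    if f.op not in _OPS:
        raise NotImplementedError(f"unsupported operator: {f.op!r}")
    return {f.key[len(_PREFIX):]: {_OPS[f.op]: f.value}}
```

The resulting dictionary is the kind of value that would be passed as `where=` to ChromaDB's `collection.get()` or `collection.query()`.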

retrieve_fuzzy(query, max_distance=2, filters=None, options=None, **kwargs) async

Find records that fuzzy-match the query within an edit-distance threshold.

Parameters:

- query (str, required): Text to fuzzy-match against.
- max_distance (int, optional): Maximum edit distance for a match. Defaults to 2.
- filters (FilterClause | QueryFilter | None, optional): Optional metadata filters to apply. FilterClause objects are converted to QueryFilter internally. Defaults to None.
- options (QueryOptions | None, optional): Query options (sorting, limit, etc.). Defaults to None.
- **kwargs (Any): Backend-specific parameters.

Returns:

- list[Chunk]: Matched chunks, ordered by ascending distance, or by options.order_by if specified.
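One way to realize "match within an edit-distance threshold" is to compute the Levenshtein distance between the query and each word of a document, keep documents whose closest word is within `max_distance`, and sort by that distance. The sketch below does this over a plain dict mapping ids to text; the function names and the word-level matching strategy are assumptions, not necessarily what ChromaFulltextCapability does internally.

```python
from __future__ import annotations

import re


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertion, deletion, substitution each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as the shorter string
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when characters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_search(
    query: str, docs: dict[str, str], max_distance: int = 2, limit: int | None = None
) -> list[tuple[str, int]]:
    """Return (doc_id, distance) pairs for docs with a word within max_distance of query."""
    q = query.lower()
    hits = []
    for doc_id, text in docs.items():
        words = re.findall(r"\w+", text.lower())
        if not words:
            continue
        best = min(levenshtein(q, w) for w in words)
        if best <= max_distance:
            hits.append((doc_id, best))
    hits.sort(key=lambda h: h[1])  # stable sort: ties keep insertion order
    return hits if limit is None else hits[:limit]
```

For example, with the default threshold of 2 a query for "python" also matches a document containing the typo "pyhton" (a transposition costs two edits), while a threshold of 1 would exclude it.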

update(update_values, filters=None) async

Update existing records in the datastore.

Examples:

Set the content of the chunk with id "unique_id" to "updated_content" and its "status" metadata field to "published":

```python
from gllm_datastore.core.filters import filter as F

await fulltext_capability.update(
    update_values={"content": "updated_content", "metadata": {"status": "published"}},
    filters=F.eq("id", "unique_id"),
)
```

Parameters:

- update_values (dict[str, Any], required): Values to update. Use "content" to update the document content and "metadata" to update its metadata; any other key is treated as a direct metadata update.
- filters (FilterClause | QueryFilter | None, optional): Filters that select the records to update. FilterClause objects are converted to QueryFilter internally. Defaults to None.
Note:

ChromaDB has no update-by-filter operation, so this method retrieves the matching records, applies the updates, and re-adds them to the collection.
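The retrieve, modify, re-add cycle described in the note can be sketched over an in-memory dict. The record layout and the helper names `apply_update` and `update_where` are hypothetical, and merging the "metadata" value into the existing metadata (rather than replacing it) is an assumption about the semantics, not confirmed behavior.

```python
from __future__ import annotations

import copy
from typing import Any, Callable


def apply_update(record: dict[str, Any], update_values: dict[str, Any]) -> dict[str, Any]:
    """Return an updated copy of an {"id", "content", "metadata"} record."""
    new = copy.deepcopy(record)
    for key, value in update_values.items():
        if key == "content":
            new["content"] = value
        elif key == "metadata":
            # Assumption: merge into existing metadata rather than replace it.
            new["metadata"].update(value)
        else:
            # Any other key is a direct metadata update.
            new["metadata"][key] = value
    return new


def update_where(
    store: dict[str, dict[str, Any]],
    update_values: dict[str, Any],
    predicate: Callable[[dict[str, Any]], bool],
) -> int:
    """Retrieve matching records, modify them, and write them back under the same ids."""
    matched_ids = [rid for rid, rec in store.items() if predicate(rec)]  # 1. retrieve
    for rid in matched_ids:
        store[rid] = apply_update(store[rid], update_values)  # 2. modify, 3. re-add
    return len(matched_ids)
```

Because each record is copied, modified, and written back under its original id, unmatched records are untouched and the update behaves like a per-record replacement.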