Fulltext
ChromaDB implementation of fulltext search and CRUD capability.
This module provides a ChromaDB implementation of the FulltextCapability protocol using ChromaDB's text search capabilities.
ChromaFulltextCapability(collection_name, client, num_candidates=DEFAULT_NUM_CANDIDATES, encryption=None)
ChromaDB implementation of the FulltextCapability protocol.
This class provides document CRUD operations and text search using ChromaDB.
Attributes:
| Name | Type | Description |
|---|---|---|
| collection_name | str | The name of the ChromaDB collection. |
| client | ClientAPI | ChromaDB client instance. |
| collection | | ChromaDB collection instance. |
| num_candidates | int | Maximum number of candidates to consider during search. |
Initialize the ChromaDB fulltext capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection_name | str | The name of the ChromaDB collection. | required |
| client | ClientAPI | ChromaDB client instance. | required |
| num_candidates | int | Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES. | DEFAULT_NUM_CANDIDATES |
| encryption | EncryptionCapability \| None | Encryption capability for field-level encryption. Defaults to None. | None |
clear()
async
Clear all records from the datastore.
create(data, **kwargs)
async
Create new records in the datastore.
If encryption is enabled, this method automatically encrypts the content and metadata of the chunks according to the encryption configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Chunk \| list[Chunk] | Data to create (single item or collection). | required |
| **kwargs | Any | Backend-specific parameters. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If data structure is invalid. |
delete(filters=None, options=None)
async
Delete records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filters | FilterClause \| QueryFilter \| None | Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op). | None |
| options | QueryOptions \| None | Query options for sorting and limiting deletions. Defaults to None. | None |
get_size()
Returns the total number of documents in the collection.
Returns:
| Type | Description |
|---|---|
| int | The total number of documents. |
retrieve(filters=None, options=None, **kwargs)
async
Read records from the datastore with optional filtering.
Usage Example
from gllm_datastore.core.filters import filter as F
# Direct FilterClause usage
results = await fulltext_capability.retrieve(filters=F.eq("metadata.category", "tech"))
# Multiple filters
results = await fulltext_capability.retrieve(
filters=F.and_(F.eq("metadata.category", "tech"), F.eq("metadata.status", "active"))
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filters | FilterClause \| QueryFilter \| None | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | None |
| options | QueryOptions \| None | Query options (sorting, pagination, etc.). Defaults to None. | None |
| **kwargs | Any | Backend-specific parameters. | {} |
Returns:
| Type | Description |
|---|---|
| list[Chunk] | Query results. |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | If unsupported operators are used for id or content filters. |
retrieve_fuzzy(query, max_distance=2, filters=None, options=None, **kwargs)
async
Find records that fuzzy-match the query within the given edit-distance threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Text to fuzzy match against. | required |
| max_distance | int | Maximum edit distance for matches. Defaults to 2. | 2 |
| filters | FilterClause \| QueryFilter \| None | Optional metadata filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | None |
| options | QueryOptions \| None | Query options (sorting, limit, etc.). Defaults to None. | None |
| **kwargs | Any | Backend-specific parameters. | {} |
Returns:
| Type | Description |
|---|---|
| list[Chunk] | Matched chunks ordered by distance (ascending) or by options.order_by if specified. |
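To make the max_distance threshold and the default ascending ordering concrete, here is a minimal, self-contained sketch of edit-distance matching. It is not the library's implementation: the reference above does not say which distance metric or matching granularity is used, so this sketch assumes Levenshtein distance compared against each whitespace-separated word. The names edit_distance and fuzzy_rank are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_rank(query: str, documents: dict[str, str], max_distance: int = 2) -> list[tuple[str, int]]:
    """Return (doc_id, distance) pairs within max_distance, closest first.

    Assumption: a document's distance is the smallest distance between
    the query and any single word of its content.
    """
    query = query.lower()
    scored = []
    for doc_id, content in documents.items():
        words = content.lower().split()
        if not words:
            continue
        best = min(edit_distance(query, word) for word in words)
        if best <= max_distance:
            scored.append((doc_id, best))
    return sorted(scored, key=lambda pair: pair[1])


docs = {
    "a": "vector database",
    "b": "vectr search",
    "c": "relational storage",
}
ranked = fuzzy_rank("vector", docs)  # "c" has no word within distance 2
```

In this sketch an exact word match ranks first with distance 0, a one-character typo follows with distance 1, and documents with no word inside the threshold are dropped.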
update(update_values, filters=None)
async
Update existing records in the datastore.
If encryption is enabled, this method automatically encrypts the content and metadata in update_values according to the encryption configuration.
Warning
Filters cannot target encrypted fields. While update_values are encrypted before
being written, the filters used to identify which documents to update are NOT encrypted.
If you try to update documents based on an encrypted metadata field (e.g.,
filters=F.eq("metadata.secret", "val")), the filter will fail to match because
the filter value is not encrypted but the stored data is. Always use non-encrypted
fields (like 'id') in filters when working with encrypted data.
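The mismatch described in this warning can be reproduced without the library. The sketch below uses a toy, insecure stand-in for encryption and a hand-written equality filter. Both toy_encrypt and eq_filter are hypothetical helpers, not part of gllm_datastore or ChromaDB, and the real encryption scheme may differ.

```python
import base64
import hashlib
import os


def toy_encrypt(plaintext: str, key: bytes) -> str:
    """Illustration only, NOT secure: XOR the text with a hash-derived key stream."""
    nonce = os.urandom(8)
    stream = hashlib.sha256(key + nonce).digest()
    data = plaintext.encode("utf-8")
    cipher = bytes(byte ^ stream[i % len(stream)] for i, byte in enumerate(data))
    return base64.b64encode(nonce + cipher).decode("ascii")


def eq_filter(records: list[dict], field_path: str, value: str) -> list[str]:
    """Return ids of records whose dotted field equals value exactly."""
    matched = []
    for record in records:
        current = record
        for part in field_path.split("."):
            current = current.get(part) if isinstance(current, dict) else None
        if current == value:
            matched.append(record["id"])
    return matched


key = b"hypothetical-key"
# The stored metadata holds ciphertext, while the id stays in plaintext.
stored = [{"id": "doc-1", "metadata": {"secret": toy_encrypt("val", key)}}]

by_secret = eq_filter(stored, "metadata.secret", "val")  # plaintext vs. ciphertext
by_id = eq_filter(stored, "id", "doc-1")                 # plaintext vs. plaintext
```

Here by_secret comes back empty because the filter compares the plaintext "val" with stored ciphertext, while by_id finds the record. This is why the warning recommends filtering on non-encrypted fields.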
Examples:
Set the content of the chunk with id "unique_id" to "updated_content" and its metadata status to "published".
from gllm_datastore.core.filters import filter as F
await fulltext_capability.update(
update_values={"content": "updated_content", "metadata": {"status": "published"}},
filters=F.eq("id", "unique_id"),
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| update_values | dict[str, Any] | Values to update. Supports "content" for updating document content and "metadata" for updating metadata. Other keys are treated as direct metadata updates. | required |
| filters | FilterClause \| QueryFilter \| None | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Cannot use encrypted fields in filters. Defaults to None. | None |
Note
ChromaDB doesn't support direct update operations. This method will retrieve matching records, update them, and upsert them back to the collection.
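The retrieve, modify, and upsert flow described in this note can be sketched with a plain dictionary standing in for the collection. This is a rough illustration under stated assumptions, not the actual implementation: update_by_upsert and the matches callback are hypothetical, and merging the "metadata" dict into existing metadata (rather than replacing it) is an assumption the reference does not confirm.

```python
import copy
from typing import Any, Callable

Record = dict[str, Any]  # {"id": str, "content": str, "metadata": dict}


def update_by_upsert(
    store: dict[str, Record],
    update_values: dict[str, Any],
    matches: Callable[[Record], bool],
) -> int:
    """Emulate an update as retrieve -> modify -> upsert; return the count."""
    # 1. Retrieve copies of the matching records.
    selected = [copy.deepcopy(r) for r in store.values() if matches(r)]

    # 2. Apply the update values to each copy.
    for record in selected:
        for key, value in update_values.items():
            if key == "content":
                record["content"] = value
            elif key == "metadata":
                record["metadata"].update(value)  # assumption: merge, not replace
            else:
                record["metadata"][key] = value   # other keys go to metadata

    # 3. Upsert the modified copies back under their ids.
    for record in selected:
        store[record["id"]] = record
    return len(selected)


store = {
    "unique_id": {"id": "unique_id", "content": "draft", "metadata": {"status": "draft", "lang": "en"}},
    "other": {"id": "other", "content": "untouched", "metadata": {"status": "draft"}},
}
changed = update_by_upsert(
    store,
    {"content": "updated_content", "metadata": {"status": "published"}},
    matches=lambda record: record["id"] == "unique_id",
)
```

Because the write-back is an upsert keyed by id, the cost of an update in this pattern grows with the number of matching records, which is worth keeping in mind for broad filters.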