Fulltext
ChromaDB implementation of fulltext search and CRUD capability.
This module provides a ChromaDB implementation of the FulltextCapability protocol using ChromaDB's text search capabilities.
ChromaFulltextCapability(collection_name, client, num_candidates=DEFAULT_NUM_CANDIDATES)
ChromaDB implementation of FulltextCapability protocol.
This class provides document CRUD operations and text search using ChromaDB.
Attributes:
| Name | Type | Description |
|---|---|---|
| `collection_name` | `str` | The name of the ChromaDB collection. |
| `client` | `ClientAPI` | ChromaDB client instance. |
| `collection` | | ChromaDB collection instance. |
| `num_candidates` | `int` | Maximum number of candidates to consider during search. |
Initialize the ChromaDB fulltext capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `collection_name` | `str` | The name of the ChromaDB collection. | required |
| `client` | `ClientAPI` | ChromaDB client instance. | required |
| `num_candidates` | `int` | Maximum number of candidates to consider during search. | `DEFAULT_NUM_CANDIDATES` |
clear()
async
Clear all records from the datastore.
create(data, **kwargs)
async
delete(filters=None, options=None)
async
Delete records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. If None, the call is a no-op and nothing is deleted. | `None` |
| `options` | `QueryOptions \| None` | Query options for sorting and limiting deletions. | `None` |
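The no-op default for `filters` guards against accidentally wiping the collection; removing everything is left to `clear()`. The following is a minimal sketch of that behavior on a plain dictionary. It is not the library's code: `delete_matching`, the `predicate` callable, and the `limit` argument are hypothetical stand-ins for `delete`, `filters`, and `options`.

```python
from typing import Any, Callable, Optional

Record = dict[str, Any]
Predicate = Callable[[str, Record], bool]


def delete_matching(
    store: dict[str, Record],
    predicate: Optional[Predicate] = None,
    limit: Optional[int] = None,
) -> int:
    """Delete records selected by predicate; return how many were removed."""
    # Assumption mirrored from the docs: no filter means no operation.
    if predicate is None:
        return 0
    matching_ids = [rid for rid, rec in store.items() if predicate(rid, rec)]
    # Hypothetical stand-in for a limit carried by the query options.
    if limit is not None:
        matching_ids = matching_ids[:limit]
    for rid in matching_ids:
        del store[rid]
    return len(matching_ids)


records = {
    "a": {"content": "old post", "metadata": {"status": "archived"}},
    "b": {"content": "new post", "metadata": {"status": "active"}},
    "c": {"content": "older post", "metadata": {"status": "archived"}},
}

# Without a filter nothing is removed.
deleted_without_filter = delete_matching(records)

# With a filter and a limit of 1, only the first archived record is removed.
deleted_with_filter = delete_matching(
    records, lambda _rid, rec: rec["metadata"]["status"] == "archived", limit=1
)
```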
get_size()
Returns the total number of documents in the collection.
Returns:
| Name | Type | Description |
|---|---|---|
| `int` | `int` | The total number of documents. |
retrieve(filters=None, options=None, **kwargs)
async
Read records from the datastore with optional filtering.
Usage Example
from gllm_datastore.core.filters import filter as F
# Direct FilterClause usage
results = await fulltext_capability.retrieve(filters=F.eq("metadata.category", "tech"))
# Multiple filters
results = await fulltext_capability.retrieve(
filters=F.and_(F.eq("metadata.category", "tech"), F.eq("metadata.status", "active"))
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. | `None` |
| `options` | `QueryOptions \| None` | Query options (sorting, pagination, etc.). | `None` |
| `**kwargs` | `Any` | Backend-specific parameters. | `{}` |
Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | Query results. |
Raises:
| Type | Description |
|---|---|
| `NotImplementedError` | If unsupported operators are used for id or content filters. |
retrieve_fuzzy(query, max_distance=2, filters=None, options=None, **kwargs)
async
Find records whose text fuzzy-matches the query within the given edit-distance threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Text to fuzzy match against. | required |
| `max_distance` | `int` | Maximum edit distance for matches. | `2` |
| `filters` | `FilterClause \| QueryFilter \| None` | Optional metadata filters to apply. FilterClause objects are automatically converted to QueryFilter internally. | `None` |
| `options` | `QueryOptions \| None` | Query options (sorting, limit, etc.). | `None` |
| `**kwargs` | `Any` | Backend-specific parameters. | `{}` |
Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | Matched chunks ordered by distance (ascending) or by options.order_by if specified. |
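To make `max_distance` and the default ordering concrete, here is a minimal sketch of edit-distance filtering in plain Python. It assumes Levenshtein distance as the metric and compares the query against whole strings; the source does not say which metric or matching granularity the capability actually uses, and `edit_distance` and `fuzzy_filter` are hypothetical helpers, not part of the library.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def fuzzy_filter(
    query: str, candidates: list[str], max_distance: int = 2
) -> list[tuple[str, int]]:
    """Keep candidates within max_distance of the query, closest first."""
    scored = [(text, edit_distance(query, text)) for text in candidates]
    matches = [(text, dist) for text, dist in scored if dist <= max_distance]
    # Ascending distance, mirroring the documented default ordering.
    return sorted(matches, key=lambda pair: pair[1])


matches = fuzzy_filter("chroma", ["chroma", "chrome", "chromium", "karma"])
```

With the default threshold of 2, "chroma" (distance 0) and "chrome" (distance 1) are kept, while "chromium" and "karma" (distance 3 each) are dropped.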
update(update_values, filters=None)
async
Update existing records in the datastore.
Examples:
Set the content of the chunk with id "unique_id" to "updated_content" and its metadata status to "published".
from gllm_datastore.core.filters import filter as F
await fulltext_capability.update(
update_values={"content": "updated_content", "metadata": {"status": "published"}},
filters=F.eq("id", "unique_id"),
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `update_values` | `dict[str, Any]` | Values to update. Supports "content" for updating document content and "metadata" for updating metadata. Other keys are treated as direct metadata updates. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. | `None` |
Note
ChromaDB doesn't support direct update operations. This method will retrieve matching records, update them, and re-add them to the collection.
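The retrieve, modify, and re-add flow described in the note can be sketched with an in-memory dictionary standing in for the collection. This is an illustration under stated assumptions, not the library's implementation: `apply_update` and `update_matching` are hypothetical names, the `predicate` callable stands in for `filters`, and merging new metadata into the existing metadata (rather than replacing it) is assumed.

```python
from typing import Any, Callable

Record = dict[str, Any]


def apply_update(record: Record, update_values: dict[str, Any]) -> Record:
    """Return a copy of record with update_values applied."""
    content = update_values.get("content", record["content"])
    # Assumption: new metadata is merged into the existing metadata.
    metadata = dict(record["metadata"])
    metadata.update(update_values.get("metadata", {}))
    # Keys other than "content" and "metadata" become metadata fields.
    for key, value in update_values.items():
        if key not in ("content", "metadata"):
            metadata[key] = value
    return {"content": content, "metadata": metadata}


def update_matching(
    store: dict[str, Record],
    update_values: dict[str, Any],
    predicate: Callable[[str, Record], bool],
) -> int:
    """Retrieve matching records, update them, and re-add them."""
    matching_ids = [rid for rid, rec in store.items() if predicate(rid, rec)]
    for rid in matching_ids:
        updated = apply_update(store[rid], update_values)
        del store[rid]        # drop the old record
        store[rid] = updated  # re-add the updated record under the same id
    return len(matching_ids)


store = {
    "unique_id": {"content": "draft", "metadata": {"status": "draft", "category": "tech"}},
    "other_id": {"content": "notes", "metadata": {"status": "draft", "category": "misc"}},
}

updated_count = update_matching(
    store,
    {"content": "updated_content", "metadata": {"status": "published"}, "reviewed": True},
    lambda rid, _rec: rid == "unique_id",
)
```

Because each record is removed and re-added rather than patched in place, a failure between the two steps could lose the record; callers with strict consistency needs may want to keep that in mind.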