Data store
ChromaDB data store with capability composition.
ChromaClientType
Bases: StrEnum
Enum for different types of ChromaDB clients.
ChromaDataStore(collection_name, client_type=ChromaClientType.MEMORY, persist_directory=None, host=None, port=None, headers=None, client_settings=None)
Bases: BaseDataStore
ChromaDB data store with multiple capability support.
Attributes:
| Name | Type | Description |
|---|---|---|
| collection_name | str | The name of the ChromaDB collection. |
| client | ClientAPI | The ChromaDB client instance. |
Initialize the ChromaDB data store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection_name | str | The name of the ChromaDB collection. | required |
| client_type | ChromaClientType | Type of ChromaDB client to use. Defaults to ChromaClientType.MEMORY. | MEMORY |
| persist_directory | str \| None | Directory to persist vector store data. Required for PERSISTENT client type. Defaults to None. | None |
| host | str \| None | Host address for ChromaDB server. Required for HTTP client type. Defaults to None. | None |
| port | int \| None | Port for ChromaDB server. Required for HTTP client type. Defaults to None. | None |
| headers | dict \| None | A dictionary of headers to send to the Chroma server. Used for authentication with the Chroma server for HTTP client type. Defaults to None. | None |
| client_settings | dict \| None | A dictionary of additional settings for the Chroma client. Defaults to None. | None |
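The parameter table implies that each client type needs a different subset of arguments. The sketch below illustrates that rule with a stand-in enum. The `ClientType` member values and the `validate_client_args` helper are hypothetical and are not part of gllm_datastore; the real constructor may validate differently.

```python
from enum import Enum


class ClientType(str, Enum):
    """Stand-in for ChromaClientType; the string values are assumptions."""

    MEMORY = "memory"
    PERSISTENT = "persistent"
    HTTP = "http"


def validate_client_args(client_type, persist_directory=None, host=None, port=None):
    """Return the names of required arguments that are missing (hypothetical helper)."""
    # Requirements taken from the parameter descriptions above.
    required = {
        ClientType.MEMORY: [],
        ClientType.PERSISTENT: ["persist_directory"],
        ClientType.HTTP: ["host", "port"],
    }[client_type]
    supplied = {"persist_directory": persist_directory, "host": host, "port": port}
    return [name for name in required if supplied[name] is None]


print(validate_client_args(ClientType.MEMORY))                  # []
print(validate_client_args(ClientType.PERSISTENT))              # ['persist_directory']
print(validate_client_args(ClientType.HTTP, host="localhost"))  # ['port']
```

With the real class, the equivalent choice is made through the `client_type` argument of the documented constructor, for example `client_type=ChromaClientType.PERSISTENT` together with `persist_directory`.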
fulltext
property
Access fulltext capability if supported.
This property delegates to the parent class to resolve the fulltext capability handler. It is overridden only so that the return type is ChromaFulltextCapability, which gives better type hinting.
Returns:
| Name | Type | Description |
|---|---|---|
| ChromaFulltextCapability | ChromaFulltextCapability | Fulltext capability handler. |
Raises:
| Type | Description |
|---|---|
| NotSupportedException | If fulltext capability is not supported. |
supported_capabilities
property
Return list of currently supported capabilities.
Returns:
| Type | Description |
|---|---|
| list[str] | List of capability names that are supported. |
vector
property
Access vector capability if supported.
This property delegates to the parent class to resolve the vector capability handler. It is overridden only so that the return type is ChromaVectorCapability, which gives better type hinting.
Returns:
| Name | Type | Description |
|---|---|---|
| ChromaVectorCapability | ChromaVectorCapability | Vector capability handler. |
Raises:
| Type | Description |
|---|---|
| NotSupportedException | If vector capability is not supported. |
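The capability properties above follow a guarded-access pattern: a handler is returned only if the capability has been configured, otherwise an exception is raised. A minimal sketch of that pattern, using toy classes (`SketchStore`, `NotSupported` and `register` are hypothetical stand-ins, and the capability name `"vector"` is an assumption about how names are stored):

```python
class NotSupported(Exception):
    """Stand-in for NotSupportedException."""


class SketchStore:
    """Toy data store; not the real BaseDataStore."""

    def __init__(self):
        self._capabilities = {}  # capability name -> handler

    def register(self, name, handler):
        """Attach a handler under a capability name (hypothetical helper)."""
        self._capabilities[name] = handler
        return self

    @property
    def supported_capabilities(self):
        # Only capabilities that were registered are reported.
        return list(self._capabilities)

    @property
    def vector(self):
        # Guarded access: fail loudly instead of returning None.
        if "vector" not in self._capabilities:
            raise NotSupported("vector capability is not configured")
        return self._capabilities["vector"]


store = SketchStore()
try:
    store.vector
except NotSupported as exc:
    print(f"not supported: {exc}")

store.register("vector", object())
print(store.supported_capabilities)  # ['vector']
```

Checking `supported_capabilities` before touching a capability property avoids relying on exception handling for control flow.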
get_size(filters=None)
async
Get the total number of records in the datastore.
Examples:

1) Basic usage (no filters):

```python
# Async usage (call from within a coroutine)
count = await datastore.get_size()
```

2) With filters (using Query Filters):

```python
from gllm_datastore.core.filters import filter as F

count = await datastore.get_size(filters=F.eq("metadata.status", "active"))
```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filters | FilterClause \| QueryFilter \| None | Query filters to apply. Defaults to None. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| int | int | The total number of records matching the filters. |
translate_query_filter(query_filter)
classmethod
Translate QueryFilter or FilterClause to ChromaDB native filter syntax.
This method uses ChromaQueryTranslator to translate filters and returns the result as a dictionary.
Examples:

- Translate a simple FilterClause:

  ```python
  from gllm_datastore.core.filters import filter as F

  filter_clause = F.eq("metadata.status", "active")
  result = ChromaDataStore.translate_query_filter(filter_clause)
  # result -> {"where": {"status": "active"}}
  ```

- Translate QueryFilter with metadata filters:

  ```python
  from gllm_datastore.core.filters import filter as F

  filters = F.and_(
      F.eq("metadata.category", "tech"),
      F.gte("metadata.price", 10),
  )
  result = ChromaDataStore.translate_query_filter(filters)
  # result ->
  # {
  #     "where": {
  #         "$and": [
  #             {"category": "tech"},
  #             {"price": {"$gte": 10}}
  #         ]
  #     }
  # }
  ```

- Translate QueryFilter with content filters:

  ```python
  from gllm_datastore.core.filters import filter as F

  filters = F.text_contains("content", "python")
  result = ChromaDataStore.translate_query_filter(filters)
  # result -> {"where_document": {"$contains": "python"}}
  ```

- Translate QueryFilter with id filters:

  ```python
  from gllm_datastore.core.filters import filter as F

  filters = F.in_("id", ["chunk_1", "chunk_2"])
  result = ChromaDataStore.translate_query_filter(filters)
  # result -> {"ids": ["chunk_1", "chunk_2"]}
  ```

- Translate complex nested QueryFilter:

  ```python
  from gllm_datastore.core.filters import filter as F

  filters = F.and_(
      F.or_(
          F.eq("metadata.status", "active"),
          F.eq("metadata.status", "pending"),
      ),
      F.text_contains("content", "machine learning"),
      F.in_("id", ["chunk_1", "chunk_2"]),
  )
  result = ChromaDataStore.translate_query_filter(filters)
  # result ->
  # {
  #     "where": {
  #         "$or": [
  #             {"status": "active"},
  #             {"status": "pending"}
  #         ]
  #     },
  #     "where_document": {"$contains": "machine learning"},
  #     "ids": ["chunk_1", "chunk_2"]
  # }
  ```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_filter | FilterClause \| QueryFilter | The filter to translate. Can be a single FilterClause or a QueryFilter with multiple clauses. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | The translated filter as a ChromaDB query dict. |
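The examples above suggest that clauses are routed into three sections of the result: metadata clauses into `where`, content clauses into `where_document`, and id clauses into `ids`. The following is a rough, self-contained sketch of that routing over plain `(key, operator, value)` triples. It is not ChromaQueryTranslator: the `translate` function, the triple format, and returning `None` for an empty filter are all assumptions made for illustration, and it handles only flat AND combinations.

```python
def translate(clauses):
    """Route (key, operator, value) triples into Chroma-style query sections.

    Toy stand-in for the documented behaviour; not the library's translator.
    """
    where, result = [], {}
    for key, op, value in clauses:
        if key == "id" and op == "in":
            result["ids"] = list(value)
        elif key == "content" and op == "text_contains":
            result["where_document"] = {"$contains": value}
        elif key.startswith("metadata."):
            field = key[len("metadata."):]  # drop the "metadata." prefix
            # Equality uses the short form; other operators get a "$op" wrapper.
            where.append({field: value} if op == "eq" else {field: {f"${op}": value}})
        else:
            raise ValueError(f"unsupported clause: {key!r} {op!r}")
    if len(where) == 1:
        result["where"] = where[0]
    elif where:
        result["where"] = {"$and": where}
    # Assumption: an empty filter translates to None.
    return result or None


print(translate([("metadata.status", "eq", "active")]))
# {'where': {'status': 'active'}}
print(translate([("metadata.category", "eq", "tech"), ("metadata.price", "gte", 10)]))
# {'where': {'$and': [{'category': 'tech'}, {'price': {'$gte': 10}}]}}
print(translate([("content", "text_contains", "python"), ("id", "in", ["chunk_1", "chunk_2"])]))
# {'where_document': {'$contains': 'python'}, 'ids': ['chunk_1', 'chunk_2']}
```

Keeping the three sections separate mirrors how a ChromaDB query accepts `where`, `where_document`, and `ids` as distinct arguments.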
with_encryption(encryptor, fields)
Enable encryption for specified fields.
Note
Encrypted fields (content and metadata fields specified in encryption configuration) must be serializable to strings. Non-string values will be converted to strings before encryption.
Warning
When encryption is enabled for fields, some search and filter operations may be limited or broken. Encrypted fields cannot be used in filters for update or delete operations, as the filter values are not encrypted and will not match the encrypted data stored in the collection. Use non-encrypted fields (like 'id') for filtering when working with encrypted data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| encryptor | BaseEncryptor | The encryptor instance to use. Must not be None. | required |
| fields | set[str] \| list[str] | Set or list of field names to encrypt. Must not be empty. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| ChromaDataStore | ChromaDataStore | Self for method chaining. |
Raises:
| Type | Description |
|---|---|
| ValueError | If encryptor is None or fields is empty. |
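The warning about filtering on encrypted fields can be illustrated with a toy example: a plaintext filter value is compared against stored ciphertext and never matches, while an unencrypted field such as `id` still works. The `ToyEncryptor` below is a hypothetical stand-in, not a BaseEncryptor implementation, and its XOR scheme is for illustration only and is not secure.

```python
import base64
import os


class ToyEncryptor:
    """Illustrative only: XOR with a key plus a random nonce. NOT secure."""

    def __init__(self, key: bytes):
        self._key = key

    def _xor(self, data: bytes, nonce: bytes) -> bytes:
        pad = self._key + nonce
        return bytes(b ^ pad[i % len(pad)] for i, b in enumerate(data))

    def encrypt(self, text: str) -> str:
        nonce = os.urandom(8)  # fresh nonce, so equal plaintexts differ when stored
        return base64.b64encode(nonce + self._xor(text.encode(), nonce)).decode()

    def decrypt(self, token: str) -> str:
        raw = base64.b64decode(token)
        nonce, body = raw[:8], raw[8:]
        return self._xor(body, nonce).decode()


enc = ToyEncryptor(b"demo-key")
records = [
    {"id": "chunk_1", "status": enc.encrypt("active")},
    {"id": "chunk_2", "status": enc.encrypt("active")},
]

# Filtering on the encrypted field compares plaintext with ciphertext: no match.
by_status = [r for r in records if r["status"] == "active"]
# Filtering on the unencrypted id still works.
by_id = [r for r in records if r["id"] in {"chunk_1"}]

print(len(by_status), len(by_id))       # 0 1
print(enc.decrypt(by_id[0]["status"]))  # active
```

This is why the warning recommends selecting records by a non-encrypted field and decrypting afterwards.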
with_fulltext(collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)
Configure fulltext capability and return datastore instance.
This method delegates to the parent class to configure the fulltext capability. It is overridden only to provide better type hinting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection_name | str \| None | Name of the collection to use in ChromaDB. Defaults to None, in which case the default class attribute will be utilized. | None |
| num_candidates | int | Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES. | DEFAULT_NUM_CANDIDATES |
Returns:
| Name | Type | Description |
|---|---|---|
| Self | ChromaDataStore | Self for method chaining. |
with_vector(em_invoker, collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)
Configure vector capability and return datastore instance.
This method delegates to the parent class to configure the vector capability. It is overridden only to provide better type hinting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| em_invoker | BaseEMInvoker | The embedding model to perform vectorization. | required |
| collection_name | str \| None | Name of the collection to use in ChromaDB. Defaults to None, in which case the default class attribute will be utilized. | None |
| num_candidates | int | Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES. | DEFAULT_NUM_CANDIDATES |
Returns:
| Name | Type | Description |
|---|---|---|
Self |
ChromaDataStore
|
Self for method chaining. |
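The `with_*` methods return the data store itself, and the subclass overrides exist only to narrow the return type. A minimal sketch of that chaining pattern with toy classes follows. `BaseStore` and `ChromaLikeStore` are hypothetical stand-ins, the value 50 for the default candidate count is an assumption (the real DEFAULT_NUM_CANDIDATES is not stated here), and falling back to the store's own `collection_name` is one reading of the parameter description.

```python
class BaseStore:
    """Toy parent class; not the real BaseDataStore."""

    def __init__(self, collection_name):
        self.collection_name = collection_name
        self.capabilities = {}  # capability name -> configuration

    def with_fulltext(self, collection_name=None, num_candidates=50) -> "BaseStore":
        self.capabilities["fulltext"] = {
            # Assumption: None falls back to the store's own collection name.
            "collection_name": collection_name or self.collection_name,
            "num_candidates": num_candidates,
        }
        return self  # returning self enables chaining

    def with_vector(self, em_invoker, collection_name=None, num_candidates=50) -> "BaseStore":
        self.capabilities["vector"] = {
            "em_invoker": em_invoker,
            "collection_name": collection_name or self.collection_name,
            "num_candidates": num_candidates,
        }
        return self


class ChromaLikeStore(BaseStore):
    """Overrides reuse the parent logic and only narrow the return annotation."""

    def with_fulltext(self, collection_name=None, num_candidates=50) -> "ChromaLikeStore":
        super().with_fulltext(collection_name, num_candidates)
        return self

    def with_vector(self, em_invoker, collection_name=None, num_candidates=50) -> "ChromaLikeStore":
        super().with_vector(em_invoker, collection_name, num_candidates)
        return self


store = ChromaLikeStore("docs").with_fulltext().with_vector(em_invoker="fake-embedder")
print(sorted(store.capabilities))  # ['fulltext', 'vector']
```

Because each override is annotated with the subclass type, a type checker keeps treating the chained result as the subclass rather than the parent.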