Data store
ChromaDB data store with capability composition.
ChromaClientType
Bases: StrEnum
Enum for different types of ChromaDB clients.
ChromaDataStore(collection_name, client_type=ChromaClientType.MEMORY, persist_directory=None, host=None, port=None, headers=None, client_settings=None)
Bases: BaseDataStore
ChromaDB data store with multiple capability support.
Attributes:
| Name | Type | Description |
|---|---|---|
| collection_name | str | The name of the ChromaDB collection. |
| client | ClientAPI | The ChromaDB client instance. |
Initialize the ChromaDB data store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection_name | str | The name of the ChromaDB collection. | required |
| client_type | ChromaClientType | Type of ChromaDB client to use. | MEMORY |
| persist_directory | str \| None | Directory to persist vector store data. Required for the PERSISTENT client type. | None |
| host | str \| None | Host address of the ChromaDB server. Required for the HTTP client type. | None |
| port | int \| None | Port of the ChromaDB server. Required for the HTTP client type. | None |
| headers | dict \| None | Headers to send to the Chroma server. Used for authentication with the HTTP client type. | None |
| client_settings | dict \| None | Additional settings for the Chroma client. | None |
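The table above ties `persist_directory` to the PERSISTENT client type and `host`/`port` to the HTTP client type. The following is a minimal sketch of that rule, not the library's implementation: the enum is a local stand-in (its member values are assumptions), and `missing_arguments` is a hypothetical helper introduced only for illustration.

```python
from enum import Enum


class ChromaClientType(str, Enum):
    """Local stand-in for the documented enum; member values are assumed."""

    MEMORY = "memory"
    PERSISTENT = "persistent"
    HTTP = "http"


def missing_arguments(client_type, persist_directory=None, host=None, port=None):
    """Return the names of required arguments that were left as None.

    Hypothetical helper: it only encodes the requirements stated in the
    parameter table, not how ChromaDataStore actually validates its input.
    """
    required = {
        ChromaClientType.MEMORY: {},
        ChromaClientType.PERSISTENT: {"persist_directory": persist_directory},
        ChromaClientType.HTTP: {"host": host, "port": port},
    }[client_type]
    return [name for name, value in required.items() if value is None]


print(missing_arguments(ChromaClientType.MEMORY))
print(missing_arguments(ChromaClientType.PERSISTENT))
print(missing_arguments(ChromaClientType.HTTP, host="localhost"))
```

A check like this can be run before constructing the data store so that a misconfigured client type is reported with the exact argument names that are missing.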
fulltext
property
Access fulltext capability if supported.
The lookup logic is inherited from the parent class; this property overrides it only to narrow the return type to ChromaFulltextCapability for better type hinting.
Returns:
| Name | Type | Description |
|---|---|---|
| ChromaFulltextCapability | ChromaFulltextCapability | Fulltext capability handler. |
Raises:
| Type | Description |
|---|---|
| NotSupportedException | If fulltext capability is not supported. |
supported_capabilities
property
Return list of currently supported capabilities.
Returns:
| Type | Description |
|---|---|
| list[str] | List of capability names that are supported. |
vector
property
Access vector capability if supported.
The lookup logic is inherited from the parent class; this property overrides it only to narrow the return type to ChromaVectorCapability for better type hinting.
Returns:
| Name | Type | Description |
|---|---|---|
| ChromaVectorCapability | ChromaVectorCapability | Vector capability handler. |
Raises:
| Type | Description |
|---|---|
| NotSupportedException | If vector capability is not supported. |
translate_query_filter(query_filter=None)
classmethod
Translate QueryFilter or FilterClause to ChromaDB native filter syntax.
This method uses ChromaQueryTranslator to translate filters and returns the result as a dictionary.
Examples:
- Translate a simple FilterClause:

  ```python
  from gllm_datastore.core.filters import filter as F

  filter_clause = F.eq("metadata.status", "active")
  result = ChromaDataStore.translate_query_filter(filter_clause)
  # result -> {"where": {"status": "active"}}
  ```

- Translate QueryFilter with metadata filters:

  ```python
  from gllm_datastore.core.filters import filter as F

  filters = F.and_(
      F.eq("metadata.category", "tech"),
      F.gte("metadata.price", 10),
  )
  result = ChromaDataStore.translate_query_filter(filters)
  # result ->
  # {
  #     "where": {
  #         "$and": [
  #             {"category": "tech"},
  #             {"price": {"$gte": 10}}
  #         ]
  #     }
  # }
  ```

- Translate QueryFilter with content filters:

  ```python
  from gllm_datastore.core.filters import filter as F

  filters = F.text_contains("content", "python")
  result = ChromaDataStore.translate_query_filter(filters)
  # result -> {"where_document": {"$contains": "python"}}
  ```

- Translate QueryFilter with id filters:

  ```python
  from gllm_datastore.core.filters import filter as F

  filters = F.in_("id", ["chunk_1", "chunk_2"])
  result = ChromaDataStore.translate_query_filter(filters)
  # result -> {"ids": ["chunk_1", "chunk_2"]}
  ```

- Translate complex nested QueryFilter:

  ```python
  from gllm_datastore.core.filters import filter as F

  filters = F.and_(
      F.or_(
          F.eq("metadata.status", "active"),
          F.eq("metadata.status", "pending"),
      ),
      F.text_contains("content", "machine learning"),
      F.in_("id", ["chunk_1", "chunk_2"]),
  )
  result = ChromaDataStore.translate_query_filter(filters)
  # result ->
  # {
  #     "where": {
  #         "$or": [
  #             {"status": "active"},
  #             {"status": "pending"}
  #         ]
  #     },
  #     "where_document": {"$contains": "machine learning"},
  #     "ids": ["chunk_1", "chunk_2"]
  # }
  ```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_filter | FilterClause \| QueryFilter \| None | The filter to translate. Can be a single FilterClause or a QueryFilter with multiple clauses. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, Any] \| None | The translated filter as a ChromaDB query dict. Returns None for empty filters. |
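The examples above show filters being routed into three ChromaDB query keys: metadata clauses into `where`, content clauses into `where_document`, and id clauses into `ids`. The sketch below illustrates that routing for a flat, AND-combined list of clauses. It is an assumption-laden illustration, not ChromaQueryTranslator itself: `Clause` and `translate` are hypothetical names, and nested `or_` groups are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class Clause:
    """Hypothetical stand-in for a single filter clause."""

    key: str    # e.g. "metadata.status", "content" or "id"
    op: str     # "eq", "gte", "in" or "text_contains"
    value: Any


def translate(clauses: list[Clause]) -> dict[str, Any] | None:
    """AND-combine clauses into a ChromaDB-style query dict (sketch only)."""
    where: list[dict[str, Any]] = []
    result: dict[str, Any] = {}
    for clause in clauses:
        if clause.key == "id" and clause.op == "in":
            # Id filters go to the top-level "ids" key.
            result["ids"] = list(clause.value)
        elif clause.key == "content" and clause.op == "text_contains":
            # Content filters go to "where_document".
            result["where_document"] = {"$contains": clause.value}
        elif clause.key.startswith("metadata."):
            # Metadata filters drop the "metadata." prefix and go to "where".
            field = clause.key.removeprefix("metadata.")
            if clause.op == "eq":
                where.append({field: clause.value})
            else:
                where.append({field: {f"${clause.op}": clause.value}})
        else:
            raise ValueError(f"Unsupported clause: {clause}")
    if len(where) == 1:
        result["where"] = where[0]
    elif where:
        result["where"] = {"$and": where}
    # An empty filter yields None, mirroring the documented return value.
    return result or None


print(translate([Clause("metadata.status", "eq", "active")]))
```

A single metadata clause is emitted without a `$and` wrapper, which matches the shape of the first and last documented examples.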
with_fulltext(collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)
Configure fulltext capability and return datastore instance.
The configuration logic is inherited from the parent class; this method overrides it only to narrow the return type for better type hinting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection_name | str \| None | Name of the collection to use in ChromaDB. If None, the default class attribute is used. | None |
| num_candidates | int | Maximum number of candidates to consider during search. | DEFAULT_NUM_CANDIDATES |
Returns:
| Name | Type | Description |
|---|---|---|
| Self | ChromaDataStore | Self for method chaining. |
with_vector(em_invoker, collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)
Configure vector capability and return datastore instance.
The configuration logic is inherited from the parent class; this method overrides it only to narrow the return type for better type hinting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| em_invoker | BaseEMInvoker | The embedding model to perform vectorization. | required |
| collection_name | str \| None | Name of the collection to use in ChromaDB. If None, the default class attribute is used. | None |
| num_candidates | int | Maximum number of candidates to consider during search. | DEFAULT_NUM_CANDIDATES |
Returns:
| Name | Type | Description |
|---|---|---|
| Self | ChromaDataStore | Self for method chaining. |
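Because `with_fulltext` and `with_vector` return the data store itself, capabilities can be configured in a single chained expression, and accessing one that was never configured raises NotSupportedException. The sketch below illustrates this composition pattern with stand-in classes only. `SketchDataStore` and its internals are hypothetical and do not reflect how ChromaDataStore or BaseDataStore are implemented.

```python
class NotSupportedException(Exception):
    """Local stand-in for the exception named in this reference."""


class SketchDataStore:
    """Hypothetical store that registers capabilities on demand."""

    def __init__(self, collection_name):
        self.collection_name = collection_name
        self._capabilities = {}

    def with_fulltext(self, collection_name=None):
        self._capabilities["fulltext"] = {
            "collection": collection_name or self.collection_name,
        }
        return self  # returning self is what enables method chaining

    def with_vector(self, em_invoker, collection_name=None):
        self._capabilities["vector"] = {
            "collection": collection_name or self.collection_name,
            "em_invoker": em_invoker,
        }
        return self

    @property
    def supported_capabilities(self):
        return list(self._capabilities)

    @property
    def fulltext(self):
        return self._get("fulltext")

    @property
    def vector(self):
        return self._get("vector")

    def _get(self, name):
        if name not in self._capabilities:
            raise NotSupportedException(f"{name} capability is not configured")
        return self._capabilities[name]


store = SketchDataStore("docs").with_fulltext()
print(store.supported_capabilities)

try:
    store.vector
except NotSupportedException as exc:
    print(f"vector unavailable: {exc}")
```

Registering capabilities lazily keeps the constructor small and makes the set of usable handlers explicit at the call site.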