
Data store

ChromaDB data store with capability composition.

ChromaClientType

Bases: StrEnum

Enum for different types of ChromaDB clients.

ChromaDataStore(collection_name, client_type=ChromaClientType.MEMORY, persist_directory=None, host=None, port=None, headers=None, client_settings=None)

Bases: BaseDataStore

ChromaDB data store with multiple capability support.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| collection_name | str | The name of the ChromaDB collection. |
| client | ClientAPI | The ChromaDB client instance. |

Initialize the ChromaDB data store.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| collection_name | str | The name of the ChromaDB collection. | required |
| client_type | ChromaClientType | Type of ChromaDB client to use. Defaults to ChromaClientType.MEMORY. | MEMORY |
| persist_directory | str \| None | Directory to persist vector store data. Required for PERSISTENT client type. Defaults to None. | None |
| host | str \| None | Host address for ChromaDB server. Required for HTTP client type. Defaults to None. | None |
| port | int \| None | Port for ChromaDB server. Required for HTTP client type. Defaults to None. | None |
| headers | dict \| None | A dictionary of headers to send to the Chroma server. Used for authentication with the Chroma server for HTTP client type. Defaults to None. | None |
| client_settings | dict \| None | A dictionary of additional settings for the Chroma client. Defaults to None. | None |
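
The parameter descriptions above imply a small set of requirements: the PERSISTENT client type needs `persist_directory`, and the HTTP client type needs `host` and `port`. A minimal sketch of such a check, assuming the member names MEMORY, PERSISTENT and HTTP. The `ClientType` stand-in, its string values, the `check_client_args` helper, and the choice of `ValueError` are hypothetical and not taken from the library:

```python
from enum import Enum
from typing import Optional


class ClientType(str, Enum):
    """Stand-in for ChromaClientType; the string values are assumed."""

    MEMORY = "memory"
    PERSISTENT = "persistent"
    HTTP = "http"


def check_client_args(
    client_type: ClientType = ClientType.MEMORY,
    persist_directory: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
) -> None:
    """Raise ValueError when a parameter required by the client type is missing."""
    if client_type is ClientType.PERSISTENT and persist_directory is None:
        raise ValueError("persist_directory is required for the PERSISTENT client type")
    if client_type is ClientType.HTTP and (host is None or port is None):
        raise ValueError("host and port are required for the HTTP client type")


# MEMORY needs nothing beyond the collection name.
check_client_args()
# PERSISTENT and HTTP pass once their required parameters are supplied.
check_client_args(ClientType.PERSISTENT, persist_directory="./chroma_data")
check_client_args(ClientType.HTTP, host="localhost", port=8000)
```

Whether the real constructor validates eagerly or defers errors to the underlying ChromaDB client is not stated here, so treat the sketch as a reading aid for the table rather than a description of actual behavior.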

fulltext property

Access fulltext capability if supported.

This property delegates to the parent class to resolve the fulltext capability handler. It is overridden only to narrow the return type to ChromaFulltextCapability for better type hinting.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| ChromaFulltextCapability | ChromaFulltextCapability | Fulltext capability handler. |

Raises:

| Type | Description |
| --- | --- |
| NotSupportedException | If fulltext capability is not supported. |

supported_capabilities property

Return list of currently supported capabilities.

Returns:

| Type | Description |
| --- | --- |
| list[str] | List of capability names that are supported. |

vector property

Access vector capability if supported.

This property delegates to the parent class to resolve the vector capability handler. It is overridden only to narrow the return type to ChromaVectorCapability for better type hinting.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| ChromaVectorCapability | ChromaVectorCapability | Vector capability handler. |

Raises:

| Type | Description |
| --- | --- |
| NotSupportedException | If vector capability is not supported. |
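
The capability properties follow a guard pattern: a handler is returned only once the capability has been configured, and NotSupportedException is raised otherwise. A minimal sketch of that pattern with stand-in classes. `SketchStore`, the exception class defined here, and the string handlers are hypothetical; the real handlers are ChromaVectorCapability and ChromaFulltextCapability:

```python
from typing import Dict, List


class NotSupportedException(Exception):
    """Stand-in for the library's exception of the same name."""


class SketchStore:
    """Toy data store that tracks which capabilities have been configured."""

    def __init__(self) -> None:
        self._capabilities: Dict[str, object] = {}

    def with_vector(self) -> "SketchStore":
        # A real store would build a capability handler here.
        self._capabilities["vector"] = "vector-handler"
        return self  # returning self enables method chaining

    def with_fulltext(self) -> "SketchStore":
        self._capabilities["fulltext"] = "fulltext-handler"
        return self

    @property
    def supported_capabilities(self) -> List[str]:
        return list(self._capabilities)

    @property
    def vector(self) -> object:
        if "vector" not in self._capabilities:
            raise NotSupportedException("vector capability is not configured")
        return self._capabilities["vector"]


store = SketchStore().with_fulltext()
print(store.supported_capabilities)  # ['fulltext']

try:
    store.vector
except NotSupportedException as exc:
    print(f"guarded access failed: {exc}")

store.with_vector()
print(store.vector)  # vector-handler
```

The practical consequence is that code reading a capability property should either configure the capability first or check `supported_capabilities` before access.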

get_size(filters=None) async

Get the total number of records in the datastore.

Examples:

1) Basic usage (no filters):

    # Async usage
    count = await datastore.get_size()

2) With filters (using Query Filters):

    from gllm_datastore.core.filters import filter as F

    count = await datastore.get_size(filters=F.eq("metadata.status", "active"))

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| filters | FilterClause \| QueryFilter \| None | Query filters to apply. Defaults to None. | None |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| int | int | The total number of records matching the filters. |
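
Because `get_size` is a coroutine, it has to be awaited inside an event loop. A minimal sketch of driving such a method from synchronous code with `asyncio.run`, using an in-memory stand-in. `InMemoryStore` and its dict-based equality filter are hypothetical; the real method takes FilterClause or QueryFilter objects:

```python
import asyncio
from typing import Any, Dict, List, Optional


class InMemoryStore:
    """Stand-in store; filters are plain equality dicts, unlike the real API."""

    def __init__(self, records: List[Dict[str, Any]]) -> None:
        self._records = records

    async def get_size(self, filters: Optional[Dict[str, Any]] = None) -> int:
        if filters is None:
            return len(self._records)
        # Count records whose fields equal every filter value.
        return sum(
            all(record.get(key) == value for key, value in filters.items())
            for record in self._records
        )


async def main() -> tuple:
    store = InMemoryStore(
        [{"status": "active"}, {"status": "active"}, {"status": "archived"}]
    )
    total = await store.get_size()
    active = await store.get_size(filters={"status": "active"})
    return total, active


total, active = asyncio.run(main())
print(total, active)  # 3 2
```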

translate_query_filter(query_filter) classmethod

Translate QueryFilter or FilterClause to ChromaDB native filter syntax.

This method uses ChromaQueryTranslator to translate filters and returns the result as a dictionary.

Examples:

  1. Translate a simple FilterClause:

    ```python
    from gllm_datastore.core.filters import filter as F

    filter_clause = F.eq("metadata.status", "active")
    result = ChromaDataStore.translate_query_filter(filter_clause)
    # result -> {"where": {"status": "active"}}
    ```

  2. Translate QueryFilter with metadata filters:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.and_(
        F.eq("metadata.category", "tech"),
        F.gte("metadata.price", 10),
    )
    result = ChromaDataStore.translate_query_filter(filters)
    # result ->
    # {
    #     "where": {
    #         "$and": [
    #             {"category": "tech"},
    #             {"price": {"$gte": 10}}
    #         ]
    #     }
    # }
    ```

  3. Translate QueryFilter with content filters:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.text_contains("content", "python")
    result = ChromaDataStore.translate_query_filter(filters)
    # result -> {"where_document": {"$contains": "python"}}
    ```

  4. Translate QueryFilter with id filters:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.in_("id", ["chunk_1", "chunk_2"])
    result = ChromaDataStore.translate_query_filter(filters)
    # result -> {"ids": ["chunk_1", "chunk_2"]}
    ```

  5. Translate complex nested QueryFilter:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.and_(
        F.or_(
            F.eq("metadata.status", "active"),
            F.eq("metadata.status", "pending"),
        ),
        F.text_contains("content", "machine learning"),
        F.in_("id", ["chunk_1", "chunk_2"]),
    )
    result = ChromaDataStore.translate_query_filter(filters)
    # result ->
    # {
    #     "where": {
    #         "$or": [
    #             {"status": "active"},
    #             {"status": "pending"}
    #         ]
    #     },
    #     "where_document": {"$contains": "machine learning"},
    #     "ids": ["chunk_1", "chunk_2"]
    # }
    ```

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| query_filter | FilterClause \| QueryFilter | The filter to translate. Can be a single FilterClause or a QueryFilter with multiple clauses. | required |

Returns:

| Type | Description |
| --- | --- |
| dict[str, Any] \| None | The translated filter as a ChromaDB query dict. |
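
The documented examples suggest that each clause is routed by its key: `metadata.*` fields go to `where` (with the `metadata.` prefix dropped), `content` goes to `where_document`, and `id` goes to `ids`. A minimal sketch of that routing for a flat list of clauses, using plain tuples instead of FilterClause objects. The `route_clauses` helper, the tuple encoding, and the operator names are hypothetical; this is not ChromaQueryTranslator, and nested `and_`/`or_` handling is omitted:

```python
from typing import Any, Dict, List, Tuple

Clause = Tuple[str, str, Any]  # (key, operator, value)


def route_clauses(clauses: List[Clause]) -> Dict[str, Any]:
    """Split flat clauses into ChromaDB's where / where_document / ids slots."""
    where: List[Dict[str, Any]] = []
    result: Dict[str, Any] = {}

    for key, op, value in clauses:
        if key == "id" and op == "in":
            result["ids"] = list(value)
        elif key == "content" and op == "text_contains":
            result["where_document"] = {"$contains": value}
        elif key.startswith("metadata."):
            field = key[len("metadata."):]
            # Equality uses the short form; other operators use {"$op": value}.
            where.append({field: value} if op == "eq" else {field: {f"${op}": value}})
        else:
            raise ValueError(f"unsupported clause: {key!r} {op!r}")

    # A single metadata condition stands alone; several are joined with $and.
    if len(where) == 1:
        result["where"] = where[0]
    elif where:
        result["where"] = {"$and": where}
    return result


print(route_clauses([("metadata.status", "eq", "active")]))
# {'where': {'status': 'active'}}

mixed = route_clauses([
    ("metadata.category", "eq", "tech"),
    ("metadata.price", "gte", 10),
    ("content", "text_contains", "python"),
    ("id", "in", ["chunk_1", "chunk_2"]),
])
print(sorted(mixed))  # ['ids', 'where', 'where_document']
```

The three top-level keys correspond to separate arguments of a ChromaDB `get` or `query` call, which is why a single filter expression can fan out into several entries.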

with_encryption(encryptor, fields)

Enable encryption for specified fields.

Note

Encrypted fields (content and metadata fields specified in encryption configuration) must be serializable to strings. Non-string values will be converted to strings before encryption.

Warning

When fields are encrypted, search and filter operations on those fields may be limited or may not work at all. In particular, encrypted fields cannot be used in filters for update or delete operations: the filter values are not encrypted, so they will not match the encrypted data stored in the collection. Filter on non-encrypted fields (such as 'id') when working with encrypted data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| encryptor | BaseEncryptor | The encryptor instance to use. Must not be None. | required |
| fields | set[str] \| list[str] | Set or list of field names to encrypt. Must not be empty. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| ChromaDataStore | ChromaDataStore | Self for method chaining. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If encryptor is None or fields is empty. |
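
The warning about filtering on encrypted fields can be made concrete with a toy example: once a field is stored in transformed form, a filter that compares against the plaintext value finds nothing. A minimal sketch, using base64 as a stand-in transformation. Base64 is an encoding, not encryption; `toy_encrypt`, `store_record`, and the record layout are hypothetical and unrelated to BaseEncryptor:

```python
import base64
from typing import Dict, List


def toy_encrypt(value: object) -> str:
    """Stand-in for an encryptor: stringify, then base64-encode (NOT secure)."""
    return base64.b64encode(str(value).encode("utf-8")).decode("ascii")


def store_record(record: Dict[str, object], encrypted_fields: set) -> Dict[str, object]:
    """Return a copy of the record with the chosen fields transformed."""
    return {
        key: toy_encrypt(value) if key in encrypted_fields else value
        for key, value in record.items()
    }


records: List[Dict[str, object]] = [
    store_record({"id": "chunk_1", "owner": "alice", "score": 7}, {"owner", "score"}),
    store_record({"id": "chunk_2", "owner": "bob", "score": 3}, {"owner", "score"}),
]

# Filtering on the plaintext value of an encrypted field matches nothing.
by_owner = [r for r in records if r["owner"] == "alice"]
print(len(by_owner))  # 0

# Filtering on a non-encrypted field such as "id" still works.
by_id = [r for r in records if r["id"] == "chunk_1"]
print(len(by_id))  # 1

# Non-string values were converted to strings before the transformation.
print(base64.b64decode(by_id[0]["score"]).decode("utf-8"))  # 7
```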

with_fulltext(collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)

Configure fulltext capability and return datastore instance.

This method delegates to the parent class to configure the fulltext capability. It is overridden only to provide better type hinting.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| collection_name | str \| None | Name of the collection to use in ChromaDB. Defaults to None, in which case the default class attribute will be utilized. | None |
| num_candidates | int | Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES. | DEFAULT_NUM_CANDIDATES |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Self | ChromaDataStore | Self for method chaining. |

with_vector(em_invoker, collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)

Configure vector capability and return datastore instance.

This method delegates to the parent class to configure the vector capability. It is overridden only to provide better type hinting.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| em_invoker | BaseEMInvoker | The embedding model to perform vectorization. | required |
| collection_name | str \| None | Name of the collection to use in ChromaDB. Defaults to None, in which case the default class attribute will be utilized. | None |
| num_candidates | int | Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES. | DEFAULT_NUM_CANDIDATES |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| Self | ChromaDataStore | Self for method chaining. |