
Data store

ChromaDB data store with capability composition.

Authors

Kadek Denaya (kadek.d.r.diana@gdplabs.id)


ChromaClientType

Bases: StrEnum

Enum for different types of ChromaDB clients.

ChromaDataStore(collection_name, client_type=ChromaClientType.MEMORY, persist_directory=None, host=None, port=None, headers=None, client_settings=None)

Bases: BaseDataStore

ChromaDB data store with multiple capability support.

Attributes:

Name Type Description
collection_name str

The name of the ChromaDB collection.

client ClientAPI

The ChromaDB client instance.

Initialize the ChromaDB data store.

Parameters:

Name Type Description Default
collection_name str

The name of the ChromaDB collection.

required
client_type ChromaClientType

Type of ChromaDB client to use. Defaults to ChromaClientType.MEMORY.

MEMORY
persist_directory str | None

Directory to persist vector store data. Required for PERSISTENT client type. Defaults to None.

None
host str | None

Host address for ChromaDB server. Required for HTTP client type. Defaults to None.

None
port int | None

Port for ChromaDB server. Required for HTTP client type. Defaults to None.

None
headers dict | None

A dictionary of headers to send to the Chroma server. Used for authentication with the Chroma server for HTTP client type. Defaults to None.

None
client_settings dict | None

A dictionary of additional settings for the Chroma client. Defaults to None.

None
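The per-client requirements listed above can be summarized in a small validation sketch. This is not the library's implementation: `ClientKind` and `check_client_args` are hypothetical stand-ins, and the enum values are assumed, since this page only names the MEMORY, PERSISTENT, and HTTP client types.

```python
from enum import Enum


class ClientKind(str, Enum):
    """Hypothetical stand-in for ChromaClientType; the string values are assumed."""

    MEMORY = "memory"
    PERSISTENT = "persistent"
    HTTP = "http"


def check_client_args(client_type, persist_directory=None, host=None, port=None):
    """Return the names of required arguments that are missing for a client type."""
    required = {
        ClientKind.MEMORY: {},
        ClientKind.PERSISTENT: {"persist_directory": persist_directory},
        ClientKind.HTTP: {"host": host, "port": port},
    }[ClientKind(client_type)]
    return [name for name, value in required.items() if value is None]


print(check_client_args(ClientKind.MEMORY))                  # []
print(check_client_args(ClientKind.PERSISTENT))              # ['persist_directory']
print(check_client_args(ClientKind.HTTP, host="localhost"))  # ['port']
```

A check like this, run before constructing the data store, would surface a missing `persist_directory`, `host`, or `port` early. How `ChromaDataStore` itself reacts to missing arguments is not stated on this page.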

fulltext property

Access fulltext capability if supported.

This property delegates to the parent class to resolve the fulltext capability handler. It is overridden only to narrow the return type to ChromaFulltextCapability for better type hinting.

Returns:

Name Type Description
ChromaFulltextCapability ChromaFulltextCapability

Fulltext capability handler.

Raises:

Type Description
NotSupportedException

If fulltext capability is not supported.

supported_capabilities property

Return list of currently supported capabilities.

Returns:

Type Description
list[str]

Names of the capabilities that are currently supported.

vector property

Access vector capability if supported.

This property delegates to the parent class to resolve the vector capability handler. It is overridden only to narrow the return type to ChromaVectorCapability for better type hinting.

Returns:

Name Type Description
ChromaVectorCapability ChromaVectorCapability

Vector capability handler.

Raises:

Type Description
NotSupportedException

If vector capability is not supported.
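Because the capability properties raise when a capability is not supported, callers may prefer to check `supported_capabilities` first. The following is a minimal sketch of that guard. `FakeStore` and `NotSupported` are hypothetical stand-ins for `ChromaDataStore` and `NotSupportedException`, and the capability name `"vector"` is assumed to match the property name.

```python
class NotSupported(Exception):
    """Hypothetical stand-in for NotSupportedException."""


class FakeStore:
    """Hypothetical stand-in that mimics the documented property behavior."""

    def __init__(self, capabilities):
        self._capabilities = dict(capabilities)

    @property
    def supported_capabilities(self):
        # List of capability names that have been configured.
        return list(self._capabilities)

    @property
    def vector(self):
        # Raise when the capability has not been configured.
        if "vector" not in self._capabilities:
            raise NotSupported("vector capability is not supported")
        return self._capabilities["vector"]


def vector_or_none(store):
    """Return the vector handler, or None when the capability is unavailable."""
    if "vector" in store.supported_capabilities:
        return store.vector
    return None


print(vector_or_none(FakeStore({})))                     # None
print(vector_or_none(FakeStore({"vector": "handler"})))  # handler
```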

translate_query_filter(query_filter=None) classmethod

Translate QueryFilter or FilterClause to ChromaDB native filter syntax.

This method uses ChromaQueryTranslator to translate filters and returns the result as a dictionary.

Examples:

  1. Translate a simple FilterClause:

    ```python
    from gllm_datastore.core.filters import filter as F

    filter_clause = F.eq("metadata.status", "active")
    result = ChromaDataStore.translate_query_filter(filter_clause)
    # result -> {"where": {"status": "active"}}
    ```

  2. Translate QueryFilter with metadata filters:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.and_(
        F.eq("metadata.category", "tech"),
        F.gte("metadata.price", 10),
    )
    result = ChromaDataStore.translate_query_filter(filters)
    # result ->
    # {
    #     "where": {
    #         "$and": [
    #             {"category": "tech"},
    #             {"price": {"$gte": 10}}
    #         ]
    #     }
    # }
    ```

  3. Translate QueryFilter with content filters:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.text_contains("content", "python")
    result = ChromaDataStore.translate_query_filter(filters)
    # result -> {"where_document": {"$contains": "python"}}
    ```

  4. Translate QueryFilter with id filters:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.in_("id", ["chunk_1", "chunk_2"])
    result = ChromaDataStore.translate_query_filter(filters)
    # result -> {"ids": ["chunk_1", "chunk_2"]}
    ```

  5. Translate complex nested QueryFilter:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.and_(
        F.or_(
            F.eq("metadata.status", "active"),
            F.eq("metadata.status", "pending"),
        ),
        F.text_contains("content", "machine learning"),
        F.in_("id", ["chunk_1", "chunk_2"]),
    )
    result = ChromaDataStore.translate_query_filter(filters)
    # result ->
    # {
    #     "where": {
    #         "$or": [
    #             {"status": "active"},
    #             {"status": "pending"}
    #         ]
    #     },
    #     "where_document": {"$contains": "machine learning"},
    #     "ids": ["chunk_1", "chunk_2"]
    # }
    ```

Parameters:

Name Type Description Default
query_filter FilterClause | QueryFilter | None

The filter to translate. Can be a single FilterClause or a QueryFilter with multiple clauses. Defaults to None.

None

Returns:

Type Description
dict[str, Any] | None

The translated filter as a ChromaDB query dict. Returns None for empty filters.
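Since the return value may be None, callers that forward it as keyword arguments need to handle that case. The sketch below shows one way to do so. `to_get_kwargs` is a hypothetical helper, not part of the library, and the key list is taken from the translated examples on this page. The resulting dictionary could plausibly be passed to ChromaDB's `Collection.get`, which accepts `ids`, `where`, and `where_document` keyword arguments.

```python
from typing import Any, Optional

# Keys that appear in the translated examples on this page.
CHROMA_FILTER_KEYS = ("where", "where_document", "ids")


def to_get_kwargs(translated: Optional[dict]) -> dict:
    """Turn a translated filter into keyword arguments; None means no filter."""
    if translated is None:
        return {}
    # Keep only the keys shown in the documented output shapes.
    return {key: translated[key] for key in CHROMA_FILTER_KEYS if key in translated}


print(to_get_kwargs(None))                             # {}
print(to_get_kwargs({"where": {"status": "active"}}))  # {'where': {'status': 'active'}}
```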

with_fulltext(collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)

Configure fulltext capability and return datastore instance.

This method delegates to the parent class to configure the fulltext capability. It is overridden only to provide better type hinting.

Parameters:

Name Type Description Default
collection_name str | None

Name of the collection to use in ChromaDB. Defaults to None, in which case the default class attribute is used.

None
num_candidates int

Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES.

DEFAULT_NUM_CANDIDATES

Returns:

Name Type Description
Self ChromaDataStore

Self for method chaining.

with_vector(em_invoker, collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)

Configure vector capability and return datastore instance.

This method delegates to the parent class to configure the vector capability. It is overridden only to provide better type hinting.

Parameters:

Name Type Description Default
em_invoker BaseEMInvoker

The embedding model to perform vectorization.

required
collection_name str | None

Name of the collection to use in ChromaDB. Defaults to None, in which case the default class attribute is used.

None
num_candidates int

Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES.

DEFAULT_NUM_CANDIDATES

Returns:

Name Type Description
Self ChromaDataStore

Self for method chaining.
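Because both `with_fulltext` and `with_vector` return the data store itself, capabilities can be configured in a single chained expression. The sketch below illustrates the pattern with a hypothetical stand-in class, `SketchStore`. It does not reproduce the library's internals, and the fallback to the store's own collection name when `collection_name` is None is an assumption.

```python
class SketchStore:
    """Hypothetical stand-in that shows the fluent with_* pattern."""

    def __init__(self, collection_name):
        self.collection_name = collection_name
        self.capabilities = {}

    def with_fulltext(self, collection_name=None):
        # Assumed fallback: reuse the store's collection name when None is given.
        self.capabilities["fulltext"] = collection_name or self.collection_name
        return self  # returning self enables chaining

    def with_vector(self, em_invoker, collection_name=None):
        self.capabilities["vector"] = (em_invoker, collection_name or self.collection_name)
        return self  # returning self enables chaining


# "fake-embedder" is a placeholder for a BaseEMInvoker instance.
store = SketchStore("docs").with_fulltext().with_vector(em_invoker="fake-embedder")
print(sorted(store.capabilities))  # ['fulltext', 'vector']
```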