Data Store

Data store implementations organized by storage type.

ChromaDataStore(collection_name, client_type=ChromaClientType.MEMORY, persist_directory=None, host=None, port=None, headers=None, client_settings=None)

Bases: BaseDataStore

ChromaDB data store with multiple capability support.

Attributes:

Name Type Description
collection_name str

The name of the ChromaDB collection.

client ClientAPI

The ChromaDB client instance.

Initialize the ChromaDB data store.

Parameters:

Name Type Description Default
collection_name str

The name of the ChromaDB collection.

required
client_type ChromaClientType

Type of ChromaDB client to use. Defaults to ChromaClientType.MEMORY.

MEMORY
persist_directory str | None

Directory to persist vector store data. Required for PERSISTENT client type. Defaults to None.

None
host str | None

Host address for ChromaDB server. Required for HTTP client type. Defaults to None.

None
port int | None

Port for ChromaDB server. Required for HTTP client type. Defaults to None.

None
headers dict | None

A dictionary of headers to send to the Chroma server. Used for authentication with the Chroma server for HTTP client type. Defaults to None.

None
client_settings dict | None

A dictionary of additional settings for the Chroma client. Defaults to None.

None
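
The parameter descriptions above tie `persist_directory` to the PERSISTENT client type and `host`/`port` to the HTTP client type. A minimal sketch of that dependency follows. The `ClientType` enum and the `missing_arguments` helper are hypothetical stand-ins written for illustration; they are not part of the library, and the real constructor may validate its arguments differently.

```python
from enum import Enum


class ClientType(Enum):
    """Hypothetical stand-in for ChromaClientType (member values are assumptions)."""

    MEMORY = "memory"
    PERSISTENT = "persistent"
    HTTP = "http"


def missing_arguments(client_type, persist_directory=None, host=None, port=None):
    """Return the names of arguments documented as required for this client type but left unset."""
    required = {
        ClientType.MEMORY: {},
        ClientType.PERSISTENT: {"persist_directory": persist_directory},
        ClientType.HTTP: {"host": host, "port": port},
    }[client_type]
    return [name for name, value in required.items() if value is None]


print(missing_arguments(ClientType.MEMORY))                  # []
print(missing_arguments(ClientType.PERSISTENT))              # ['persist_directory']
print(missing_arguments(ClientType.HTTP, host="localhost"))  # ['port']
```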

fulltext property

Access fulltext capability if supported.

Delegates to the parent class to resolve the fulltext capability handler. The override narrows the return type to ChromaFulltextCapability for better type hinting.

Returns:

Name Type Description
ChromaFulltextCapability ChromaFulltextCapability

Fulltext capability handler.

Raises:

Type Description
NotSupportedException

If fulltext capability is not supported.

supported_capabilities property

Return list of currently supported capabilities.

Returns:

Type Description
list[str]

list[str]: List of capability names that are supported.

vector property

Access vector capability if supported.

Delegates to the parent class to resolve the vector capability handler. The override narrows the return type to ChromaVectorCapability for better type hinting.

Returns:

Name Type Description
ChromaVectorCapability ChromaVectorCapability

Vector capability handler.

Raises:

Type Description
NotSupportedException

If vector capability is not supported.

translate_query_filter(query_filter=None) classmethod

Translate QueryFilter or FilterClause to ChromaDB native filter syntax.

This method uses ChromaQueryTranslator to translate filters and returns the result as a dictionary.

Examples:

  1. Translate a simple FilterClause:

    ```python
    from gllm_datastore.core.filters import filter as F

    filter_clause = F.eq("metadata.status", "active")
    result = ChromaDataStore.translate_query_filter(filter_clause)
    # result -> {"where": {"status": "active"}}
    ```

  2. Translate QueryFilter with metadata filters:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.and_(
        F.eq("metadata.category", "tech"),
        F.gte("metadata.price", 10),
    )
    result = ChromaDataStore.translate_query_filter(filters)
    # result ->
    # {
    #     "where": {
    #         "$and": [
    #             {"category": "tech"},
    #             {"price": {"$gte": 10}}
    #         ]
    #     }
    # }
    ```

  3. Translate QueryFilter with content filters:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.text_contains("content", "python")
    result = ChromaDataStore.translate_query_filter(filters)
    # result -> {"where_document": {"$contains": "python"}}
    ```

  4. Translate QueryFilter with id filters:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.in_("id", ["chunk_1", "chunk_2"])
    result = ChromaDataStore.translate_query_filter(filters)
    # result -> {"ids": ["chunk_1", "chunk_2"]}
    ```

  5. Translate complex nested QueryFilter:

    ```python
    from gllm_datastore.core.filters import filter as F

    filters = F.and_(
        F.or_(
            F.eq("metadata.status", "active"),
            F.eq("metadata.status", "pending"),
        ),
        F.text_contains("content", "machine learning"),
        F.in_("id", ["chunk_1", "chunk_2"]),
    )
    result = ChromaDataStore.translate_query_filter(filters)
    # result ->
    # {
    #     "where": {
    #         "$or": [
    #             {"status": "active"},
    #             {"status": "pending"}
    #         ]
    #     },
    #     "where_document": {"$contains": "machine learning"},
    #     "ids": ["chunk_1", "chunk_2"]
    # }
    ```

Parameters:

Name Type Description Default
query_filter FilterClause | QueryFilter | None

The filter to translate. Can be a single FilterClause or a QueryFilter with multiple clauses. Defaults to None.

None

Returns:

Type Description
dict[str, Any] | None

dict[str, Any] | None: The translated filter as a ChromaDB query dict. Returns None for empty filters.
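
The keys of the translated dictionary (`where`, `where_document`, `ids`) share their names with the filter arguments of a Chroma collection read, so one plausible way to consume the result is to unpack it as keyword arguments. The sketch below illustrates that idea with a stand-in function rather than a real collection; `fake_collection_get` and `run_with_filter` are hypothetical names, and how the library itself passes the result on is not stated here.

```python
def fake_collection_get(ids=None, where=None, where_document=None):
    """Stand-in for a Chroma collection read; it only echoes the arguments it received."""
    return {"ids": ids, "where": where, "where_document": where_document}


def run_with_filter(translated):
    """Unpack a translated filter into keyword arguments, treating None as 'no filter'."""
    kwargs = translated or {}
    return fake_collection_get(**kwargs)


# Same shape as the result of the complex nested example (example 5).
translated = {
    "where": {"$or": [{"status": "active"}, {"status": "pending"}]},
    "where_document": {"$contains": "machine learning"},
    "ids": ["chunk_1", "chunk_2"],
}

print(run_with_filter(translated)["ids"])  # ['chunk_1', 'chunk_2']
print(run_with_filter(None))  # {'ids': None, 'where': None, 'where_document': None}
```

Because empty filters translate to None, the `translated or {}` guard keeps the call site free of special cases.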

with_fulltext(collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)

Configure fulltext capability and return datastore instance.

Delegates to the parent class to configure the fulltext capability. The override is provided for better type hinting.

Parameters:

Name Type Description Default
collection_name str | None

Name of the collection to use in ChromaDB. Defaults to None, in which case the default class attribute will be utilized.

None
num_candidates int

Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES.

DEFAULT_NUM_CANDIDATES

Returns:

Name Type Description
Self ChromaDataStore

Self for method chaining.

with_vector(em_invoker, collection_name=None, num_candidates=DEFAULT_NUM_CANDIDATES)

Configure vector capability and return datastore instance.

Delegates to the parent class to configure the vector capability. The override is provided for better type hinting.

Parameters:

Name Type Description Default
em_invoker BaseEMInvoker

The embedding model to perform vectorization.

required
collection_name str | None

Name of the collection to use in ChromaDB. Defaults to None, in which case the default class attribute will be utilized.

None
num_candidates int

Maximum number of candidates to consider during search. Defaults to DEFAULT_NUM_CANDIDATES.

DEFAULT_NUM_CANDIDATES

Returns:

Name Type Description
Self ChromaDataStore

Self for method chaining.
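
Both `with_fulltext` and `with_vector` return the datastore itself, which is what makes chained configuration possible. The toy class below mimics only that behaviour. `ToyDataStore` is a hypothetical stand-in, the default of 50 candidates is an arbitrary placeholder for DEFAULT_NUM_CANDIDATES, and the fallback to the store-level collection name is an assumption about what "the default class attribute" means.

```python
class ToyDataStore:
    """Hypothetical stand-in that mimics only the chaining behaviour; not ChromaDataStore."""

    def __init__(self, collection_name):
        self.collection_name = collection_name
        self.capabilities = {}

    def with_fulltext(self, collection_name=None, num_candidates=50):
        # Assumption: None falls back to the store-level collection name.
        self.capabilities["fulltext"] = {
            "collection_name": collection_name or self.collection_name,
            "num_candidates": num_candidates,
        }
        return self  # returning self is what allows chaining

    def with_vector(self, em_invoker, collection_name=None, num_candidates=50):
        self.capabilities["vector"] = {
            "em_invoker": em_invoker,
            "collection_name": collection_name or self.collection_name,
            "num_candidates": num_candidates,
        }
        return self


# Configure both capabilities in a single expression.
store = ToyDataStore("docs").with_fulltext().with_vector(em_invoker="fake-embedder")
print(sorted(store.capabilities))  # ['fulltext', 'vector']
```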

ElasticsearchDataStore(index_name, client=None, url=None, cloud_id=None, api_key=None, username=None, password=None, request_timeout=DEFAULT_REQUEST_TIMEOUT)

Bases: BaseDataStore

Elasticsearch data store with multiple capability support.

This is the explicit public API for Elasticsearch. Users know they're using Elasticsearch, not a generic "elastic-like" datastore.

Attributes:

Name Type Description
engine str

Always "elasticsearch" for explicit identification. This attribute ensures users know they're using Elasticsearch, not a generic "elastic-like" datastore.

index_name str

The name of the Elasticsearch index.

client AsyncElasticsearch

AsyncElasticsearch client.

Initialize the Elasticsearch data store.

Parameters:

Name Type Description Default
index_name str

The name of the Elasticsearch index to use for operations. This index name will be used for all queries and operations.

required
client AsyncElasticsearch | None

Pre-configured Elasticsearch client instance. If provided, it will be used instead of creating a new client from url/cloud_id. Must be an instance of AsyncElasticsearch. Defaults to None.

None
url str | None

The URL of the Elasticsearch server. For example, "http://localhost:9200". Either url or cloud_id must be provided if client is None. Defaults to None.

None
cloud_id str | None

The cloud ID of the Elasticsearch cluster. Used for Elastic Cloud connections. Either url or cloud_id must be provided if client is None. Defaults to None.

None
api_key str | None

The API key for authentication. If provided, will be used for authentication. Mutually exclusive with username/password. Defaults to None.

None
username str | None

The username for basic authentication. Must be provided together with password. Mutually exclusive with api_key. Defaults to None.

None
password str | None

The password for basic authentication. Must be provided together with username. Mutually exclusive with api_key. Defaults to None.

None
request_timeout int

The request timeout in seconds. Defaults to DEFAULT_REQUEST_TIMEOUT.

DEFAULT_REQUEST_TIMEOUT

Raises:

Type Description
ValueError

If neither url nor cloud_id is provided when client is None.

TypeError

If client is provided but is not an instance of AsyncElasticsearch.
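
The constructor parameters carry three consistency rules: a connection target is needed when no client is passed, `api_key` excludes `username`/`password`, and `username` and `password` travel together. The sketch below collects those rules in one hypothetical helper, `check_connection_args`. Only the first rule is documented as raising ValueError; how the real constructor reacts to the other two is not stated, so this helper simply reports problems instead of raising.

```python
def check_connection_args(client=None, url=None, cloud_id=None,
                          api_key=None, username=None, password=None):
    """Return a list of problems with the given arguments; an empty list means consistent."""
    problems = []
    if client is None and url is None and cloud_id is None:
        problems.append("either url or cloud_id is required when client is None")
    if api_key is not None and (username is not None or password is not None):
        problems.append("api_key is mutually exclusive with username/password")
    if (username is None) != (password is None):
        problems.append("username and password must be provided together")
    return problems


print(check_connection_args(url="http://localhost:9200"))  # []
print(check_connection_args())
# ['either url or cloud_id is required when client is None']
```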

fulltext property

Access fulltext capability if supported.

Delegates to the parent class to resolve the fulltext capability handler. The override narrows the return type to ElasticsearchFulltextCapability for better type hinting.

Returns:

Name Type Description
ElasticsearchFulltextCapability ElasticsearchFulltextCapability

Fulltext capability handler.

Raises:

Type Description
NotSupportedException

If fulltext capability is not supported.

supported_capabilities property

Return list of currently supported capabilities.

Returns:

Type Description
list[str]

list[str]: List of capability names that are supported.

vector property

Access vector capability if supported.

Delegates to the parent class to resolve the vector capability handler. The override narrows the return type to ElasticsearchVectorCapability for better type hinting.

Returns:

Name Type Description
ElasticsearchVectorCapability ElasticsearchVectorCapability

Vector capability handler.

Raises:

Type Description
NotSupportedException

If vector capability is not supported.

translate_query_filter(query_filter) classmethod

Translate QueryFilter or FilterClause to Elasticsearch native filter syntax.

This method delegates to the ElasticsearchQueryTranslator and returns the result as a dictionary.

Parameters:

Name Type Description Default
query_filter FilterClause | QueryFilter | None

The filter to translate. Can be a single FilterClause, a QueryFilter with multiple clauses and logical conditions, or None for empty filters. FilterClause objects are automatically converted to QueryFilter.

required

Returns:

Type Description
dict[str, Any] | None

dict[str, Any] | None: The translated filter as an Elasticsearch DSL dictionary. Returns None for empty filters or when query_filter is None. The dictionary format matches Elasticsearch Query DSL syntax.
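
For readers unfamiliar with Query DSL, the sketch below builds the kind of dictionary that represents "status equals active AND price is at least 10" using standard `bool`, `term`, and `range` clauses. It is an illustration of the DSL shape only. The helper names are hypothetical, and the exact clauses and field names that ElasticsearchQueryTranslator emits (for example `filter` versus `must`) are not documented here and may differ.

```python
import json


def term(field, value):
    """Exact-match clause in Query DSL."""
    return {"term": {field: value}}


def range_gte(field, value):
    """Greater-than-or-equal range clause in Query DSL."""
    return {"range": {field: {"gte": value}}}


def bool_filter(*clauses):
    """Combine clauses so that all of them must match, in non-scoring filter context."""
    return {"bool": {"filter": list(clauses)}}


query = bool_filter(
    term("metadata.status", "active"),
    range_gte("metadata.price", 10),
)
print(json.dumps(query, indent=2))
```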

with_fulltext(index_name=None, query_field='text')

Configure fulltext capability and return datastore instance.

Delegates to the parent class to configure the fulltext capability. The override is provided for better type hinting.

Parameters:

Name Type Description Default
index_name str | None

The name of the Elasticsearch index to use for fulltext operations. If None, uses the default index_name from the datastore instance. Defaults to None.

None
query_field str

The field name to use for text content in queries. This field will be used for BM25 and other text search operations. Defaults to "text".

'text'

Returns:

Name Type Description
ElasticsearchDataStore ElasticsearchDataStore

Self for method chaining.

with_vector(em_invoker, index_name=None, query_field='text', vector_query_field='vector', retrieval_strategy=None, distance_strategy=None)

Configure vector capability and return datastore instance.

Delegates to the parent class to configure the vector capability. The override is provided for better type hinting.

Parameters:

Name Type Description Default
em_invoker BaseEMInvoker

The embedding model to perform vectorization.

required
index_name str | None

The name of the Elasticsearch index. Defaults to None, in which case the default class attribute will be utilized.

None
query_field str

The field name for text queries. Defaults to "text".

'text'
vector_query_field str

The field name for vector queries. Defaults to "vector".

'vector'
retrieval_strategy AsyncRetrievalStrategy | None

The retrieval strategy to use. Defaults to None, in which case DenseVectorStrategy() is used.

None
distance_strategy str | None

The distance strategy for retrieval. Defaults to None.

None

Returns:

Name Type Description
Self ElasticsearchDataStore

Self for method chaining.

InMemoryDataStore()

Bases: BaseDataStore

In-memory data store with multiple capability support.

This class provides a unified interface for accessing vector, fulltext, and cache capabilities using in-memory storage optimized for development and testing scenarios.

Attributes:

Name Type Description
store dict[str, Chunk]

Dictionary storing data with their IDs as keys.

Initialize the in-memory data store.

fulltext property

Access fulltext capability if registered.

Delegates entirely to the parent class to resolve the fulltext capability handler. The override narrows the return type to InMemoryFulltextCapability for better type hinting.

Returns:

Name Type Description
InMemoryFulltextCapability InMemoryFulltextCapability

Fulltext capability handler.

Raises:

Type Description
NotRegisteredException

If fulltext capability is not registered.

supported_capabilities property

Return list of currently supported capabilities.

Returns:

Type Description
list[CapabilityType]

list[CapabilityType]: List of capability names that are supported.

vector property

Access vector capability if registered.

Delegates entirely to the parent class to resolve the vector capability handler. The override narrows the return type to InMemoryVectorCapability for better type hinting.

Returns:

Name Type Description
InMemoryVectorCapability InMemoryVectorCapability

Vector capability handler.

Raises:

Type Description
NotRegisteredException

If vector capability is not registered.

translate_query_filter(query_filter) classmethod

Translate QueryFilter or FilterClause to in-memory datastore filter syntax.

For the in-memory datastore, this method acts as an identity function since the datastore works directly with the QueryFilter DSL without requiring translation to a native format.

Parameters:

Name Type Description Default
query_filter FilterClause | QueryFilter | None

The filter to translate. Can be a single FilterClause, a QueryFilter with multiple clauses, or None for empty filters.

required

Returns:

Type Description
FilterClause | QueryFilter | None

FilterClause | QueryFilter | None: The same filter object that was passed in. Returns None for empty filters.

NotRegisteredException(capability, class_obj)

Bases: Exception

Raised when attempting to access a capability that is not registered.

This exception is raised when code attempts to access a capability that is not registered for a datastore but is supported by the datastore.

Initialize the exception.

Parameters:

Name Type Description Default
capability str

The name of the unregistered capability.

required
class_obj Type

The class object for context.

required

NotSupportedException(capability, class_obj)

Bases: Exception

Raised when attempting to access an unsupported capability.

This exception is raised when code attempts to access a capability that isn't configured for a datastore.

Initialize the exception.

Parameters:

Name Type Description Default
capability str

The name of the unsupported capability.

required
class_obj Type

The class object for context.

required
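
The two exceptions describe different situations, and the toy model below shows one way to read the distinction: a capability the backend cannot provide at all versus one it can provide but that has not been set up yet. All class and function names here are hypothetical stand-ins, and the library's own checks may be organized differently.

```python
class NotSupportedError(Exception):
    """Hypothetical stand-in for NotSupportedException."""

    def __init__(self, capability, class_obj):
        super().__init__(f"{class_obj.__name__} does not support '{capability}'")


class NotRegisteredError(Exception):
    """Hypothetical stand-in for NotRegisteredException."""

    def __init__(self, capability, class_obj):
        super().__init__(f"'{capability}' is supported by {class_obj.__name__} but not registered")


class ToyStore:
    supported = ("fulltext", "vector")

    def __init__(self):
        self._handlers = {}

    def register(self, capability, handler):
        self._handlers[capability] = handler
        return self

    def get(self, capability):
        if capability not in self.supported:
            raise NotSupportedError(capability, type(self))
        if capability not in self._handlers:
            raise NotRegisteredError(capability, type(self))
        return self._handlers[capability]


def status(store, capability):
    """Classify a capability as 'not supported', 'not registered', or 'ready'."""
    try:
        store.get(capability)
    except NotSupportedError:
        return "not supported"
    except NotRegisteredError:
        return "not registered"
    return "ready"


store = ToyStore().register("fulltext", object())
print(status(store, "cache"))     # not supported
print(status(store, "vector"))    # not registered
print(status(store, "fulltext"))  # ready
```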

OpenSearchDataStore(index_name, client=None, url=None, cloud_id=None, api_key=None, username=None, password=None, request_timeout=DEFAULT_REQUEST_TIMEOUT)

Bases: BaseDataStore

OpenSearch data store with multiple capability support.

This is the explicit public API for OpenSearch. Users know they're using OpenSearch, not a generic "elastic-like" datastore.

Attributes:

Name Type Description
engine str

Always "opensearch" for explicit identification. This attribute ensures users know they're using OpenSearch, not a generic "elastic-like" datastore.

index_name str

The name of the OpenSearch index.

client AsyncOpenSearch

AsyncOpenSearch client.

Initialize the OpenSearch data store.

Parameters:

Name Type Description Default
index_name str

The name of the OpenSearch index to use for operations. This index name will be used for all queries and operations.

required
client AsyncOpenSearch | None

Pre-configured OpenSearch client instance. If provided, it will be used instead of creating a new client from url/cloud_id. Must be an instance of AsyncOpenSearch. Defaults to None.

None
url str | None

The URL of the OpenSearch server. For example, "http://localhost:9200". Either url or cloud_id must be provided if client is None. Defaults to None.

None
cloud_id str | None

The cloud ID of the OpenSearch cluster. Used for OpenSearch Service connections. Either url or cloud_id must be provided if client is None. Defaults to None.

None
api_key str | None

The API key for authentication. If provided, will be used for authentication. Mutually exclusive with username/password. Defaults to None.

None
username str | None

The username for basic authentication. Must be provided together with password. Mutually exclusive with api_key. Defaults to None.

None
password str | None

The password for basic authentication. Must be provided together with username. Mutually exclusive with api_key. Defaults to None.

None
request_timeout int

The request timeout in seconds. Defaults to DEFAULT_REQUEST_TIMEOUT.

DEFAULT_REQUEST_TIMEOUT

Raises:

Type Description
ValueError

If neither url nor cloud_id is provided when client is None.

TypeError

If client is provided but is not an instance of AsyncOpenSearch.

fulltext property

Access fulltext capability if supported.

Delegates to the parent class to resolve the fulltext capability handler. The override narrows the return type to OpenSearchFulltextCapability for better type hinting.

Returns:

Name Type Description
OpenSearchFulltextCapability OpenSearchFulltextCapability

Fulltext capability handler.

Raises:

Type Description
NotSupportedException

If fulltext capability is not supported.

supported_capabilities property

Return list of currently supported capabilities.

Returns:

Type Description
list[str]

list[str]: List of capability names that are supported. Currently returns [CapabilityType.FULLTEXT, CapabilityType.VECTOR]. Note: Vector capability is not yet implemented.

vector property

Access vector capability if supported.

Raises:

Type Description
NotImplementedError

Vector capability not yet implemented.

translate_query_filter(query_filter) classmethod

Translate QueryFilter or FilterClause to OpenSearch native filter syntax.

This method delegates to the OpenSearchQueryTranslator and returns the result as a dictionary.

Parameters:

Name Type Description Default
query_filter FilterClause | QueryFilter | None

The filter to translate. Can be a single FilterClause, a QueryFilter with multiple clauses and logical conditions, or None for empty filters. FilterClause objects are automatically converted to QueryFilter.

required

Returns:

Type Description
dict[str, Any] | None

dict[str, Any] | None: The translated filter as an OpenSearch DSL dictionary. Returns None for empty filters or when query_filter is None. The dictionary format matches OpenSearch Query DSL syntax.

with_fulltext(index_name=None, query_field='text')

Configure fulltext capability and return datastore instance.

Delegates to the parent class to configure the fulltext capability. The override is provided for better type hinting.

Parameters:

Name Type Description Default
index_name str | None

The name of the OpenSearch index to use for fulltext operations. If None, uses the default index_name from the datastore instance. Defaults to None.

None
query_field str

The field name to use for text content in queries. This field will be used for BM25 and other text search operations. Defaults to "text".

'text'

Returns:

Name Type Description
OpenSearchDataStore OpenSearchDataStore

Self for method chaining.

with_vector(*args, **kwargs)

Configure vector capability and return datastore instance.

Parameters:

Name Type Description Default
*args Any

Placeholder arguments.

()
**kwargs Any

Placeholder keyword arguments.

{}

Returns:

Name Type Description
Self OpenSearchDataStore

Self for method chaining.

Raises:

Type Description
NotImplementedError

Vector capability not yet implemented.
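
Since `with_vector` is documented to raise NotImplementedError for this backend, code that configures several backends uniformly may want to guard the call. The sketch below shows one such guard. `ToyOpenSearchStore` and `configure` are hypothetical names that reproduce only the documented behaviour; nothing here touches a real OpenSearch cluster.

```python
class ToyOpenSearchStore:
    """Hypothetical stand-in reproducing only the documented with_vector behaviour."""

    def __init__(self):
        self.fulltext_ready = False

    def with_fulltext(self, index_name=None, query_field="text"):
        self.fulltext_ready = True
        return self

    def with_vector(self, *args, **kwargs):
        raise NotImplementedError("Vector capability not yet implemented.")


def configure(store, em_invoker=None):
    """Enable fulltext, and enable vector only if the backend implements it."""
    store.with_fulltext()
    enabled = ["fulltext"]
    try:
        store.with_vector(em_invoker)
        enabled.append("vector")
    except NotImplementedError:
        pass  # continue with fulltext only
    return enabled


print(configure(ToyOpenSearchStore()))  # ['fulltext']
```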

RedisDataStore(index_name, url=None, client=None)

Bases: BaseDataStore

Redis data store with fulltext and vector capability support.

Attributes:

Name Type Description
index_name str

Name for the Redis index.

client Redis

Redis client instance.

Initialize the Redis data store.

Parameters:

Name Type Description Default
index_name str

Name of the Redis index to use.

required
url str | None

URL for Redis connection. Defaults to None. Format: redis://[[username]:[password]]@host:port/database

None
client Redis | None

Redis client instance to use. Defaults to None, in which case the url parameter will be used to create a new Redis client.

None

Raises:

Type Description
ValueError

If neither url nor client is provided, or if URL is invalid.

TypeError

If client is not a Redis instance.

ConnectionError

If Redis connection fails.
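
The URL format given for the `url` parameter can be inspected with the standard library before it is handed to the datastore. The sketch below splits a URL into the documented components. `describe_redis_url` is a hypothetical helper that only checks for the `redis` scheme shown above; the library's own URL validation is not described here and may accept or reject different inputs.

```python
from urllib.parse import urlparse


def describe_redis_url(url):
    """Split a redis:// URL into its documented components."""
    parts = urlparse(url)
    if parts.scheme != "redis" or not parts.hostname:
        raise ValueError(f"not a redis:// URL: {url!r}")
    return {
        "username": parts.username or None,
        "password": parts.password or None,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/") or None,
    }


info = describe_redis_url("redis://user:secret@localhost:6379/0")
print(info["host"], info["port"], info["database"])  # localhost 6379 0
```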

fulltext property

Access fulltext capability if registered.

Delegates to the parent class to resolve the fulltext capability handler. The override narrows the return type to RedisFulltextCapability for better type hinting.

Returns:

Name Type Description
RedisFulltextCapability RedisFulltextCapability

Fulltext capability handler.

Raises:

Type Description
NotRegisteredException

If fulltext capability is not registered.

supported_capabilities property

Return list of currently supported capabilities.

Returns:

Type Description
list[CapabilityType]

list[CapabilityType]: List of capability names that are supported.

vector property

Access vector capability if registered.

Delegates to the parent class to resolve the vector capability handler. The override narrows the return type to RedisVectorCapability for better type hinting.

Returns:

Name Type Description
RedisVectorCapability RedisVectorCapability

Vector capability handler.

Raises:

Type Description
NotRegisteredException

If vector capability is not registered.

translate_query_filter(query_filter=None)

Translate QueryFilter or FilterClause to Redis native filter syntax.

This method delegates to the existing RedisQueryTranslator in the redis.query_translator module and returns the result as a string. It uses the instance's index_name and client to detect field types for accurate Redis Search query syntax.

Examples:

```python
from gllm_datastore.core.filters import filter as F

# Create datastore instance
datastore = RedisDataStore(index_name="my_index", url="redis://localhost:6379")

# Single FilterClause (field types detected from index schema)
clause = F.eq("metadata.status", "active")
result = datastore.translate_query_filter(clause)
# Returns: "@metadata_status:{active}" if status is a TAG field
# Returns: "@metadata_status:active" if status is a TEXT field

# QueryFilter with multiple clauses (AND condition)
filter_obj = F.and_(
    F.eq("metadata.status", "active"),
    F.gt("metadata.age", 25),
)
result = datastore.translate_query_filter(filter_obj)
# Returns: "@metadata_status:{active} @metadata_age:[(25 +inf]"

# QueryFilter with OR condition
filter_obj = F.or_(
    F.eq("metadata.status", "active"),
    F.eq("metadata.status", "pending"),
)
result = datastore.translate_query_filter(filter_obj)
# Returns: "@metadata_status:{active} | @metadata_status:{pending}"

# IN operator (produces parentheses syntax)
filter_obj = F.in_("metadata.category", ["tech", "science"])
result = datastore.translate_query_filter(filter_obj)
# Returns: "@metadata_category:(tech|science)"

# Empty filter returns None
result = datastore.translate_query_filter(None)
# Returns: None
```

Parameters:

Name Type Description Default
query_filter FilterClause | QueryFilter | None

The filter to translate. Can be a single FilterClause or a QueryFilter with multiple clauses. Defaults to None.

None

Returns:

Type Description
str | None

str | None: The translated filter as a Redis Search query string. Returns None if no filter is provided.
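
Because the result is either a query string or None, callers that build a Redis Search query themselves need a fallback for the unfiltered case. The sketch below wraps the filter in parentheses, appends optional free text (a space acts as AND in Redis Search query syntax), and falls back to `*`, which matches every document. `to_search_query` is a hypothetical helper; how the library's capability handlers assemble their queries is not documented here.

```python
def to_search_query(translated_filter, text=None):
    """Combine a translated filter string with optional free text into one query string."""
    parts = []
    if translated_filter:
        # Parentheses keep an OR inside the filter from swallowing the text term.
        parts.append(f"({translated_filter})")
    if text:
        parts.append(text)
    return " ".join(parts) if parts else "*"


print(to_search_query(None))                         # *
print(to_search_query("@metadata_status:{active}"))  # (@metadata_status:{active})
print(to_search_query("@metadata_status:{active} | @metadata_status:{pending}", "hello"))
# (@metadata_status:{active} | @metadata_status:{pending}) hello
```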

with_fulltext(index_name=None)

Configure fulltext capability and return datastore instance.

Schema will be automatically inferred from chunks when creating a new index, or auto-detected from an existing index when performing operations.

Parameters:

Name Type Description Default
index_name str | None

The name of the Redis index. Defaults to None, in which case the default class attribute will be utilized.

None

Returns:

Name Type Description
RedisDataStore RedisDataStore

RedisDataStore instance for method chaining.

with_vector(em_invoker, index_name=None)

Configure vector capability and return datastore instance.

Schema will be automatically inferred from chunks when creating a new index, or auto-detected from an existing index when performing operations.

Parameters:

Name Type Description Default
em_invoker BaseEMInvoker

The embedding model to perform vectorization.

required
index_name str | None

The name of the Redis index. Defaults to None, in which case the default class attribute will be utilized.

None

Returns:

Name Type Description
RedisDataStore RedisDataStore

RedisDataStore instance for method chaining.

Raises:

Type Description
ValueError

If em_invoker is not provided.