Data store

Milvus vector database integration with capability-based data store.

This module provides a data store implementation for Milvus, a scalable vector database designed for similarity search and analytics on unstructured data. The implementation exposes a fulltext capability for document CRUD operations and flexible querying, and a vector capability that is configured through with_vector.

The MilvusDataStore class extends BaseDataStore to provide capability-based access to Milvus collections: each capability is configured explicitly and then accessed through its own property.

MilvusDataStore(uri, collection_name, token=None, timeout=30, id_max_length=DEFAULT_ID_MAX_LENGTH, content_max_length=DEFAULT_CONTENT_MAX_LENGTH)

Bases: BaseDataStore

Milvus data store with multiple capability support.

Attributes:

| Name | Type | Description |
|---|---|---|
| `uri` | `str` | Milvus connection URI. |
| `token` | `str \| None` | Authentication token for Milvus Cloud. |
| `collection_name` | `str` | The name of the Milvus collection. |
| `timeout` | `int` | Connection timeout in seconds. |
| `client` | `AsyncMilvusClient` | The async Milvus client instance. |

Initialize the Milvus data store.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `uri` | `str` | Milvus connection URI (e.g., `"http://localhost:19530"`). | required |
| `collection_name` | `str` | Collection name. | required |
| `token` | `str \| None` | Authentication token for Milvus Cloud. Defaults to None. | `None` |
| `timeout` | `int` | Connection timeout in seconds. Defaults to 30. | `30` |
| `id_max_length` | `int` | Maximum length for the ID field. Defaults to 100. | `DEFAULT_ID_MAX_LENGTH` |
| `content_max_length` | `int` | Maximum length for the content field. Defaults to 65535. | `DEFAULT_CONTENT_MAX_LENGTH` |

fulltext property

Access fulltext capability if supported.

This property delegates to the parent class to resolve the fulltext capability handler. It is overridden here only to narrow the return type to MilvusFulltextCapability for better type hinting.

Returns:

| Type | Description |
|---|---|
| `MilvusFulltextCapability` | Fulltext capability handler. |

Raises:

| Type | Description |
|---|---|
| `NotSupportedException` | If fulltext capability is not supported. |

supported_capabilities property

Return list of currently supported capabilities.

Returns:

| Type | Description |
|---|---|
| `list[CapabilityType]` | List of capability names that are supported. |

vector property

Access vector capability if supported.

This property delegates to the parent class to resolve the vector capability handler. It is overridden here only to narrow the return type to MilvusVectorCapability for better type hinting.

Returns:

| Type | Description |
|---|---|
| `MilvusVectorCapability` | Vector capability handler. |

Raises:

| Type | Description |
|---|---|
| `NotSupportedException` | If vector capability is not supported. |
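The capability properties above follow a common gating pattern: a handler is only reachable once its capability has been configured, and accessing an unconfigured one raises. The sketch below illustrates that pattern with stand-in names (`SketchStore`, `NotSupportedError`) that are hypothetical and not part of the library; it is not how MilvusDataStore or BaseDataStore is actually implemented.

```python
class NotSupportedError(Exception):
    """Hypothetical stand-in for the library's NotSupportedException."""


class SketchStore:
    """Hypothetical stand-in that mimics capability gating and chaining."""

    def __init__(self) -> None:
        # Maps a capability name to whatever handler was configured for it.
        self._capabilities: dict[str, object] = {}

    def with_fulltext(self, query_field: str = "content") -> "SketchStore":
        # Register the capability and return self so calls can be chained.
        self._capabilities["fulltext"] = {"query_field": query_field}
        return self

    @property
    def supported_capabilities(self) -> list[str]:
        return list(self._capabilities)

    @property
    def fulltext(self) -> object:
        return self._get("fulltext")

    @property
    def vector(self) -> object:
        return self._get("vector")

    def _get(self, name: str) -> object:
        # Accessing a capability that was never configured is an error.
        if name not in self._capabilities:
            raise NotSupportedError(f"{name} capability is not configured")
        return self._capabilities[name]


store = SketchStore().with_fulltext()
```

In this sketch, `store.fulltext` returns the configured handler, while `store.vector` raises because `with_vector` was never called.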

get_size(filters=None) async

Get the total number of records in the datastore.

Examples:

1) Basic usage (no filters):

```python
count = await datastore.get_size()
```

2) With filters (using Query Filters):

```python
from gllm_datastore.core.filters import filter as F

count = await datastore.get_size(filters=F.eq("metadata.status", "active"))
```

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |

Returns:

| Type | Description |
|---|---|
| `int` | The total number of records matching the filters. |

Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If the operation fails. |

translate_query_filter(query_filter=None) classmethod

Translate QueryFilter or FilterClause to Milvus expression syntax.

This method uses MilvusQueryTranslator to translate filters and returns the result as a Milvus expression string.

Examples:

1. Translate a simple FilterClause:

```python
from gllm_datastore.core.filters import filter as F

filter_clause = F.eq("metadata.status", "active")
result = MilvusDataStore.translate_query_filter(filter_clause)
# result -> 'metadata["status"] == "active"'
```

2. Translate QueryFilter with metadata filters:

```python
from gllm_datastore.core.filters import filter as F

filters = F.and_(
    F.eq("metadata.category", "tech"),
    F.gte("metadata.price", 100),
)
result = MilvusDataStore.translate_query_filter(filters)
# result -> '(metadata["category"] == "tech" and metadata["price"] >= 100)'
```

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `query_filter` | `FilterClause \| QueryFilter \| None` | The filter to translate. Can be a single FilterClause or a QueryFilter with multiple clauses. Defaults to None. | `None` |

Returns:

| Type | Description |
|---|---|
| `str \| None` | The translated filter as a Milvus expression string. Returns None for empty filters. |
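To make the shape of the translation concrete, here is a minimal, self-contained sketch of how a small filter tree could be rendered as a Milvus boolean expression. The `Clause` and `And` types, the operator table, and `to_milvus_expr` are hypothetical illustrations; they are not the library's FilterClause, QueryFilter, or MilvusQueryTranslator, whose internals are not described here.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass(frozen=True)
class Clause:
    """Hypothetical single comparison, e.g. metadata.status eq "active"."""
    key: str
    op: str
    value: Any


@dataclass(frozen=True)
class And:
    """Hypothetical conjunction of clauses."""
    clauses: tuple


Node = Union[Clause, And]

# Assumed operator names mapped to Milvus comparison operators.
_OPS = {"eq": "==", "ne": "!=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def field_ref(key: str) -> str:
    # "metadata.status" becomes metadata["status"] (JSON field path syntax).
    root, *path = key.split(".")
    return root + "".join(f'["{part}"]' for part in path)


def to_milvus_expr(node: Optional[Node]) -> Optional[str]:
    if node is None:
        return None
    if isinstance(node, And):
        parts = [to_milvus_expr(child) for child in node.clauses]
        parts = [part for part in parts if part is not None]
        # An empty conjunction translates to no expression at all.
        return f"({' and '.join(parts)})" if parts else None
    # json.dumps quotes strings and leaves numbers bare.
    return f"{field_ref(node.key)} {_OPS[node.op]} {json.dumps(node.value)}"


simple = to_milvus_expr(Clause("metadata.status", "eq", "active"))
combined = to_milvus_expr(
    And((Clause("metadata.category", "eq", "tech"), Clause("metadata.price", "gte", 100)))
)
```

Under these assumptions, `simple` and `combined` have the same form as the two documented results above, and an empty filter yields None.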

with_fulltext(collection_name=None, query_field='content')

Configure fulltext capability and return datastore instance.

This method delegates to the parent class to configure the fulltext capability. It is overridden here only to provide a more specific return type for type hinting.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `collection_name` | `str \| None` | Overrides the collection name. Defaults to None, in which case the data store's default collection name is used. | `None` |
| `query_field` | `str` | Field name for text content. Defaults to "content". | `'content'` |

Returns:

| Type | Description |
|---|---|
| `MilvusDataStore` | Self for method chaining. |

with_vector(em_invoker, collection_name=None, dimension=None, distance_metric='L2', vector_field='dense_vector', index_type='IVF_FLAT', index_params=None, query_field='content')

Configure vector capability and return datastore instance.

This method delegates to the parent class to configure the vector capability. It is overridden here only to provide a more specific return type for type hinting.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `em_invoker` | `BaseEMInvoker` | Embedding model invoker. | required |
| `collection_name` | `str \| None` | Overrides the collection name. Defaults to None, in which case the data store's default collection name is used. | `None` |
| `dimension` | `int \| None` | Vector dimension. Required if the collection doesn't exist. | `None` |
| `distance_metric` | `str` | Distance metric. Defaults to "L2". Supported: "L2", "IP", "COSINE". | `'L2'` |
| `vector_field` | `str` | Field name for dense vectors. Defaults to "dense_vector". | `'dense_vector'` |
| `index_type` | `str` | Index type. Defaults to "IVF_FLAT". Supported: "IVF_FLAT", "HNSW". | `'IVF_FLAT'` |
| `index_params` | `dict[str, Any] \| None` | Index-specific parameters. Defaults to None, in which case default parameters are used. | `None` |
| `query_field` | `str` | Field name for text content. Defaults to "content". | `'content'` |

Returns:

| Type | Description |
|---|---|
| `MilvusDataStore` | Self for method chaining. |
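The `index_params` dictionary depends on the chosen `index_type`. As a rough sketch, the helper below checks the documented metric and index names and fills in illustrative build parameters. `build_vector_options` and the values in `_EXAMPLE_INDEX_PARAMS` are assumptions for illustration: `nlist`, `M`, and `efConstruction` are standard Milvus build parameters for these index types, but the defaults that with_vector applies when `index_params` is None, and whether it validates its arguments this way, are not stated here.

```python
from typing import Any, Optional

# Metric and index names taken from the parameter descriptions above.
SUPPORTED_METRICS = {"L2", "IP", "COSINE"}

# Illustrative starting values only (assumption, not the library's defaults).
_EXAMPLE_INDEX_PARAMS: dict[str, dict[str, Any]] = {
    "IVF_FLAT": {"nlist": 128},
    "HNSW": {"M": 16, "efConstruction": 200},
}


def build_vector_options(
    distance_metric: str = "L2",
    index_type: str = "IVF_FLAT",
    index_params: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper that assembles keyword arguments for with_vector."""
    if distance_metric not in SUPPORTED_METRICS:
        raise ValueError(f"Unsupported distance metric: {distance_metric!r}")
    if index_type not in _EXAMPLE_INDEX_PARAMS:
        raise ValueError(f"Unsupported index type: {index_type!r}")
    # Copy so callers cannot mutate the shared example values.
    if index_params is None:
        params = dict(_EXAMPLE_INDEX_PARAMS[index_type])
    else:
        params = dict(index_params)
    return {
        "distance_metric": distance_metric,
        "index_type": index_type,
        "index_params": params,
    }


hnsw_options = build_vector_options(distance_metric="COSINE", index_type="HNSW")
```

The resulting dictionary could then be unpacked into a call such as `datastore.with_vector(em_invoker, dimension=..., **hnsw_options)`, with the dimension chosen to match the embedding model.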