Data store
Milvus vector database integration with capability-based data store.
This module provides a data store implementation for Milvus, a high-performance, scalable vector database designed for similarity search and analytics on unstructured data.
The MilvusDataStore class extends BaseDataStore to provide capability-based access to Milvus collections. The fulltext capability enables document CRUD operations and flexible querying; a vector capability can be configured separately through with_vector.
MilvusDataStore(uri, collection_name, token=None, timeout=30, id_max_length=DEFAULT_ID_MAX_LENGTH, content_max_length=DEFAULT_CONTENT_MAX_LENGTH)
Bases: BaseDataStore
Milvus data store with multiple capability support.
Attributes:
| Name | Type | Description |
|---|---|---|
| uri | str | Milvus connection URI. |
| token | str \| None | Authentication token for Milvus Cloud. |
| collection_name | str | The name of the Milvus collection. |
| timeout | int | Connection timeout in seconds. |
| client | AsyncMilvusClient | The async Milvus client instance. |
Initialize the Milvus data store.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| uri | str | Milvus connection URI (e.g., "http://localhost:19530"). | required |
| collection_name | str | Collection name. | required |
| token | str \| None | Authentication token for Milvus Cloud. Defaults to None. | None |
| timeout | int | Connection timeout in seconds. Defaults to 30. | 30 |
| id_max_length | int | Maximum length for ID field. Defaults to 100. | DEFAULT_ID_MAX_LENGTH |
| content_max_length | int | Maximum length for content field. Defaults to 65535. | DEFAULT_CONTENT_MAX_LENGTH |
fulltext
property
Access fulltext capability if supported.
This property reuses the parent class logic to return the fulltext capability handler. It overrides the parent property only to narrow the return type to MilvusFulltextCapability for better type hinting.
Returns:
| Name | Type | Description |
|---|---|---|
| MilvusFulltextCapability | MilvusFulltextCapability | Fulltext capability handler. |
Raises:
| Type | Description |
|---|---|
| NotSupportedException | If fulltext capability is not supported. |
supported_capabilities
property
Return list of currently supported capabilities.
Returns:
| Type | Description |
|---|---|
| list[CapabilityType] | List of capability names that are supported. |
vector
property
Access vector capability if supported.
This property reuses the parent class logic to return the vector capability handler. It overrides the parent property only to narrow the return type to MilvusVectorCapability for better type hinting.
Returns:
| Name | Type | Description |
|---|---|---|
| MilvusVectorCapability | MilvusVectorCapability | Vector capability handler. |
Raises:
| Type | Description |
|---|---|
| NotSupportedException | If vector capability is not supported. |
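The capability properties described above can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in written for illustration (`SketchDataStore`, the local `CapabilityType` enum, and the local `NotSupportedException`); it does not import or reproduce the library's actual classes, and only shows the general pattern of registering a capability and raising when an unregistered one is accessed.

```python
from enum import Enum


class CapabilityType(str, Enum):
    """Hypothetical stand-in for the library's CapabilityType."""

    FULLTEXT = "fulltext"
    VECTOR = "vector"


class NotSupportedException(Exception):
    """Hypothetical stand-in for the library's exception of the same name."""


class SketchDataStore:
    """Illustrative only; this is NOT the real MilvusDataStore."""

    def __init__(self):
        # Capability handlers are registered by the with_* methods.
        self._handlers = {}

    def with_fulltext(self):
        # A plain object stands in for a real capability handler.
        self._handlers[CapabilityType.FULLTEXT] = object()
        return self

    @property
    def supported_capabilities(self):
        # Only capabilities that have been configured are reported.
        return list(self._handlers)

    def _get(self, capability):
        if capability not in self._handlers:
            raise NotSupportedException(
                f"{capability.value} capability is not supported"
            )
        return self._handlers[capability]

    @property
    def fulltext(self):
        return self._get(CapabilityType.FULLTEXT)

    @property
    def vector(self):
        return self._get(CapabilityType.VECTOR)


store = SketchDataStore().with_fulltext()
handler = store.fulltext  # configured, so this succeeds

try:
    store.vector  # never configured, so this raises
except NotSupportedException as exc:
    print(exc)
```

In this sketch, callers can check `supported_capabilities` before touching a capability property, or simply catch `NotSupportedException`.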
get_size(filters=None)
async
Get the total number of records in the datastore.
Examples:
1) Basic usage (no filters):
```python
count = await datastore.get_size()
```
2) With filters (using Query Filters):
```python
from gllm_datastore.core.filters import filter as F

count = await datastore.get_size(filters=F.eq("metadata.status", "active"))
```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filters | FilterClause \| QueryFilter \| None | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| int | int | The total number of records matching the filters. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the operation fails. |
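Because get_size is a coroutine that may raise RuntimeError, calling code needs an event loop and may want to handle failures. The sketch below shows that calling pattern against a hypothetical in-memory stand-in (`InMemoryStore`); its `filters` argument is a plain Python predicate, which is an assumption made for illustration and not the library's FilterClause or QueryFilter.

```python
import asyncio


class InMemoryStore:
    """Hypothetical stand-in exposing the same async get_size() shape."""

    def __init__(self, records):
        self._records = list(records)

    async def get_size(self, filters=None):
        # NOTE: `filters` is a plain predicate here, not a QueryFilter.
        try:
            if filters is None:
                return len(self._records)
            return sum(1 for record in self._records if filters(record))
        except Exception as exc:
            # Mirror the documented contract: failures surface as RuntimeError.
            raise RuntimeError(f"get_size failed: {exc}") from exc


async def count_total_and_active(store):
    total = await store.get_size()
    active = await store.get_size(filters=lambda r: r["status"] == "active")
    return total, active


records = [{"status": "active"}, {"status": "archived"}, {"status": "active"}]
print(asyncio.run(count_total_and_active(InMemoryStore(records))))  # (3, 2)
```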
translate_query_filter(query_filter=None)
classmethod
Translate QueryFilter or FilterClause to Milvus expression syntax.
This method uses MilvusQueryTranslator to translate filters and returns the result as a Milvus expression string.
Examples:
- Translate a simple FilterClause:
  ```python
  from gllm_datastore.core.filters import filter as F

  filter_clause = F.eq("metadata.status", "active")
  result = MilvusDataStore.translate_query_filter(filter_clause)
  # result -> 'metadata["status"] == "active"'
  ```
- Translate QueryFilter with metadata filters:
  ```python
  from gllm_datastore.core.filters import filter as F

  filters = F.and_(
      F.eq("metadata.category", "tech"),
      F.gte("metadata.price", 100),
  )
  result = MilvusDataStore.translate_query_filter(filters)
  # result -> '(metadata["category"] == "tech" and metadata["price"] >= 100)'
  ```
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_filter | FilterClause \| QueryFilter \| None | The filter to translate. Can be a single FilterClause or a QueryFilter with multiple clauses. Defaults to None. | None |
Returns:
| Type | Description |
|---|---|
| str \| None | The translated filter as a Milvus expression string. Returns None for empty filters. |
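To make the translation step more concrete, here is a minimal sketch of how a small filter tree could be rendered into the expression strings shown in the examples above. The `Clause` and `And` dataclasses, the `OPERATORS` table, and the `translate` function are all hypothetical names invented for this illustration; the sketch is not MilvusQueryTranslator and covers only the `eq`, `gte`, and `and` cases that appear in the examples.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Clause:
    """Hypothetical single comparison, e.g. metadata.status == "active"."""

    key: str
    op: str
    value: Any


@dataclass(frozen=True)
class And:
    """Hypothetical conjunction of clauses."""

    parts: tuple


# Only the operators used in the documented examples are covered.
OPERATORS = {"eq": "==", "gte": ">="}


def field_ref(key):
    # "metadata.status" -> 'metadata["status"]'; plain names stay unchanged.
    head, *path = key.split(".")
    return head + "".join(f'["{part}"]' for part in path)


def translate(node):
    if node is None:
        return None  # empty filter, matching the documented None return
    if isinstance(node, Clause):
        # json.dumps double-quotes strings and leaves numbers bare.
        value = json.dumps(node.value)
        return f"{field_ref(node.key)} {OPERATORS[node.op]} {value}"
    inner = " and ".join(translate(part) for part in node.parts)
    return f"({inner})"


print(translate(Clause("metadata.status", "eq", "active")))
# metadata["status"] == "active"
print(translate(And((
    Clause("metadata.category", "eq", "tech"),
    Clause("metadata.price", "gte", 100),
))))
# (metadata["category"] == "tech" and metadata["price"] >= 100)
```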
with_fulltext(collection_name=None, query_field='content')
Configure fulltext capability and return datastore instance.
This method reuses the parent class logic to configure the fulltext capability. It overrides the parent method only to provide better type hinting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection_name | str \| None | Override collection name. Defaults to None, in which case the data store's own collection_name attribute is used. | None |
| query_field | str | Field name for text content. Defaults to "content". | 'content' |
Returns:
| Name | Type | Description |
|---|---|---|
| MilvusDataStore | MilvusDataStore | Self for method chaining. |
with_vector(em_invoker, collection_name=None, dimension=None, distance_metric='L2', vector_field='dense_vector', index_type='IVF_FLAT', index_params=None, query_field='content')
Configure vector capability and return datastore instance.
This method reuses the parent class logic to configure the vector capability. It overrides the parent method only to provide better type hinting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| em_invoker | BaseEMInvoker | Embedding model invoker (required). | required |
| collection_name | str \| None | Override collection name. Defaults to None, in which case the data store's own collection_name attribute is used. | None |
| dimension | int \| None | Vector dimension. Required if the collection doesn't exist. | None |
| distance_metric | str | Distance metric. Defaults to "L2". Supported: "L2", "IP", "COSINE". | 'L2' |
| vector_field | str | Field name for dense vectors. Defaults to "dense_vector". | 'dense_vector' |
| index_type | str | Index type. Defaults to "IVF_FLAT". Supported: "IVF_FLAT", "HNSW". | 'IVF_FLAT' |
| index_params | dict[str, Any] \| None | Index-specific parameters. Defaults to None, in which case default parameters are used. | None |
| query_field | str | Field name for text content. Defaults to "content". | 'content' |
Returns:
| Name | Type | Description |
|---|---|---|
| MilvusDataStore | MilvusDataStore | Self for method chaining. |
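Since both with_fulltext and with_vector return the data store itself, configuration calls can be chained. The sketch below illustrates that pattern with a hypothetical `ChainableStore` class; it is not the real MilvusDataStore, it omits em_invoker and the other parameters, and the choice to raise ValueError for unsupported values is an assumption rather than documented behavior. The supported metric and index names are taken from the parameter table above.

```python
SUPPORTED_METRICS = {"L2", "IP", "COSINE"}
SUPPORTED_INDEX_TYPES = {"IVF_FLAT", "HNSW"}


class ChainableStore:
    """Illustrative only; this is NOT the real MilvusDataStore."""

    def __init__(self, collection_name):
        self.collection_name = collection_name
        self.config = {}

    def with_fulltext(self, collection_name=None, query_field="content"):
        self.config["fulltext"] = {
            # Fall back to the store's own collection name when not overridden.
            "collection_name": collection_name or self.collection_name,
            "query_field": query_field,
        }
        return self  # returning self is what enables chaining

    def with_vector(self, collection_name=None, dimension=None,
                    distance_metric="L2", index_type="IVF_FLAT"):
        # Assumption: unsupported values are rejected with ValueError.
        if distance_metric not in SUPPORTED_METRICS:
            raise ValueError(f"Unsupported distance metric: {distance_metric}")
        if index_type not in SUPPORTED_INDEX_TYPES:
            raise ValueError(f"Unsupported index type: {index_type}")
        self.config["vector"] = {
            "collection_name": collection_name or self.collection_name,
            "dimension": dimension,
            "distance_metric": distance_metric,
            "index_type": index_type,
        }
        return self


store = (
    ChainableStore("documents")
    .with_fulltext()
    .with_vector(dimension=768, distance_metric="COSINE", index_type="HNSW")
)
print(sorted(store.config))  # ['fulltext', 'vector']
```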