Vector
Milvus implementation of vector similarity search capability.
This module provides a Milvus implementation of the VectorCapability protocol for vector-based semantic similarity search operations.
MilvusVectorCapability(collection_name, client, em_invoker, dimension, distance_metric='L2', vector_field='dense_vector', id_max_length=100, content_max_length=65535)
Milvus implementation of VectorCapability protocol.
This class provides vector similarity search operations using Milvus.
Attributes:
| Name | Type | Description |
|---|---|---|
| collection_name | str | The name of the Milvus collection. |
| client | AsyncMilvusClient | Async Milvus client instance. |
| em_invoker | BaseEMInvoker | Embedding model invoker used for vectorization. |
| dimension | int | Vector dimension. |
| distance_metric | str | Distance metric ("L2", "IP", or "COSINE"). |
| vector_field | str | Field name for dense vectors. |
| id_max_length | int | Maximum length for the ID field. |
| content_max_length | int | Maximum length for the content field. |
Initialize the Milvus vector capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| collection_name | str | The name of the Milvus collection. | required |
| client | AsyncMilvusClient | The async Milvus client instance. | required |
| em_invoker | BaseEMInvoker | The embedding model invoker. | required |
| dimension | int | Vector dimension. | required |
| distance_metric | str | Distance metric. Supported: "L2", "IP", "COSINE". Defaults to "L2". | 'L2' |
| vector_field | str | Field name for dense vectors. Defaults to "dense_vector". | 'dense_vector' |
| id_max_length | int | Maximum length for the ID field. Defaults to 100. | 100 |
| content_max_length | int | Maximum length for the content field. Defaults to 65535. | 65535 |
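To make the ranking direction of each supported metric concrete, here is a small pure-Python sketch. It does not call Milvus; the helpers l2, ip, cosine and rank are hypothetical names for illustration. With "L2" a smaller distance means a closer match, while with "IP" and "COSINE" a larger score does. Note how IP rewards vector magnitude, whereas COSINE ignores it.

```python
import math
from typing import Callable, Sequence

Score = Callable[[Sequence[float], Sequence[float]], float]


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance: smaller means more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ip(a: Sequence[float], b: Sequence[float]) -> float:
    # Inner product: larger means more similar; grows with vector magnitude.
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity: inner product of the unit-normalized vectors.
    return ip(a, b) / (math.sqrt(ip(a, a)) * math.sqrt(ip(b, b)))


SCORERS: dict[str, Score] = {"L2": l2, "IP": ip, "COSINE": cosine}
HIGHER_IS_BETTER = {"L2": False, "IP": True, "COSINE": True}


def rank(query: Sequence[float], candidates: dict[str, list[float]], metric: str) -> list[str]:
    """Return candidate IDs from most to least similar under the given metric."""
    score = SCORERS[metric]
    return sorted(
        candidates,
        key=lambda cid: score(query, candidates[cid]),
        reverse=HIGHER_IS_BETTER[metric],
    )


QUERY = [1.0, 0.0]
CANDIDATES = {"a": [0.9, 0.1], "b": [3.0, 0.0], "c": [0.0, 1.0]}

if __name__ == "__main__":
    for metric in ("L2", "IP", "COSINE"):
        print(metric, rank(QUERY, CANDIDATES, metric))
    # L2 ['a', 'c', 'b']
    # IP ['b', 'a', 'c']
    # COSINE ['b', 'a', 'c']
```

When embeddings are normalized to unit length, IP and COSINE produce the same ranking.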
em_invoker (property)
Returns the EM Invoker instance.
Returns:
| Type | Description |
|---|---|
| BaseEMInvoker | The EM Invoker instance. |
async clear()
Clear all records from the datastore.
async create(data, **kwargs)
Add chunks to the vector store with automatic embedding generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Chunk \| list[Chunk] | A single chunk or a list of chunks to add. | required |
| **kwargs | Any | Backend-specific parameters (e.g., partition_name). | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If a vector dimension mismatch occurs. |
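A minimal sketch of the embedding step that create() performs, assuming a hypothetical Chunk dataclass and a hypothetical fake_embed function in place of the real em_invoker: a single chunk is wrapped in a list, all contents are embedded in one batch, and each chunk is paired with its vector (the same (chunk, vector) shape that create_from_vector accepts). Whether the real implementation batches this way is an assumption.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type."""

    id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


async def fake_embed(texts: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for em_invoker: a 2-dim vector of (length, vowel count)."""
    return [[float(len(t)), float(sum(ch in "aeiou" for ch in t))] for t in texts]


async def embed_chunks(data: Chunk | list[Chunk]) -> list[tuple[Chunk, list[float]]]:
    # Accept either a single chunk or a list of chunks, as create() does.
    chunks = [data] if isinstance(data, Chunk) else list(data)
    # Embed all contents in one batch rather than one call per chunk.
    vectors = await fake_embed([c.content for c in chunks])
    return list(zip(chunks, vectors))


if __name__ == "__main__":
    pairs = asyncio.run(embed_chunks([Chunk("a", "hello"), Chunk("b", "milvus")]))
    for chunk, vector in pairs:
        print(chunk.id, vector)  # a [5.0, 2.0], then b [6.0, 2.0]
```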
async create_from_vector(chunk_vectors, **kwargs)
Add pre-computed vectors directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| chunk_vectors | list[tuple[Chunk, Vector]] | List of tuples, each pairing a chunk with its corresponding vector. | required |
| **kwargs | Any | Backend-specific parameters (e.g., partition_name). | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If a vector dimension mismatch occurs. |
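The dimension check behind the documented ValueError can be sketched as follows. Chunks are plain dicts here and Vector is a hypothetical alias for list[float]; the row layout, with the vector stored under the configured vector_field, is an assumption about how rows are shaped for insertion, and build_rows is a hypothetical name.

```python
from typing import Any

Vector = list[float]  # Hypothetical alias; the library's Vector type may differ.


def build_rows(
    chunk_vectors: list[tuple[dict[str, Any], Vector]],
    dimension: int,
    vector_field: str = "dense_vector",
) -> list[dict[str, Any]]:
    """Validate each vector's length and attach it to its chunk under vector_field."""
    rows = []
    for chunk, vector in chunk_vectors:
        if len(vector) != dimension:
            # Raising before returning means one bad vector rejects the whole batch.
            raise ValueError(
                f"Vector dimension mismatch for chunk {chunk.get('id')!r}: "
                f"expected {dimension}, got {len(vector)}"
            )
        rows.append({**chunk, vector_field: list(vector)})
    return rows


if __name__ == "__main__":
    print(build_rows([({"id": "c1", "content": "hello"}, [0.1, 0.2])], dimension=2))
    try:
        build_rows([({"id": "c2", "content": "oops"}, [0.1, 0.2, 0.3])], dimension=2)
    except ValueError as exc:
        print(exc)
```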
async delete(filters=None, **kwargs)
Delete records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filters | FilterClause \| QueryFilter \| None | Filters selecting the records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no records are deleted. | None |
| **kwargs | Any | Backend-specific parameters. | {} |
Note
If filters is None, the call is a no-op; use clear() to remove all records.
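The no-op behavior of delete() can be illustrated with a hypothetical in-memory store. A plain predicate stands in for FilterClause/QueryFilter, and returning the number of deleted records is an assumption made for the sketch; the real method's return value is not documented here.

```python
import asyncio
from typing import Any, Callable, Optional

Record = dict[str, Any]


class InMemoryStore:
    """Hypothetical in-memory stand-in; a predicate replaces FilterClause/QueryFilter."""

    def __init__(self, records: list[Record]) -> None:
        self.records = list(records)

    async def delete(self, filters: Optional[Callable[[Record], bool]] = None) -> int:
        if filters is None:
            # Mirrors the documented behavior: no filters means no-op, not "delete everything".
            return 0
        before = len(self.records)
        self.records = [r for r in self.records if not filters(r)]
        return before - len(self.records)

    async def clear(self) -> None:
        # Removing every record is a separate, explicit operation.
        self.records.clear()


if __name__ == "__main__":
    store = InMemoryStore([
        {"id": "1", "metadata": {"source": "a"}},
        {"id": "2", "metadata": {"source": "b"}},
    ])
    print(asyncio.run(store.delete()))  # 0, nothing removed
    print(asyncio.run(store.delete(lambda r: r["metadata"]["source"] == "a")))  # 1
```

Treating None as a no-op means a call with a forgotten filter cannot wipe the collection; clear() remains the explicit way to remove everything.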
async ensure_index(index_type='IVF_FLAT', index_params=None, query_field='content', **kwargs)
Ensure the collection and vector index exist, creating them if necessary.
This method is idempotent: if the collection and index already exist, it skips creation and returns early. It uses a lock to prevent race conditions when called concurrently from multiple coroutines.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index_type | str | Index type. Supported: "IVF_FLAT", "HNSW". Defaults to "IVF_FLAT". | 'IVF_FLAT' |
| index_params | dict[str, Any] \| None | Index-specific parameters. Defaults to None, in which case default parameters are used. | None |
| query_field | str | Field name for the text content. Defaults to "content". | 'content' |
| **kwargs | Any | Additional parameters. | {} |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If collection or index creation fails. |
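The idempotent, lock-guarded creation described above follows a common check-then-create-under-lock pattern. The sketch below uses asyncio.Lock with a hypothetical FakeBackend in place of Milvus, and IndexEnsurer and DEFAULT_INDEX_PARAMS are hypothetical names. The default parameter values are illustrative only, although nlist (IVF_FLAT) and M/efConstruction (HNSW) are real Milvus index parameter names.

```python
import asyncio
from typing import Any, Optional

# Hypothetical defaults: real Milvus parameter names, illustrative values.
DEFAULT_INDEX_PARAMS: dict[str, dict[str, Any]] = {
    "IVF_FLAT": {"nlist": 128},
    "HNSW": {"M": 16, "efConstruction": 200},
}


class FakeBackend:
    """Hypothetical stand-in for the Milvus server that counts creation calls."""

    def __init__(self) -> None:
        self.collections: dict[str, dict[str, Any]] = {}
        self.create_calls = 0

    async def create_collection_with_index(self, name: str, index: dict[str, Any]) -> None:
        await asyncio.sleep(0)  # Yield to the event loop, as a real network call would.
        self.create_calls += 1
        self.collections[name] = index


class IndexEnsurer:
    def __init__(self, backend: FakeBackend, collection_name: str) -> None:
        self.backend = backend
        self.collection_name = collection_name
        # Created inside demo()'s running loop for compatibility with older Python versions.
        self._lock = asyncio.Lock()

    async def ensure_index(
        self, index_type: str = "IVF_FLAT", index_params: Optional[dict[str, Any]] = None
    ) -> None:
        async with self._lock:
            # Check existence while holding the lock, so two coroutines cannot
            # both observe "missing" and both create the collection.
            if self.collection_name in self.backend.collections:
                return  # Idempotent: nothing to do.
            params = index_params if index_params is not None else DEFAULT_INDEX_PARAMS.get(index_type, {})
            await self.backend.create_collection_with_index(
                self.collection_name, {"index_type": index_type, "params": params}
            )


async def demo() -> FakeBackend:
    backend = FakeBackend()
    ensurer = IndexEnsurer(backend, "docs")
    # Five concurrent callers; only the first should create anything.
    await asyncio.gather(*(ensurer.ensure_index() for _ in range(5)))
    return backend


if __name__ == "__main__":
    print(asyncio.run(demo()).create_calls)  # 1
```

The existence check sits inside the lock on purpose: checking before acquiring it would let several coroutines see a missing collection and all try to create it.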
async retrieve(query, filters=None, options=None, **kwargs)
Read records from the datastore using text-based similarity search with optional filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | str | Input text to embed and search with. | required |
| filters | FilterClause \| QueryFilter \| None | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | None |
| options | QueryOptions \| None | Query options such as limit and sorting. Defaults to None. | None |
| **kwargs | Any | Backend-specific parameters (e.g., search_params). | {} |
Returns:
| Type | Description |
|---|---|
| list[Chunk] | Query results ordered by similarity score. |
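Conceptually, retrieve() embeds the query text and then ranks stored vectors by similarity. The toy sketch below assumes a hypothetical fake_embed_query in place of the EM invoker, a hypothetical three-document CORPUS with pre-computed vectors, cosine similarity, and a limit argument standing in for the limit carried by QueryOptions.

```python
import asyncio
import math

# Hypothetical toy corpus: id -> (content, pre-computed 2-dim embedding).
CORPUS = {
    "milvus": ("Milvus is a vector database", [0.9, 0.1]),
    "cats": ("Cats sleep a lot", [0.1, 0.9]),
    "faiss": ("FAISS does similarity search", [0.8, 0.3]),
}


async def fake_embed_query(text: str) -> list[float]:
    # Hypothetical stand-in for the EM invoker: "vector" queries map to [1, 0], others to [0, 1].
    return [1.0, 0.0] if "vector" in text.lower() else [0.0, 1.0]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


async def retrieve(query: str, limit: int = 2) -> list[str]:
    # Step 1: embed the query text.
    vector = await fake_embed_query(query)
    # Step 2: rank stored vectors by similarity (higher cosine first) and keep the top `limit`.
    ranked = sorted(CORPUS, key=lambda cid: cosine(vector, CORPUS[cid][1]), reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    print(asyncio.run(retrieve("vector search")))  # ['milvus', 'faiss']
    print(asyncio.run(retrieve("sleepy cats")))  # ['cats', 'faiss']
```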
async retrieve_by_vector(vector, filters=None, options=None, search_params=None, **kwargs)
Run a similarity search directly with a query vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vector | Vector | Query embedding vector. | required |
| filters | FilterClause \| QueryFilter \| None | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | None |
| options | QueryOptions \| None | Query options such as limit and sorting. Defaults to None. | None |
| search_params | dict[str, Any] \| None | Search parameters for Milvus. Defaults to None, in which case default search parameters based on the distance metric are used. | None |
| **kwargs | Any | Backend-specific parameters. | {} |
Returns:
| Type | Description |
|---|---|
| list[Chunk] | Chunks ordered by similarity score. |
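When search_params is omitted, defaults derived from distance_metric are used. The hypothetical helper below shows that resolution rule. The returned dict follows pymilvus's search_params shape, with "metric_type" and "params" keys, but the actual defaults the library chooses are not documented here, so the empty "params" is an assumption.

```python
from typing import Any, Optional


def resolve_search_params(
    distance_metric: str, search_params: Optional[dict[str, Any]] = None
) -> dict[str, Any]:
    """Hypothetical helper: caller-supplied params win; otherwise derive defaults from the metric."""
    if search_params is not None:
        return search_params
    # Assumed default: reuse the collection's metric with no index-specific tuning.
    return {"metric_type": distance_metric, "params": {}}


if __name__ == "__main__":
    print(resolve_search_params("COSINE"))  # {'metric_type': 'COSINE', 'params': {}}
    # Explicit override, e.g. widening an IVF_FLAT search via nprobe.
    print(resolve_search_params("L2", {"metric_type": "L2", "params": {"nprobe": 16}}))
```

In Milvus, nprobe is the search-time knob for IVF indexes and ef is the one for HNSW, so an explicit override should match the index built by ensure_index.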
async update(update_values, filters=None, **kwargs)
Update existing records in the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| update_values | dict[str, Any] | Values to update. Supported keys: "content" replaces the document content and re-embeds its vectors; "metadata" updates the metadata. | required |
| filters | FilterClause \| QueryFilter \| None | Filters selecting the records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no records are updated. | None |
| **kwargs | Any | Backend-specific parameters (e.g., partition_name). | {} |
Note
If filters is None, the call is a no-op.
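A minimal in-memory sketch of the update rules: nothing happens without filters, a "content" change triggers re-embedding, and a "metadata"-only change leaves the vector alone. The predicate filter, the hypothetical fake_embed function, the record layout, the returned count, and merging (rather than replacing) metadata are all assumptions made for the sketch.

```python
import asyncio
from typing import Any, Callable, Optional

Record = dict[str, Any]


async def fake_embed(text: str) -> list[float]:
    # Hypothetical embedder: a 2-dim vector of (character count, word count).
    return [float(len(text)), float(len(text.split()))]


async def update(
    records: list[Record],
    update_values: dict[str, Any],
    filters: Optional[Callable[[Record], bool]] = None,
) -> int:
    if filters is None:
        return 0  # Documented no-op when no filters are given.
    updated = 0
    for record in records:
        if not filters(record):
            continue
        if "content" in update_values:
            record["content"] = update_values["content"]
            # New content means the stored vector is stale, so re-embed it.
            record["dense_vector"] = await fake_embed(record["content"])
        if "metadata" in update_values:
            # Assumption: merge new keys into existing metadata.
            record["metadata"].update(update_values["metadata"])
        updated += 1
    return updated


if __name__ == "__main__":
    docs = [{"id": "a", "content": "old", "metadata": {"v": 1}, "dense_vector": [3.0, 1.0]}]
    asyncio.run(update(docs, {"content": "new text"}, lambda r: r["id"] == "a"))
    print(docs[0]["dense_vector"])  # [8.0, 2.0]
```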