Capabilities
Capability protocols for the datastore interface.
This package defines the core capability protocols that any datastore can implement. Each protocol represents a specific set of functionality that a datastore can opt in to provide.
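As a rough illustration of the opt-in idea, the sketch below uses `typing.Protocol` with a hypothetical one-method protocol. `SupportsClear`, `InMemoryStore`, and `ReadOnlyStore` are invented for this example and are not part of the package; the real protocols define more methods.

```python
import asyncio
from typing import Any, List, Protocol, runtime_checkable


@runtime_checkable
class SupportsClear(Protocol):
    """Hypothetical one-method capability protocol (not the package's own)."""

    async def clear(self, **kwargs: Any) -> None: ...


class InMemoryStore:
    """Opts in to the capability simply by defining a matching method."""

    def __init__(self) -> None:
        self.records: List[str] = ["a", "b"]

    async def clear(self, **kwargs: Any) -> None:
        self.records.clear()


class ReadOnlyStore:
    """Defines no clear(), so it does not provide the capability."""


store = InMemoryStore()
# runtime_checkable protocols only check that the method exists,
# not its signature or whether it is async.
if isinstance(store, SupportsClear):
    asyncio.run(store.clear())
```

No inheritance is required: a class satisfies the protocol structurally, which is what lets each datastore choose the capabilities it supports.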
EncryptionCapability(encryptor, encrypted_fields)
Unified implementation of encryption capability.
This class provides the shared encryption and decryption logic that is identical across all backend implementations. It handles:
- Chunk content and metadata encryption/decryption
- Preparation of encrypted chunks with plaintext embeddings
- Encryption of update values
Thread Safety
This class is thread-safe only when used with a thread-safe encryptor: the encryptor instance passed in must support concurrent encryption and decryption. Methods in this class perform no internal synchronization; thread safety is delegated to the underlying encryptor.
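If an encryptor is not thread-safe on its own, one option is to serialize access to it before passing it in. The sketch below is only an illustration under assumptions: `ToyEncryptor` and `LockedEncryptor` are hypothetical, and the `encrypt`/`decrypt` method names are assumed rather than taken from `BaseEncryptor`.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyEncryptor:
    """Hypothetical stand-in for an encryptor; reverses text, not real crypto."""

    def encrypt(self, plaintext: str) -> str:
        return plaintext[::-1]

    def decrypt(self, ciphertext: str) -> str:
        return ciphertext[::-1]


class LockedEncryptor:
    """Serializes every call to the wrapped encryptor with a single lock."""

    def __init__(self, inner: ToyEncryptor) -> None:
        self._inner = inner
        self._lock = threading.Lock()

    def encrypt(self, plaintext: str) -> str:
        with self._lock:
            return self._inner.encrypt(plaintext)

    def decrypt(self, ciphertext: str) -> str:
        with self._lock:
            return self._inner.decrypt(ciphertext)


encryptor = LockedEncryptor(ToyEncryptor())
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves input order even though calls run concurrently.
    results = list(pool.map(encryptor.encrypt, ["abc", "def", "ghi"]))
```

A single lock trades throughput for safety; an encryptor that is already safe for concurrent use needs no such wrapper.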
Attributes:
| Name | Type | Description |
|---|---|---|
| `encryptor` | `BaseEncryptor` | The encryptor instance to use for encryption/decryption. Must be thread-safe for concurrent operations. |
| `_encrypted_fields` | `set[str]` | The set of fields to encrypt. |
Initialize the encryption capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `encryptor` | `BaseEncryptor` | The encryptor instance to use for encryption. | required |
| `encrypted_fields` | `set[str]` | The set of fields to encrypt. Supports: (1) the content field, `"content"`; (2) metadata fields using dot notation, such as `"metadata.secret_key"` and `"metadata.secret_value"`. | required |
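To make the field-name convention concrete, here is a minimal sketch of how such a set could be split into a content flag and a set of metadata keys. `split_encrypted_fields` is a hypothetical helper written for this page, not a function of the class.

```python
from typing import Set, Tuple


def split_encrypted_fields(encrypted_fields: Set[str]) -> Tuple[bool, Set[str]]:
    """Return (encrypt_content, metadata_keys) for a set of field names."""
    prefix = "metadata."
    encrypt_content = "content" in encrypted_fields
    # Strip the "metadata." prefix to get the bare metadata key names.
    metadata_keys = {
        name[len(prefix):] for name in encrypted_fields if name.startswith(prefix)
    }
    return encrypt_content, metadata_keys


fields = {"content", "metadata.secret_key", "metadata.secret_value"}
encrypt_content, metadata_keys = split_encrypted_fields(fields)
```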
encryption_config
property
Get the current encryption configuration.
Returns:
| Type | Description |
|---|---|
| `set[str]` | Set of encrypted field names. |
is_enabled
property
Check if encryption is enabled (has configured fields).
Returns:
| Name | Type | Description |
|---|---|---|
| `bool` | `bool` | True if encryption fields are configured, False otherwise. |
decrypt_chunks(chunks)
encrypt_chunks(chunks)
encrypt_embedded_chunks(chunks, em_invoker)
async
Encrypt chunks and generate embeddings from plaintext before encryption.
Generates embeddings from plaintext content to ensure embeddings represent the original content rather than encrypted ciphertext. This is used when encryption is enabled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunks` | `list[Chunk]` | List of chunks to encrypt and generate embeddings for. | required |
| `em_invoker` | `BaseEMInvoker` | Embedding model invoker to generate embeddings. | required |

Returns:
| Type | Description |
|---|---|
| `list[tuple[Chunk, Vector]]` | List of tuples containing encrypted chunks and their corresponding vectors generated from plaintext. |
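The ordering described above (embed first, encrypt second) can be sketched as follows. Everything here is a stand-in: `ToyChunk`, `toy_embed`, and `toy_encrypt` are hypothetical and only mimic the roles of `Chunk`, the embedding invoker, and the encryptor.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ToyChunk:
    """Hypothetical stand-in for a chunk with only a content field."""

    content: str


async def toy_embed(texts: List[str]) -> List[List[float]]:
    """Hypothetical embedder: (length, vowel count) per text."""
    return [[float(len(t)), float(sum(c in "aeiou" for c in t))] for t in texts]


def toy_encrypt(plaintext: str) -> str:
    """Placeholder transformation (reverses the text); not real encryption."""
    return plaintext[::-1]


async def encrypt_embedded(chunks: List[ToyChunk]) -> List[Tuple[ToyChunk, List[float]]]:
    # 1. Embed while the content is still plaintext.
    vectors = await toy_embed([c.content for c in chunks])
    # 2. Only then encrypt, pairing each encrypted chunk with its plaintext vector.
    encrypted = [replace(c, content=toy_encrypt(c.content)) for c in chunks]
    return list(zip(encrypted, vectors))


pairs = asyncio.run(encrypt_embedded([ToyChunk("hello")]))
```

Reversing the two steps would embed ciphertext, and the resulting vectors would carry no semantic signal for similarity search.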
encrypt_update_values(update_values, content_field_name=CHUNK_KEYS.CONTENT)
Encrypt update values if encryption is enabled.
This method encrypts content and metadata values in update_values according to the encryption configuration. It handles type conversion for non-string values before encryption.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `update_values` | `dict[str, Any]` | Dictionary of values to encrypt. Supports "content" and "metadata" keys. | required |
| `content_field_name` | `str` | The field name to use for content in the output. Useful for datastores like Elasticsearch that use "text" instead of "content". | `CHUNK_KEYS.CONTENT` |

Returns:
| Type | Description |
|---|---|
| `dict[str, Any]` | Dictionary with encrypted values where applicable. The "content" key is mapped to content_field_name in the output. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If encryption fails for any field. |
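The behaviour described for this method might look roughly like the sketch below. It is an approximation under assumptions: `toy_encrypt` is a placeholder, non-string values are converted with `str()` (the actual conversion is not documented here), and only the "content" and "metadata" keys are handled.

```python
from typing import Any, Dict, Set


def toy_encrypt(plaintext: str) -> str:
    """Placeholder that tags the value; not real encryption."""
    return f"enc({plaintext})"


def encrypt_update_values_sketch(
    update_values: Dict[str, Any],
    encrypted_fields: Set[str],
    content_field_name: str = "content",
) -> Dict[str, Any]:
    result: Dict[str, Any] = {}
    if "content" in update_values:
        content = update_values["content"]
        if "content" in encrypted_fields:
            content = toy_encrypt(str(content))
        # The "content" key is renamed to content_field_name in the output.
        result[content_field_name] = content
    if "metadata" in update_values:
        result["metadata"] = {
            # Assumption: non-string values are stringified before encryption.
            key: toy_encrypt(str(value)) if f"metadata.{key}" in encrypted_fields else value
            for key, value in update_values["metadata"].items()
        }
    return result


out = encrypt_update_values_sketch(
    {"content": "hi", "metadata": {"secret_key": 42, "topic": "misc"}},
    encrypted_fields={"content", "metadata.secret_key"},
    content_field_name="text",
)
```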
FulltextCapability
Bases: Protocol
Protocol for full-text search and document operations.
This protocol defines the interface for datastores that support CRUD operations and flexible querying mechanisms for document data.
clear(**kwargs)
async
Clear all records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | | Datastore-specific parameters. | `{}` |
create(data, **kwargs)
async
delete(filters=None, options=None, **kwargs)
async
Delete records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None, in which case no operation is performed (no-op). | `None` |
| `options` | `QueryOptions \| None` | Query options for sorting and limiting deletions. Defaults to None. | `None` |
| `**kwargs` | | Datastore-specific parameters. | `{}` |
retrieve(filters=None, options=None, **kwargs)
async
Read records from the datastore with optional filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options like limit and sorting. Defaults to None. | `None` |
| `**kwargs` | | Datastore-specific parameters. | `{}` |

Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | Query results. |
retrieve_fuzzy(query, max_distance=2, filters=None, options=None, **kwargs)
async
Find records that fuzzy-match the query within a distance threshold.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Text to fuzzy match against. | required |
| `max_distance` | `int` | Maximum edit distance for matches (Levenshtein distance). Defaults to 2. | `2` |
| `filters` | `FilterClause \| QueryFilter \| None` | Optional metadata filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options (limit, sorting, etc.). Defaults to None. | `None` |
| `**kwargs` | | Datastore-specific parameters. | `{}` |

Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | Matched chunks ordered by relevance/distance. |
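For readers unfamiliar with the `max_distance` threshold, the sketch below computes Levenshtein distance with the standard dynamic-programming recurrence and filters candidates by it. `fuzzy_filter` is a hypothetical helper; actual backends may apply the distance per term or rank results differently.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def fuzzy_filter(query: str, candidates: List[str], max_distance: int = 2) -> List[str]:
    """Keep candidates within max_distance, closest first."""
    scored = sorted((levenshtein(query, c), c) for c in candidates)
    return [c for distance, c in scored if distance <= max_distance]


matches = fuzzy_filter("color", ["colour", "colr", "banana"])
```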
update(update_values, filters=None, **kwargs)
async
Update existing records in the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `update_values` | `dict[str, Any]` | Values to update. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `**kwargs` | | Datastore-specific parameters. | `{}` |
GraphCapability
Bases: Protocol
Protocol for graph database operations.
This protocol defines the interface for datastores that support graph-based data operations. This includes node and relationship management as well as graph queries.
delete_node(label, identifier_key, identifier_value)
async
Delete a node and its relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `label` | `str` | Node label/type. | required |
| `identifier_key` | `str` | Node identifier key. | required |
| `identifier_value` | `str` | Node identifier value. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `Any` | `Any` | Deletion result information. |
delete_relationship(node_source_key, node_source_value, relation, node_target_key, node_target_value)
async
Delete a relationship between nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node_source_key` | `str` | Source node identifier key. | required |
| `node_source_value` | `str` | Source node identifier value. | required |
| `relation` | `str` | Relationship type. | required |
| `node_target_key` | `str` | Target node identifier key. | required |
| `node_target_value` | `str` | Target node identifier value. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `Any` | `Any` | Deletion result information. |
retrieve(query, parameters=None)
async
Retrieve data from the graph with a specific query.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Query to retrieve data from the graph. | required |
| `parameters` | `dict[str, Any] \| None` | Query parameters. Defaults to None. | `None` |

Returns:
| Type | Description |
|---|---|
| `list[dict[str, Any]]` | Query results as list of dictionaries. |
upsert_node(label, identifier_key, identifier_value, properties=None)
async
Create or update a node in the graph.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `label` | `str` | Node label/type. | required |
| `identifier_key` | `str` | Key field for node identification. | required |
| `identifier_value` | `str` | Value for node identification. | required |
| `properties` | `dict[str, Any] \| None` | Additional node properties. Defaults to None. | `None` |

Returns:
| Name | Type | Description |
|---|---|---|
| `Any` | `Any` | Created/updated node information. |
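The create-or-update semantics can be pictured with a small in-memory stand-in. `InMemoryGraph` is hypothetical, and merging new properties into an existing node (rather than replacing them) is an assumption of this sketch; real backends may behave differently.

```python
import asyncio
from typing import Any, Dict, Optional, Tuple

NodeKey = Tuple[str, str, str]


class InMemoryGraph:
    """Hypothetical in-memory stand-in for a graph datastore."""

    def __init__(self) -> None:
        # Nodes keyed by (label, identifier_key, identifier_value).
        self.nodes: Dict[NodeKey, Dict[str, Any]] = {}

    async def upsert_node(
        self,
        label: str,
        identifier_key: str,
        identifier_value: str,
        properties: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        key = (label, identifier_key, identifier_value)
        # Create the node on first sight; reuse the existing one otherwise.
        node = self.nodes.setdefault(key, {identifier_key: identifier_value})
        # Assumption: new properties are merged into existing ones.
        node.update(properties or {})
        return node


async def demo() -> InMemoryGraph:
    graph = InMemoryGraph()
    await graph.upsert_node("Person", "email", "ada@example.com", {"name": "Ada"})
    await graph.upsert_node("Person", "email", "ada@example.com", {"role": "engineer"})
    return graph


graph = asyncio.run(demo())
```

Calling the method twice with the same label and identifier leaves a single node, which is the property that distinguishes an upsert from a plain insert.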
upsert_relationship(node_source_key, node_source_value, relation, node_target_key, node_target_value, properties=None)
async
Create or update a relationship between nodes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `node_source_key` | `str` | Source node identifier key. | required |
| `node_source_value` | `str` | Source node identifier value. | required |
| `relation` | `str` | Relationship type. | required |
| `node_target_key` | `str` | Target node identifier key. | required |
| `node_target_value` | `str` | Target node identifier value. | required |
| `properties` | `dict[str, Any] \| None` | Relationship properties. Defaults to None. | `None` |

Returns:
| Name | Type | Description |
|---|---|---|
| `Any` | `Any` | Created/updated relationship information. |
HybridCapability
Bases: Protocol
Protocol for hybrid search combining different retrieval paradigms.
This protocol defines the interface for datastores that support hybrid search operations combining multiple retrieval strategies (fulltext, vector).
clear(**kwargs)
async
Clear all records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
create(chunks, **kwargs)
async
Create chunks with automatic generation of all configured search fields.
This method automatically generates and indexes all fields required by the configured searches in with_hybrid(). For each chunk:
- FULLTEXT search: Indexes text content in the configured field name.
- VECTOR search: Generates dense embedding using the configured em_invoker and indexes it in the configured field name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunks` | `list[Chunk]` | List of chunks to create and index. | required |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
create_from_vector(chunks, dense_vectors=None, **kwargs)
async
Create chunks with pre-computed vectors for multiple fields.
Allows indexing pre-computed vectors for multiple vector fields at once. Field names must match those configured in with_hybrid().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunks` | `list[Chunk]` | Chunks to index. | required |
| `dense_vectors` | `dict[str, list[tuple[Chunk, Vector]]] \| None` | Dict mapping field names to lists of (chunk, vector) tuples. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
delete(filters=None, **kwargs)
async
Delete records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
Note
If filters is None, no operation is performed (no-op).
retrieve(query, filters=None, options=None, **kwargs)
async
Retrieve using hybrid search combining different retrieval paradigms.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Query text to search with. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options like limit and sorting. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |

Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | Query results ordered by relevance. |
retrieve_by_vector(query=None, dense_vectors=None, filters=None, options=None, **kwargs)
async
Hybrid search using pre-computed vectors.
Keys in dense_vectors must match VECTOR config field names; each field uses its own query vector (supports multi-vector retrieval).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str \| None` | Optional query text (for fulltext search). Defaults to None. | `None` |
| `dense_vectors` | `dict[str, Vector] \| None` | Map field name to query vector. Each VECTOR config uses dense_vectors[field]. Defaults to None. | `None` |
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options like limit and sorting. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |

Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | Query results ordered by relevance. |
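One way to picture multi-vector retrieval is a weighted sum of per-field similarity scores, as in the sketch below. This is an illustration only: `hybrid_rank`, the dot-product scoring, and the weighted-sum fusion are assumptions, since this page does not state how backends combine per-field results.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def hybrid_rank(
    documents: Dict[str, Dict[str, Vector]],
    dense_vectors: Dict[str, Vector],
    weights: Dict[str, float],
) -> List[str]:
    """Rank document ids by a weighted sum of per-field dot products."""
    # Keys of dense_vectors must match the configured vector field names.
    unknown = set(dense_vectors) - set(weights)
    if unknown:
        raise ValueError(f"Unconfigured vector fields: {sorted(unknown)}")
    scores = {
        doc_id: sum(weights[f] * dot(q, fields[f]) for f, q in dense_vectors.items())
        for doc_id, fields in documents.items()
    }
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


documents = {
    "a": {"title_vec": [1.0, 0.0], "body_vec": [0.0, 1.0]},
    "b": {"title_vec": [0.0, 1.0], "body_vec": [1.0, 0.0]},
}
weights = {"title_vec": 0.7, "body_vec": 0.3}  # hypothetical per-field weights
ranking = hybrid_rank(
    documents,
    {"title_vec": [1.0, 0.0], "body_vec": [1.0, 0.0]},
    weights,
)
```

Each field is scored against its own query vector, so a document can rank highly because of one field even when another field matches poorly.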
update(update_values, filters=None, **kwargs)
async
Update existing records in the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `update_values` | `dict[str, Any]` | Values to update. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
HybridSearchType
Bases: StrEnum
Types of searches that can be combined in hybrid search.
SearchConfig
Bases: BaseModel
Configuration for a single search component in hybrid search.
Examples:

FULLTEXT search configuration:

```python
config = SearchConfig(
    search_type=HybridSearchType.FULLTEXT,
    field="text",
    weight=0.3,
)
```

VECTOR search configuration:

```python
# em_invoker is a BaseEMInvoker instance created elsewhere.
config = SearchConfig(
    search_type=HybridSearchType.VECTOR,
    field="embedding",
    em_invoker=em_invoker,
    weight=0.5,
)
```

Attributes:
| Name | Type | Description |
|---|---|---|
| `search_type` | `HybridSearchType` | Type of search (FULLTEXT or VECTOR). |
| `field` | `str` | Field name in the index (e.g., "text", "embedding"). |
| `weight` | `float` | Weight for this search in hybrid search. Defaults to 1.0. |
| `em_invoker` | `BaseEMInvoker \| None` | Embedding model invoker required for VECTOR type. Defaults to None. |
| `top_k` | `int \| None` | Per-search top_k limit (optional). Defaults to None. |
| `extra_kwargs` | `dict[str, Any]` | Additional search-specific parameters. Defaults to empty dict. |
validate_field_not_empty(v)
classmethod
Validate that field name is not empty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `str` | Field name value. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `str` | `str` | Validated field name. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If field name is empty. |
validate_search_requirements()
Validate configuration based on search type.
Returns:
| Name | Type | Description |
|---|---|---|
| `SearchConfig` | `SearchConfig` | Validated configuration instance. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If required fields are missing for the search type. |
validate_top_k(v)
classmethod
Validate that top_k is positive if provided.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `v` | `int \| None` | top_k value. | required |

Returns:
| Type | Description |
|---|---|
| `int \| None` | Validated top_k value. |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If top_k is provided but not positive. |
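The three validation rules documented for `SearchConfig` can be approximated with a plain dataclass, as sketched below. This is a stand-in rather than the actual model: `SearchConfigSketch`, `SearchType`, and its string values are hypothetical, and treating a whitespace-only field as empty is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class SearchType(str, Enum):
    """Hypothetical stand-in for HybridSearchType; values are assumed."""

    FULLTEXT = "fulltext"
    VECTOR = "vector"


@dataclass
class SearchConfigSketch:
    search_type: SearchType
    field: str
    weight: float = 1.0
    em_invoker: Optional[Any] = None
    top_k: Optional[int] = None

    def __post_init__(self) -> None:
        # Rule 1: the field name must not be empty (whitespace-only is
        # treated as empty here, which is an assumption).
        if not self.field.strip():
            raise ValueError("field must not be empty")
        # Rule 2: top_k, when provided, must be positive.
        if self.top_k is not None and self.top_k <= 0:
            raise ValueError("top_k must be positive when provided")
        # Rule 3: VECTOR searches need an embedding model invoker.
        if self.search_type is SearchType.VECTOR and self.em_invoker is None:
            raise ValueError("em_invoker is required for VECTOR search")


def is_valid(**kwargs: Any) -> bool:
    """Return True if the keyword arguments form a valid configuration."""
    try:
        SearchConfigSketch(**kwargs)
    except ValueError:
        return False
    return True
```

Validating at construction time means an invalid configuration fails before any index or query is touched.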
VectorCapability
Bases: Protocol
Protocol for vector similarity search operations.
This protocol defines the interface for datastores that support vector-based retrieval operations. This includes similarity search, ID-based lookup as well as vector storage.
clear()
async
Clear all records from the datastore.
create(data)
async
create_from_vector(chunk_vectors, **kwargs)
async
Add pre-computed vectors directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunk_vectors` | `list[tuple[Chunk, Vector]]` | List of tuples containing chunks and their corresponding vectors. | required |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
delete(filters=None, **kwargs)
async
Delete records from the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to delete. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |
Note
If filters is None, no operation is performed (no-op).
ensure_index(**kwargs)
async
Ensure vector index exists, creating it if necessary.
This method ensures that the vector index required for similarity search operations is created. If the index already exists, this method performs no operation (idempotent).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `**kwargs` | `Any` | Datastore-specific parameters for index configuration. | `{}` |
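The idempotent behaviour described for `ensure_index` can be sketched with an in-memory stand-in. `InMemoryVectorStore`, its `build_count` counter, and the `dimensions` keyword are hypothetical and exist only to make the no-op on the second call observable.

```python
import asyncio
from typing import Any, Dict, Optional


class InMemoryVectorStore:
    """Hypothetical store that records how often an index is built."""

    def __init__(self) -> None:
        self.index_config: Optional[Dict[str, Any]] = None
        self.build_count = 0

    async def ensure_index(self, **kwargs: Any) -> None:
        if self.index_config is not None:
            return  # Index already exists: perform no operation.
        self.index_config = dict(kwargs)
        self.build_count += 1


async def demo() -> InMemoryVectorStore:
    store = InMemoryVectorStore()
    await store.ensure_index(dimensions=3)  # creates the index
    await store.ensure_index(dimensions=3)  # second call is a no-op
    return store


store = asyncio.run(demo())
```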
retrieve(query, filters=None, options=None, **kwargs)
async
Read records from the datastore using text-based similarity search with optional filtering.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Input text to embed and search with. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options like limit and sorting. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |

Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | Query results. |
retrieve_by_vector(vector, filters=None, options=None, **kwargs)
async
Direct vector similarity search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vector` | `Vector` | Query embedding vector. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to apply. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `options` | `QueryOptions \| None` | Query options like limit and sorting. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |

Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | List of chunks ordered by similarity score. |
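As an illustration of ordering by similarity score, the sketch below ranks stored vectors against a query vector using cosine similarity. The metric is an assumption (backends may use dot product or Euclidean distance instead), and `rank_by_vector` is a hypothetical helper.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Define similarity with a zero vector as 0.0 to avoid dividing by zero.
    return dot / norm if norm else 0.0


def rank_by_vector(
    query: Vector, stored: Dict[str, Vector], limit: Optional[int] = None
) -> List[str]:
    """Return stored ids ordered from most to least similar to the query."""
    ranked = sorted(
        stored,
        key=lambda chunk_id: cosine_similarity(query, stored[chunk_id]),
        reverse=True,
    )
    return ranked if limit is None else ranked[:limit]


stored = {"north": [0.0, 1.0], "east": [1.0, 0.0], "northeast": [1.0, 1.0]}
ranking = rank_by_vector([1.0, 0.1], stored)
```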
update(update_values, filters=None, **kwargs)
async
Update existing records in the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `update_values` | `dict[str, Any]` | Values to update. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally. Defaults to None. | `None` |
| `**kwargs` | `Any` | Datastore-specific parameters. | `{}` |