Hybrid
Elasticsearch implementation of hybrid search capability.
This module provides hybrid search combining fulltext (BM25) and vector (dense) retrieval using weighted score boosting. Scores are combined as a weighted sum of BM25 and vector similarity scores.
ElasticsearchHybridCapability(index_name, client, config, encryption=None, default_batch_size=None)
Bases: ElasticLikeCore, BaseHybridCapability
Elasticsearch implementation of HybridCapability using weighted score boosting.
Combines fulltext (BM25) and vector (dense) retrieval in a single query. Final score = sum(weight_i * score_i) for each configured search type.
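The weighted sum above can be illustrated with a small standalone sketch. It does not call the library; `combine_scores` and the score values are hypothetical, and treating a missing score as 0.0 is an assumption made for this example only.

```python
def combine_scores(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return sum(weight_i * score_i) over the configured search types.

    A search type with no score for this document contributes 0.0
    (an assumption of this sketch, not confirmed library behaviour).
    """
    return sum(weight * scores.get(name, 0.0) for name, weight in weights.items())


# Weights mirror a FULLTEXT=0.3 / VECTOR=0.7 configuration.
weights = {"fulltext": 0.3, "vector": 0.7}

# Hypothetical per-document scores: a BM25 score and a vector similarity.
doc_scores = {"fulltext": 4.0, "vector": 0.5}

final_score = combine_scores(doc_scores, weights)
print(final_score)
```

BM25 scores are unbounded while vector similarities usually fall in a fixed range, so the chosen weights also compensate for the difference in scale between the two score types.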
Initialize the Elasticsearch hybrid capability.
Examples:

    from gllm_datastore.data_store.elasticsearch.hybrid import ElasticsearchHybridCapability
    from gllm_datastore.core.capabilities.hybrid_capability import HybridSearchType, SearchConfig

    # es_client (an async Elasticsearch client) and em_invoker (an embedding
    # model invoker) are assumed to have been created elsewhere.
    config = [
        SearchConfig(HybridSearchType.FULLTEXT, field="text", weight=0.3),
        SearchConfig(HybridSearchType.VECTOR, field="embedding", weight=0.7, em_invoker=em_invoker),
    ]
    capability = ElasticsearchHybridCapability(
        index_name="my_index",
        client=es_client,
        config=config,
    )

Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index_name` | `str` | The name of the Elasticsearch index. | required |
| `client` | `AsyncElasticsearch \| AsyncOpenSearch` | Elastic-like async client used by inherited hybrid operations. | required |
| `config` | `list[SearchConfig]` | List of search configurations (FULLTEXT and/or VECTOR). | required |
| `encryption` | `EncryptionCapability \| None` | Encryption capability. Defaults to None. | `None` |
| `default_batch_size` | `int \| None` | Default batch size. Defaults to None. | `None` |
fulltext_configs
property
Return configs for FULLTEXT search type.
vector_configs
property
Return configs for VECTOR search type.
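As a rough illustration of what these two properties expose, the sketch below filters a config list by search type. `HybridSearchType`, `SearchConfig`, and `ConfigHolder` here are simplified stand-ins defined for this example, not the library's classes, and the list return type is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class HybridSearchType(Enum):
    """Simplified stand-in for the library's enum."""

    FULLTEXT = "fulltext"
    VECTOR = "vector"


@dataclass
class SearchConfig:
    """Simplified stand-in holding only the fields this sketch needs."""

    search_type: HybridSearchType
    field: str
    weight: float = 1.0


class ConfigHolder:
    """Hypothetical holder that splits configs by search type."""

    def __init__(self, config: list[SearchConfig]) -> None:
        self.config = config

    @property
    def fulltext_configs(self) -> list[SearchConfig]:
        # Keep only FULLTEXT entries, preserving their original order.
        return [c for c in self.config if c.search_type is HybridSearchType.FULLTEXT]

    @property
    def vector_configs(self) -> list[SearchConfig]:
        # Keep only VECTOR entries, preserving their original order.
        return [c for c in self.config if c.search_type is HybridSearchType.VECTOR]


holder = ConfigHolder(
    [
        SearchConfig(HybridSearchType.FULLTEXT, field="text", weight=0.3),
        SearchConfig(HybridSearchType.VECTOR, field="embedding", weight=0.7),
    ]
)
print([c.field for c in holder.fulltext_configs])
print([c.field for c in holder.vector_configs])
```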
create(chunks, batch_size=None, **kwargs)
async
Create chunks with automatic generation of all configured search fields.
Overrides BaseHybridCapability.create to perform enrichment (embedding generation) before encryption and storage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunks` | `list[Chunk]` | List of chunks to create and index. | required |
| `batch_size` | `int \| None` | Batch size for creation. Defaults to None. | `None` |
| `**kwargs` | `Any` | Extra arguments passed through to the Elasticsearch bulk API. | `{}` |
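To make the role of `batch_size` concrete, here is a minimal sketch of splitting a list of items into batches before indexing. `iter_batches` is a hypothetical helper, and sending everything as one batch when `batch_size` is None is an assumption of this sketch; the library may instead fall back to `default_batch_size` or another value.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int | None = None) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements.

    With batch_size=None, all items are yielded as a single batch
    (an assumption of this sketch, not confirmed library behaviour).
    """
    if batch_size is not None and batch_size <= 0:
        raise ValueError("batch_size must be a positive integer or None")
    # Fall back to one batch covering everything; `or 1` keeps range() valid
    # when `items` is empty.
    size = batch_size or len(items) or 1
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Five placeholder chunk IDs split into batches of two.
chunk_ids = ["c1", "c2", "c3", "c4", "c5"]
batches = list(iter_batches(chunk_ids, batch_size=2))
print(batches)
```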
update(update_values, filters=None, batch_size=None, **kwargs)
async
Update existing records in the datastore. Update values are processed in two steps, in this order:
- Vector configs: For each VECTOR config whose field is being updated, generate embeddings from the plaintext source (content/text/fulltext field). Embeddings must be produced before encryption so that the embedding model sees plaintext.
- Encryption: Encrypt all update values (content and metadata) using the encryption config. Vector fields are not encrypted; only content and metadata are. This step runs after enrichment, so embeddings are never generated from encrypted text.
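The ordering described above (embed from plaintext first, encrypt second) can be sketched as follows. Everything here is hypothetical: `fake_embed` stands in for an embedding model, `fake_encrypt` is a base64 encoding used only as a placeholder for real encryption, and `prepare_update` is not a library function.

```python
import base64
from typing import Any


def fake_embed(text: str) -> list[float]:
    """Hypothetical embedder: derives two numbers from the plaintext."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def fake_encrypt(value: str) -> str:
    """Placeholder for encryption (base64 is NOT encryption)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def encrypt_value(value: Any) -> Any:
    """Encrypt strings, recurse into dicts (e.g. metadata), pass others through."""
    if isinstance(value, str):
        return fake_encrypt(value)
    if isinstance(value, dict):
        return {key: encrypt_value(item) for key, item in value.items()}
    return value


def prepare_update(update_values: dict[str, Any], vector_fields: dict[str, str]) -> dict[str, Any]:
    """Build the stored update.

    `vector_fields` maps each vector field name to its plaintext source field.
    """
    prepared: dict[str, Any] = {}
    # Step 1: enrichment. Embed while the source text is still plaintext.
    for vector_field, source_field in vector_fields.items():
        if source_field in update_values:
            prepared[vector_field] = fake_embed(update_values[source_field])
    # Step 2: encryption. Content and metadata are encrypted; vector fields are not.
    for key, value in update_values.items():
        if key not in vector_fields:
            prepared[key] = encrypt_value(value)
    return prepared


prepared = prepare_update(
    {"text": "Updated content", "metadata": {"status": "published"}},
    {"embedding": "text"},
)
print(prepared)
```

Swapping the two steps would make the embedder see ciphertext, which is the failure mode the ordering is meant to prevent.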
Examples:
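
    from gllm_datastore.core.filters import filter as F

    # Update the text of every chunk from one source; because text is
    # present, the configured vector fields are re-embedded automatically.
    await hybrid_capability.update(
        update_values={"text": "Updated content"},
        filters=F.eq("metadata.source", "doc1"),
    )

    # Update only the metadata of a single chunk.
    await hybrid_capability.update(
        update_values={"metadata": {"status": "published"}},
        filters=F.eq("id", "chunk_1"),
    )
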
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `update_values` | `dict[str, Any]` | Fields to update (e.g. text, content, metadata). When text or content is present, vector fields from the hybrid config are re-embedded and included automatically. | required |
| `filters` | `FilterClause \| QueryFilter \| None` | Filter to select which records to update. No filter means no documents match. Defaults to None. | `None` |
| `batch_size` | `int \| None` | Batch size for update operations. Defaults to None. | `None` |
| `**kwargs` | `Any` | Extra arguments passed through to the Elasticsearch update_by_query API (e.g. refresh, timeout). No default. | `{}` |