Hybrid

Elasticsearch implementation of hybrid search capability.

This module provides hybrid search that combines fulltext (BM25) and vector (dense) retrieval through weighted score boosting: each document's final score is the weighted sum of its BM25 score and its vector similarity score.

ElasticsearchHybridCapability(index_name, client, config, encryption=None, default_batch_size=None)

Bases: ElasticLikeCore, BaseHybridCapability

Elasticsearch implementation of HybridCapability using weighted score boosting.

Combines fulltext (BM25) and vector (dense) retrieval in a single query. The final score is sum(weight_i * score_i), taken over the configured search types.
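The weighted sum can be illustrated with a minimal, library-independent sketch. The function name `combine_scores`, the dictionary keys, and the sample scores are assumptions made for illustration; they are not part of this module's API, and the real combination happens inside the search query.

```python
def combine_scores(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return sum(weight_i * score_i) over the configured search types.

    A search type with no score for this document contributes 0.
    """
    return sum(weights[name] * scores.get(name, 0.0) for name in weights)


# Hypothetical per-document scores: BM25 is unbounded, vector similarity is small.
weights = {"fulltext": 0.3, "vector": 0.7}
doc_scores = {"fulltext": 4.0, "vector": 0.5}

# 0.3 * 4.0 + 0.7 * 0.5, roughly 1.55
final_score = combine_scores(doc_scores, weights)
print(final_score)
```

Because BM25 scores are unbounded while similarity scores such as cosine are bounded, the weights also compensate for the difference in scale between the two score types, not only for their relative importance.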

Initialize the Elasticsearch hybrid capability.

Examples:

from gllm_datastore.data_store.elasticsearch.hybrid import ElasticsearchHybridCapability
from gllm_datastore.core.capabilities.hybrid_capability import HybridSearchType, SearchConfig

# es_client: an AsyncElasticsearch or AsyncOpenSearch instance created elsewhere.
# em_invoker: the embedding model invoker used for the VECTOR config, created elsewhere.
config = [
    SearchConfig(HybridSearchType.FULLTEXT, field="text", weight=0.3),
    SearchConfig(HybridSearchType.VECTOR, field="embedding", weight=0.7, em_invoker=em_invoker),
]
capability = ElasticsearchHybridCapability(
    index_name="my_index",
    client=es_client,
    config=config,
)

Parameters:

Name Type Description Default
index_name str

The name of the Elasticsearch index.

required
client AsyncElasticsearch | AsyncOpenSearch

Elastic-like async client used by inherited hybrid operations.

required
config list[SearchConfig]

List of search configurations (FULLTEXT and/or VECTOR).

required
encryption EncryptionCapability | None

Encryption capability used to encrypt content and metadata. Defaults to None.

None
default_batch_size int | None

Default batch size. Defaults to None.

None

fulltext_configs property

Return configs for FULLTEXT search type.

vector_configs property

Return configs for VECTOR search type.
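As a rough illustration of what these two properties return, the sketch below filters a config list by search type. `SearchType`, `Config`, and `Capability` are hypothetical stand-ins for `HybridSearchType`, `SearchConfig`, and the capability class; the attribute name `search_type` is an assumption, and the real implementation may differ.

```python
from dataclasses import dataclass
from enum import Enum


class SearchType(Enum):
    """Hypothetical stand-in for HybridSearchType."""

    FULLTEXT = "fulltext"
    VECTOR = "vector"


@dataclass
class Config:
    """Hypothetical stand-in for SearchConfig (attribute names are assumed)."""

    search_type: SearchType
    field: str
    weight: float


class Capability:
    """Minimal holder that exposes configs grouped by search type."""

    def __init__(self, config: list[Config]) -> None:
        self.config = config

    @property
    def fulltext_configs(self) -> list[Config]:
        # Keep only the FULLTEXT entries, preserving their order.
        return [c for c in self.config if c.search_type is SearchType.FULLTEXT]

    @property
    def vector_configs(self) -> list[Config]:
        # Keep only the VECTOR entries, preserving their order.
        return [c for c in self.config if c.search_type is SearchType.VECTOR]


capability = Capability(
    [
        Config(SearchType.FULLTEXT, field="text", weight=0.3),
        Config(SearchType.VECTOR, field="embedding", weight=0.7),
    ]
)
print([c.field for c in capability.fulltext_configs])
print([c.field for c in capability.vector_configs])
```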

create(chunks, batch_size=None, **kwargs) async

Create chunks with automatic generation of all configured search fields.

Overrides BaseHybridCapability.create to perform enrichment (embedding generation) before encryption and storage.

Parameters:

Name Type Description Default
chunks list[Chunk]

List of chunks to create and index.

required
batch_size int | None

Batch size for creation. Defaults to None.

None
**kwargs Any

Extra arguments passed through to the Elasticsearch bulk API.

{}

update(update_values, filters=None, batch_size=None, **kwargs) async

Update existing records in the datastore. Update values are processed in two steps, in this order:

  1. Vector configs: For each VECTOR config whose field is being updated, generate embeddings from the plaintext source (content/text/fulltext field). Embeddings must be produced before encryption so the model sees plaintext.
  2. Encryption: Encrypt all update values (content and metadata) using the encryption config. Vector fields are not encrypted; only content and metadata are. This step runs after enrichment so we never embed from encrypted text.
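The ordering of the two steps above can be sketched as follows. Everything here is hypothetical: `fake_embed` stands in for an embedding model, `fake_encrypt` is plain base64 (not real encryption) standing in for the encryption capability, and `prepare_update` is not a function of this library. For brevity the sketch handles only a `text` value, whereas the steps above also cover content and metadata.

```python
import base64
from typing import Any, Callable


def fake_embed(text: str) -> list[float]:
    """Hypothetical embedder: derives a tiny vector from the plaintext."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def fake_encrypt(value: str) -> str:
    """Hypothetical 'encryption' via base64. NOT secure; illustration only."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def prepare_update(
    update_values: dict[str, Any],
    vector_fields: list[str],
    embed: Callable[[str], list[float]],
    encrypt: Callable[[str], str],
) -> dict[str, Any]:
    """Enrich with embeddings first, then encrypt the text."""
    prepared = dict(update_values)
    plaintext = update_values.get("text")
    if plaintext is None:
        return prepared

    # Step 1: generate embeddings while the text is still plaintext.
    for field in vector_fields:
        prepared[field] = embed(plaintext)

    # Step 2: encrypt the text. Vector fields are left unencrypted.
    prepared["text"] = encrypt(plaintext)
    return prepared


prepared = prepare_update(
    {"text": "Updated content"},
    vector_fields=["embedding"],
    embed=fake_embed,
    encrypt=fake_encrypt,
)
print(prepared)
```

Reversing the two steps would feed ciphertext to the embedder, so the stored vectors would no longer reflect the meaning of the original text.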

Examples:

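from gllm_datastore.core.filters import filter as F

# These calls use await, so they must run inside an async function.
# Updating "text" also re-embeds the configured vector fields.
await hybrid_capability.update(
    update_values={"text": "Updated content"},
    filters=F.eq("metadata.source", "doc1"),
)
# Update metadata on a single chunk selected by id.
await hybrid_capability.update(
    update_values={"metadata": {"status": "published"}},
    filters=F.eq("id", "chunk_1"),
)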

Parameters:

Name Type Description Default
update_values dict[str, Any]

Fields to update (e.g. text, content, metadata). When text or content is present, vector fields from the hybrid config are re-embedded and included automatically.

required
filters FilterClause | QueryFilter | None

Filter that selects which records to update. If None, no documents match and nothing is updated. Defaults to None.

None
batch_size int | None

Batch size for update operations. Defaults to None.

None
**kwargs Any

Extra arguments passed through to the Elasticsearch update_by_query API (e.g. refresh, timeout).

{}