Vector

Elasticsearch implementation of vector search and CRUD capability.

Authors

Kadek Denaya (kadek.d.r.diana@gdplabs.id)

ElasticsearchVectorCapability(index_name, client, em_invoker, query_field='text', vector_query_field='vector', retrieval_strategy=None, distance_strategy=None)

Elasticsearch implementation of the VectorCapability protocol.

This class provides document CRUD operations and vector search using Elasticsearch.

Attributes:

index_name (str): The name of the Elasticsearch index.

vector_store (AsyncElasticsearchStore): The vector store instance.

em_invoker (BaseEMInvoker): The embedding model to perform vectorization.

Initialize the Elasticsearch vector capability.

Parameters:

index_name (str, required): The name of the Elasticsearch index.

client (AsyncElasticsearch, required): The Elasticsearch client.

em_invoker (BaseEMInvoker, required): The embedding model to perform vectorization.

query_field (str, default: 'text'): The field name for text queries.

vector_query_field (str, default: 'vector'): The field name for vector queries.

retrieval_strategy (AsyncRetrievalStrategy | None, default: None): The strategy used for retrieval. If None, DenseVectorStrategy() is used.

distance_strategy (str | None, default: None): The distance strategy for retrieval.

em_invoker property

Returns the EM Invoker instance.

Returns:

BaseEMInvoker: The EM Invoker instance.

clear(**kwargs) async

Clear all records from the datastore.

Parameters:

**kwargs (Any, default: {}): Datastore-specific parameters.

create(data, **kwargs) async

Create new records in the datastore.

Parameters:

data (Chunk | list[Chunk], required): Data to create (single item or collection).

**kwargs (Any, default: {}): Datastore-specific parameters.

Raises:

ValueError: If the data structure is invalid.
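
The `Chunk | list[Chunk]` input and the `ValueError` above suggest a normalization step before indexing. The following is a minimal sketch of that idea only; the `Chunk` dataclass is a hypothetical stand-in for the library's own type, and `normalize_chunks` is an illustrative helper, not part of the documented API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type (assumption)."""

    id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def normalize_chunks(data: Chunk | list[Chunk]) -> list[Chunk]:
    """Return the input as a list, rejecting anything that is not a Chunk."""
    # A single Chunk is wrapped so that callers always iterate over a list.
    items = [data] if isinstance(data, Chunk) else data
    if not isinstance(items, list) or not all(isinstance(c, Chunk) for c in items):
        raise ValueError("data must be a Chunk or a list of Chunk objects")
    return items
```

Normalizing once at the entry point keeps the rest of a create path free of single-versus-collection branching.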

create_from_vector(chunk_vectors, **kwargs) async

Add pre-computed embeddings directly.

Parameters:

chunk_vectors (list[tuple[Chunk, Vector]], required): List of tuples containing chunks and their corresponding vectors.

**kwargs (default: {}): Datastore-specific parameters.

Returns:

list[str]: List of IDs assigned to the added embeddings.
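
To illustrate the shape of `chunk_vectors`, here is a small sketch that splits the pairs into IDs and vectors and checks that all vectors share one dimension. It is an assumption-laden illustration: `Chunk` and `Vector` are hypothetical stand-ins, `prepare_chunk_vectors` is not a library function, and the reference does not say whether the real method performs this dimension check.

```python
from __future__ import annotations

from dataclasses import dataclass

Vector = list[float]  # assumed alias; the library defines its own Vector type


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type (assumption)."""

    id: str
    content: str


def prepare_chunk_vectors(
    chunk_vectors: list[tuple[Chunk, Vector]],
) -> tuple[list[str], list[Vector]]:
    """Split (chunk, vector) pairs into parallel lists of IDs and vectors."""
    if not chunk_vectors:
        return [], []
    # Use the first vector as the reference dimension for all others.
    dimension = len(chunk_vectors[0][1])
    ids: list[str] = []
    vectors: list[Vector] = []
    for chunk, vector in chunk_vectors:
        if len(vector) != dimension:
            raise ValueError(
                f"vector for chunk {chunk.id!r} has length {len(vector)}, expected {dimension}"
            )
        ids.append(chunk.id)
        vectors.append(list(vector))
    return ids, vectors
```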

delete(filters=None, **kwargs) async

Delete records from the datastore based on filters.

Parameters:

filters (FilterClause | QueryFilter | None, default: None): Filters to select records for deletion. FilterClause objects are automatically converted to QueryFilter internally.

**kwargs (Any, default: {}): Datastore-specific parameters.

delete_by_id(id, **kwargs) async

Delete records from the datastore based on IDs.

Parameters:

id (str | list[str], required): ID or list of IDs to delete.

**kwargs (Any, default: {}): Datastore-specific parameters.
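
One way an ID-based delete could be expressed against Elasticsearch is an `ids` query sent to the delete-by-query API. The sketch below only builds that request body; `build_ids_query` is a hypothetical helper, and this reference does not state which Elasticsearch call the method actually uses.

```python
from __future__ import annotations


def build_ids_query(id_or_ids: str | list[str]) -> dict:
    """Build an Elasticsearch `ids` query body from one ID or a list of IDs."""
    # Accept both forms named in the signature and normalize to a list.
    ids = [id_or_ids] if isinstance(id_or_ids, str) else list(id_or_ids)
    if not ids:
        raise ValueError("at least one ID is required")
    return {"query": {"ids": {"values": ids}}}
```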

ensure_index(mapping=None, index_settings=None, dimension=None, distance_strategy=None) async

Ensure Elasticsearch index exists, creating it if necessary.

This method is idempotent: if the index already exists, it skips creation and returns early.

Parameters:

mapping (dict[str, Any] | None, default: None): Custom mapping dictionary to use for index creation. If provided, it is used directly and should follow the Elasticsearch mapping format. If None, the default mapping is used.

index_settings (dict[str, Any] | None, default: None): Custom index settings. These are merged with any default settings.

dimension (int | None, default: None): Vector dimension. If neither dimension nor mapping is provided, the dimension is inferred from em_invoker by generating a test embedding.

distance_strategy (str | None, default: None): Distance strategy for vector similarity. Supported values include "cosine", "l2_norm", and "dot_product". Only used when building the default mapping. If None, "cosine" is used.

Raises:

ValueError: If mapping is invalid or required parameters are missing.

RuntimeError: If index creation fails.
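
The idempotent behavior and the default mapping described above can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the mapping shape is a plausible Elasticsearch `dense_vector` mapping rather than the library's confirmed default, `FakeIndices` is an in-memory stand-in that mimics the `exists` and `create` calls of an Elasticsearch indices client, and both function names are hypothetical.

```python
from __future__ import annotations

import asyncio
from typing import Any


def build_default_mapping(
    dimension: int,
    distance_strategy: str | None = None,
    query_field: str = "text",
    vector_query_field: str = "vector",
) -> dict[str, Any]:
    """Build a dense_vector mapping; the exact library default is an assumption."""
    if dimension is None or dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "properties": {
            query_field: {"type": "text"},
            vector_query_field: {
                "type": "dense_vector",
                "dims": dimension,
                "index": True,
                # Fall back to cosine when no strategy is given.
                "similarity": distance_strategy or "cosine",
            },
        }
    }


class FakeIndices:
    """In-memory stand-in for an Elasticsearch indices client (assumption)."""

    def __init__(self) -> None:
        self.created: dict[str, dict[str, Any]] = {}
        self.create_calls = 0

    async def exists(self, index: str) -> bool:
        return index in self.created

    async def create(self, index: str, mappings: Any = None, settings: Any = None) -> None:
        self.create_calls += 1
        self.created[index] = {"mappings": mappings, "settings": settings}


async def ensure_index(indices: FakeIndices, index_name: str, dimension: int) -> bool:
    """Create the index only if it is missing; return True when it was created."""
    if await indices.exists(index=index_name):
        return False  # already present: skip creation and return early
    await indices.create(index=index_name, mappings=build_default_mapping(dimension))
    return True


async def demo() -> tuple[bool, bool]:
    indices = FakeIndices()
    first = await ensure_index(indices, "docs", dimension=3)
    second = await ensure_index(indices, "docs", dimension=3)
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Calling the sketch twice against the same index creates it once; the second call is a no-op, which is the property that makes the method safe to run at application startup.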

retrieve(query, filters=None, options=None, **kwargs) async

Semantic search using text query converted to vector.

Usage Example
from gllm_datastore.core.filters import filter as F

# Assumes `vector_capability` is an initialized ElasticsearchVectorCapability,
# that `QueryOptions` has been imported, and that these calls run inside an
# async function.

# Direct FilterClause usage
await vector_capability.retrieve(
    query="What is the capital of France?",
    filters=F.eq("metadata.category", "tech"),
    options=QueryOptions(limit=10),
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await vector_capability.retrieve(query="What is the capital of France?", filters=filters)

Parameters:

query (str, required): Text query to embed and search for.

filters (FilterClause | QueryFilter | None, default: None): Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally.

options (QueryOptions | None, default: None): Options to apply to the search.

**kwargs (Any, default: {}): Datastore-specific parameters.

Returns:

list[Chunk]: List of chunks ordered by relevance score.
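
Conceptually, a text retrieval of this kind embeds the query and then runs a vector search. The in-memory sketch below shows that relationship only. Everything in it is hypothetical: `toy_embed` is a word-count embedder over a fixed vocabulary standing in for an embedding model, and `InMemoryVectorSearch` ranks by cosine similarity in Python rather than querying Elasticsearch.

```python
from __future__ import annotations

import asyncio
import math

VOCAB = ["paris", "france", "python", "code"]  # toy vocabulary (assumption)


async def toy_embed(text: str) -> list[float]:
    """Hypothetical embedder: counts of each vocabulary word in the text."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


class InMemoryVectorSearch:
    """Illustrative stand-in showing retrieve() delegating to retrieve_by_vector()."""

    def __init__(self, embed) -> None:
        self._embed = embed
        self._rows: list[tuple[str, list[float]]] = []

    async def create(self, texts: list[str]) -> None:
        for text in texts:
            self._rows.append((text, await self._embed(text)))

    async def retrieve_by_vector(self, vector: list[float], limit: int = 10) -> list[str]:
        # Highest similarity first, truncated to the requested limit.
        ranked = sorted(self._rows, key=lambda row: cosine(vector, row[1]), reverse=True)
        return [text for text, _ in ranked[:limit]]

    async def retrieve(self, query: str, limit: int = 10) -> list[str]:
        # Embed the text query, then reuse the vector search path.
        return await self.retrieve_by_vector(await self._embed(query), limit)


async def demo() -> list[str]:
    store = InMemoryVectorSearch(toy_embed)
    await store.create(["Python code style", "Paris is the capital of France"])
    return await store.retrieve("capital of france", limit=1)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```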

retrieve_by_vector(vector, filters=None, options=None) async

Direct vector similarity search.

Usage Example
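from gllm_datastore.core.filters import filter as F

# Assumes `vector_capability` is an initialized ElasticsearchVectorCapability,
# that `QueryOptions` has been imported, and that these calls run inside an
# async function. The vector length must match the index's vector dimension.

# Direct FilterClause usage
await vector_capability.retrieve_by_vector(
    vector=[0.1, 0.2, 0.3],
    filters=F.eq("metadata.category", "tech"),
    options=QueryOptions(limit=10),
)

# Multiple filters
filters = F.and_(F.eq("metadata.source", "wikipedia"), F.eq("metadata.category", "tech"))
await vector_capability.retrieve_by_vector(vector=[0.1, 0.2, 0.3], filters=filters)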

Parameters:

vector (Vector, required): Query embedding vector.

filters (FilterClause | QueryFilter | None, default: None): Filters to apply to the search. FilterClause objects are automatically converted to QueryFilter internally.

options (QueryOptions | None, default: None): Options to apply to the search.

Returns:

list[Chunk]: List of chunks ordered by similarity score.
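
For readers who want to see what a filtered vector search might look like at the Elasticsearch level, the sketch below builds a kNN search body with an optional filter. It is an assumption about one possible translation, not the library's actual query: `Eq` and `And` are hypothetical stand-ins for filter clauses, the `term` translation assumes keyword-typed metadata fields, and the `num_candidates` heuristic is arbitrary.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Eq:
    """Hypothetical stand-in for an equality filter clause (assumption)."""

    field: str
    value: Any


@dataclass(frozen=True)
class And:
    """Hypothetical stand-in for a conjunction of clauses (assumption)."""

    clauses: tuple


def to_es_filter(clause: Eq | And) -> dict[str, Any]:
    """Translate the stand-in clauses into Elasticsearch query DSL."""
    if isinstance(clause, Eq):
        return {"term": {clause.field: clause.value}}
    return {"bool": {"filter": [to_es_filter(c) for c in clause.clauses]}}


def build_knn_body(
    vector: list[float],
    field: str = "vector",
    limit: int = 10,
    filters: Eq | And | None = None,
) -> dict[str, Any]:
    """Build a kNN search body; the filter is applied during the kNN search."""
    knn: dict[str, Any] = {
        "field": field,
        "query_vector": list(vector),
        "k": limit,
        # Arbitrary heuristic: oversample candidates, capped at 10000.
        "num_candidates": min(max(limit * 10, 100), 10000),
    }
    if filters is not None:
        knn["filter"] = to_es_filter(filters)
    return {"knn": knn, "size": limit}
```

Placing the filter inside the `knn` clause restricts the candidate set during the search, so the result can still contain up to `k` matching documents instead of filtering an already truncated list.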

update(update_values, filters=None, **kwargs) async

Update existing records in the datastore.

Parameters:

update_values (dict, required): Values to update.

filters (FilterClause | QueryFilter | None, default: None): Filters to select records to update. FilterClause objects are automatically converted to QueryFilter internally.

**kwargs (Any, default: {}): Datastore-specific parameters.
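
A filtered update in Elasticsearch is commonly expressed as an update-by-query request with a Painless script. The sketch below builds such a body for top-level fields only. It is an illustration under assumptions: `build_update_by_query_body` is a hypothetical helper, the reference does not say how the method applies `update_values`, and treating a missing filter as `match_all` is a choice of this sketch rather than documented behavior.

```python
from __future__ import annotations

import re
from typing import Any

# Only simple top-level field names are accepted, so that field names cannot
# inject arbitrary script text into the generated Painless source.
_FIELD_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_update_by_query_body(
    update_values: dict[str, Any],
    query: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Build an update-by-query body that assigns each value via script params."""
    if not update_values:
        raise ValueError("update_values must not be empty")
    for name in update_values:
        if not _FIELD_NAME.match(name):
            raise ValueError(f"unsupported field name: {name!r}")
    # One assignment per field; values travel as params, not inline literals.
    source = " ".join(
        f"ctx._source['{name}'] = params['{name}'];" for name in update_values
    )
    return {
        "query": query or {"match_all": {}},  # assumption: no filter means all records
        "script": {"lang": "painless", "source": source, "params": dict(update_values)},
    }
```

Passing values through `params` instead of formatting them into the script text avoids quoting problems and lets Elasticsearch reuse the compiled script.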