Fulltext
Elasticsearch implementation of fulltext search and CRUD capability.
ElasticsearchFulltextCapability(index_name, client, query_field='text', encryption=None, default_batch_size=None)
Bases: ElasticLikeCore, BaseFulltextCapability
Elasticsearch implementation of FulltextCapability protocol.
This class provides document CRUD operations and flexible querying using Elasticsearch.
Attributes:
| Name | Type | Description |
|---|---|---|
| `index_name` | `str` | The name of the Elasticsearch index. |
| `client` | `AsyncElasticsearch` | AsyncElasticsearch client. |
| `query_field` | `str` | The field name to use for text content. |
Initialize the Elasticsearch fulltext capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `index_name` | `str` | The name of the Elasticsearch index. | required |
| `client` | `AsyncElasticsearch` | The Elasticsearch client. | required |
| `query_field` | `str` | The field name to use for text content. Defaults to "text". | `'text'` |
| `encryption` | `EncryptionCapability \| None` | Encryption capability for field-level encryption. Defaults to None. | `None` |
| `default_batch_size` | `int \| None` | Default batch size for operations. Defaults to None. | `None` |
async delete_by_id(id_)
Deletes records from the data store based on IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `id_` | `str \| list[str]` | ID or list of IDs to delete. | required |
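Because `id_` accepts either a single ID or a list, the implementation has to normalize it before talking to Elasticsearch. The sketch below shows one plausible normalization into an Elasticsearch `ids` query; it assumes deletion goes through a query (this page does not say whether the class uses `delete_by_query` or bulk deletes), and `build_ids_query` is a hypothetical helper, not part of this library.

```python
from typing import Any, Dict, List, Union


def build_ids_query(id_: Union[str, List[str]]) -> Dict[str, Any]:
    """Hypothetical helper: turn one ID or a list of IDs into an Elasticsearch `ids` query."""
    # Wrap a single ID in a list so both call styles share one code path.
    ids = [id_] if isinstance(id_, str) else list(id_)
    # Design choice for this sketch: refuse an empty list instead of sending a no-op query.
    if not ids:
        raise ValueError("id_ must contain at least one ID")
    # The `ids` query matches documents by their `_id`.
    return {"ids": {"values": ids}}


print(build_ids_query("doc-1"))             # {'ids': {'values': ['doc-1']}}
print(build_ids_query(["doc-1", "doc-2"]))  # {'ids': {'values': ['doc-1', 'doc-2']}}
```

With elasticsearch-py 8.x, a clause like this could be passed as `await client.delete_by_query(index=index_name, query=build_ids_query(id_))`.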
async get_size()
Returns the total number of documents in the index.
Returns:
| Type | Description |
|---|---|
| `int` | The total number of documents. |
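A minimal sketch of what `get_size` plausibly does, assuming it wraps the Elasticsearch Count API through `client.count(index=...)` (the page does not confirm this). `FakeClient` is a hypothetical stub standing in for `AsyncElasticsearch` so the sketch runs without a cluster.

```python
import asyncio
from typing import Any, Dict


async def get_size(client: Any, index_name: str) -> int:
    """Sketch: count the documents in an index via the Count API."""
    # The Count API response body carries the document total under "count".
    resp = await client.count(index=index_name)
    return int(resp["count"])


class FakeClient:
    """Hypothetical stand-in for AsyncElasticsearch, returning canned counts per index."""

    def __init__(self, counts: Dict[str, int]) -> None:
        self._counts = counts

    async def count(self, index: str) -> Dict[str, int]:
        return {"count": self._counts.get(index, 0)}


size = asyncio.run(get_size(FakeClient({"docs": 42}), "docs"))
print(size)  # 42
```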
async retrieve(strategy=SupportedQueryMethods.BY_FIELD, query=None, filters=None, options=None, **kwargs)
Overloads:
`retrieve(strategy: Literal[SupportedQueryMethods.BY_FIELD] | None = SupportedQueryMethods.BY_FIELD, query: str | None = None, filters: FilterClause | QueryFilter | None = None, options: QueryOptions | None = None, **kwargs: Any) -> list[Chunk]`
`retrieve(strategy: Literal[SupportedQueryMethods.BM25], query: str, filters: FilterClause | QueryFilter | None = None, options: QueryOptions | None = None, k1: float | None = None, b: float | None = None, **kwargs: Any) -> list[Chunk]`
`retrieve(strategy: Literal[SupportedQueryMethods.AUTOCOMPLETE], query: str, field: str, size: int = 20, fuzzy_tolerance: int = 1, min_prefix_length: int = 3, filter_query: dict[str, Any] | None = None, **kwargs: Any) -> list[str]`
`retrieve(strategy: Literal[SupportedQueryMethods.AUTOSUGGEST], query: str, search_fields: list[str], autocomplete_field: str, size: int = 20, min_length: int = 3, filter_query: dict[str, Any] | None = None, **kwargs: Any) -> list[str]`
`retrieve(strategy: Literal[SupportedQueryMethods.SHINGLES], query: str, field: str, size: int = 20, min_length: int = 3, max_length: int = 30, filter_query: dict[str, Any] | None = None, **kwargs: Any) -> list[str]`
Public entry point for retrieving records from the data store.
This method supports several retrieval strategies: metadata filtering (BY_FIELD), fulltext keyword search (BM25), and suggestion features (AUTOCOMPLETE, AUTOSUGGEST, SHINGLES). It validates the arguments and dispatches to the internal method for the chosen strategy.
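The validate-then-dispatch flow described above can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: `Method` is a hypothetical stand-in for `SupportedQueryMethods` with assumed member values, `dispatch` and the toy handlers are hypothetical, and treating a `None` strategy as BY_FIELD is inferred from the first overload's signature.

```python
from enum import Enum
from typing import Any, Callable, Dict, Optional, Union


class Method(str, Enum):
    """Hypothetical stand-in for SupportedQueryMethods (member values are assumed)."""

    BY_FIELD = "by_field"
    BM25 = "bm25"
    AUTOCOMPLETE = "autocomplete"
    AUTOSUGGEST = "autosuggest"
    SHINGLES = "shingles"


Handler = Callable[..., Any]


def dispatch(
    handlers: Dict[Method, Handler],
    strategy: Optional[Union[Method, str]] = Method.BY_FIELD,
    query: Optional[str] = None,
    **kwargs: Any,
) -> Any:
    """Validate the arguments, then route to the handler registered for the strategy."""
    # Assumption: a None strategy falls back to BY_FIELD, since the first overload allows None.
    if strategy is None:
        strategy = Method.BY_FIELD
    # Coerce plain strings to the enum; unknown values are rejected.
    try:
        method = Method(strategy)
    except ValueError:
        raise ValueError(f"Unsupported strategy: {strategy!r}") from None
    # Every strategy except BY_FIELD needs a query string.
    if method is not Method.BY_FIELD and not query:
        raise ValueError(f"'query' is required for strategy {method.value!r}")
    handler = handlers.get(method)
    if handler is None:
        raise ValueError(f"No handler registered for strategy {method.value!r}")
    return handler(query=query, **kwargs)


# Toy handlers standing in for the internal per-strategy methods.
handlers: Dict[Method, Handler] = {
    Method.BY_FIELD: lambda query=None, **kw: ["chunk matched by filters"],
    Method.BM25: lambda query=None, **kw: [f"bm25 hit for {query}"],
}
print(dispatch(handlers, Method.BM25, query="machine learning"))  # ['bm25 hit for machine learning']
```

Keeping validation in one place means each per-strategy handler can assume its inputs are already well-formed, which matches the `ValueError` cases listed under Raises.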
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `strategy` | `SupportedQueryMethods` | The retrieval strategy to use. Defaults to BY_FIELD. | `BY_FIELD` |
| `query` | `str \| None` | The query string for text-based strategies. Required for every strategy except BY_FIELD. | `None` |
| `filters` | `FilterClause \| QueryFilter \| None` | Query filters to restrict the search results. | `None` |
| `options` | `QueryOptions \| None` | Query options for sorting, pagination, and field selection. | `None` |
| `**kwargs` | `Any` | Additional strategy-specific parameters passed to the implementation (for example `k1` and `b` for BM25, or `field` for AUTOCOMPLETE). | `{}` |
Returns:
| Type | Description |
|---|---|
| `list[Chunk] \| list[str]` | A list of Chunks for document retrieval strategies (BY_FIELD, BM25), or a list of strings for suggestion strategies (AUTOCOMPLETE, AUTOSUGGEST, SHINGLES). |
Examples:
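```python
from gllm_datastore.core.filters import filter as F

# SupportedQueryMethods and QueryOptions must also be imported; their module paths
# are not shown on this page. `data_store` is assumed to expose this `retrieve`
# method, and the calls below must run inside an async function.

# Field-based retrieval with a direct FilterClause
results = await data_store.retrieve(
    strategy=SupportedQueryMethods.BY_FIELD,
    filters=F.eq("metadata.category", "AI"),
)

# BM25 keyword search, returning at most 10 chunks
results = await data_store.retrieve(
    strategy=SupportedQueryMethods.BM25,
    query="machine learning",
    options=QueryOptions(limit=10),
)

# Autocomplete suggestions (returns list[str])
suggestions = await data_store.retrieve(
    strategy=SupportedQueryMethods.AUTOCOMPLETE,
    query="artificial",
    field="title",
)
```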
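The AUTOCOMPLETE overload exposes `size`, `fuzzy_tolerance`, and `min_prefix_length`. The sketch below shows one plausible mapping onto Elasticsearch's completion suggester; that this class uses the completion suggester is an assumption the page does not confirm, as is mapping `min_prefix_length` to the suggester's fuzzy `min_length` option. The helper names and the `"autocomplete"` suggestion name are hypothetical. `filter_query` has no direct counterpart in a completion suggester (contexts are the closest feature), so it is omitted here.

```python
from typing import Any, Dict, List

SUGGEST_NAME = "autocomplete"  # hypothetical name for the suggestion entry


def build_completion_suggest(
    query: str,
    field: str,
    size: int = 20,
    fuzzy_tolerance: int = 1,
    min_prefix_length: int = 3,
) -> Dict[str, Any]:
    """Sketch: a completion-suggester request body for a `completion`-mapped field."""
    return {
        "suggest": {
            SUGGEST_NAME: {
                "prefix": query,
                "completion": {
                    "field": field,
                    "size": size,
                    # Allow up to `fuzzy_tolerance` edits, but only once the input
                    # is at least `min_prefix_length` characters long.
                    "fuzzy": {"fuzziness": fuzzy_tolerance, "min_length": min_prefix_length},
                },
            }
        }
    }


def parse_completion_suggest(response: Dict[str, Any]) -> List[str]:
    """Flatten suggestion options into the list[str] shape AUTOCOMPLETE returns."""
    entries = response.get("suggest", {}).get(SUGGEST_NAME, [])
    return [opt["text"] for entry in entries for opt in entry.get("options", [])]


body = build_completion_suggest("artificial", "title")
print(body["suggest"][SUGGEST_NAME]["completion"]["fuzzy"])  # {'fuzziness': 1, 'min_length': 3}
```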
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the query is missing for strategies that require it, or if an unsupported strategy is provided. |
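The BM25 overload of `retrieve` accepts `k1` and `b`. To show what these two parameters control, the sketch below computes the classic Okapi BM25 contribution of one query term: `k1` sets how quickly repeated occurrences of a term stop adding score, and `b` sets how strongly long documents are penalized (Elasticsearch's defaults are `k1=1.2` and `b=0.75`). This is a textbook illustration, not this class's scoring code; Elasticsearch scores through Lucene's BM25 implementation, which may differ from this form in constant factors, and how this class applies per-call `k1`/`b` values is not documented here.

```python
import math


def bm25_term_score(
    tf: float,
    df: int,
    n_docs: int,
    doc_len: float,
    avg_doc_len: float,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Classic Okapi BM25 contribution of one query term to one document's score."""
    # IDF with "1 +" inside the log, which keeps it non-negative for very common terms.
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # b interpolates between no length normalization (b=0) and full normalization (b=1).
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls term-frequency saturation; the tf part approaches (k1 + 1) as tf grows.
    tf_part = tf * (k1 + 1) / (tf + k1 * length_norm)
    return idf * tf_part


# Same term, same document length: more occurrences help, but with diminishing returns.
print(bm25_term_score(tf=1, df=5, n_docs=100, doc_len=100, avg_doc_len=100))
print(bm25_term_score(tf=10, df=5, n_docs=100, doc_len=100, avg_doc_len=100))
```

Raising `k1` lets repeated terms keep contributing for longer; lowering `b` toward 0 stops short documents from being favored over long ones.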
SupportedQueryMethods
Bases: StrEnum
Supported query methods for Elasticsearch fulltext capability.