Fulltext

Elasticsearch implementation of fulltext search and CRUD capability.

Authors

Kadek Denaya (kadek.d.r.diana@gdplabs.id)

`ElasticsearchFulltextCapability(index_name, client, query_field='text')`

Elasticsearch implementation of FulltextCapability protocol.

This class provides document CRUD operations and flexible querying using Elasticsearch.

Attributes:

Name	Type	Description
`index_name`	`str`	The name of the Elasticsearch index.
`client`	`AsyncElasticsearch`	AsyncElasticsearch client.
`query_field`	`str`	The field name to use for text content.

Initialize the Elasticsearch fulltext capability.

Parameters:

Name	Type	Description	Default
`index_name`	`str`	The name of the Elasticsearch index.	required
`client`	`AsyncElasticsearch`	The Elasticsearch client.	required
`query_field`	`str`	The field name to use for text content. Defaults to "text".	`'text'`

`clear()` `async`

Clear all records from the datastore.

`create(data, **kwargs)` `async`

Create new records in the datastore.

Parameters:

Name	Type	Description	Default
`data`	`Chunk \| list[Chunk]`	Data to create (single item or collection).	required
`**kwargs`	`Any`	Backend-specific parameters forwarded to Elasticsearch bulk API.	`{}`

Raises:

Type	Description
`ValueError`	If data structure is invalid.
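
As a rough illustration of what forwarding to the bulk API can involve, the sketch below turns chunks into action dictionaries in the format accepted by the `elasticsearch.helpers` bulk functions. `SimpleChunk`, its fields, and `to_bulk_actions` are hypothetical stand-ins; the real `Chunk` type and the library's internal mapping may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleChunk:
    """Hypothetical stand-in for Chunk; the real class may have other fields."""

    id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def to_bulk_actions(
    data: SimpleChunk | list[SimpleChunk],
    index_name: str,
    query_field: str = "text",
) -> list[dict[str, Any]]:
    """Normalize a single chunk or a list of chunks into bulk index actions."""
    chunks = [data] if isinstance(data, SimpleChunk) else list(data)
    actions = []
    for chunk in chunks:
        if not isinstance(chunk, SimpleChunk):
            # Mirrors the documented ValueError for invalid data structures.
            raise ValueError(f"Expected SimpleChunk, got {type(chunk).__name__}")
        actions.append(
            {
                "_op_type": "index",
                "_index": index_name,
                "_id": chunk.id,
                # Assumed layout: text under query_field, metadata alongside it.
                "_source": {query_field: chunk.content, "metadata": chunk.metadata},
            }
        )
    return actions
```

A list built this way could be passed to `elasticsearch.helpers.async_bulk(client, actions)`, with any extra keyword arguments forwarded to that call.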

`delete(filters=None)` `async`

Delete records from the datastore that match the given filters.

Parameters:

Name	Type	Description	Default
`filters`	`QueryFilter \| None`	Filters to select records for deletion. Defaults to None.	`None`

`delete_by_id(id_)` `async`

Delete records from the datastore by ID.

Parameters:

Name	Type	Description	Default
`id_`	`str \| list[str]`	ID or list of IDs to delete.	required

`get_size()` `async`

Returns the total number of documents in the index.

Returns:

Name	Type	Description
`int`	`int`	The total number of documents.

`retrieve(strategy=SupportedQueryMethods.BY_FIELD, query=None, filters=None, options=None, **kwargs)` `async`

Retrieve records from the datastore using various strategies.

This is the main retrieval method. It supports the following search strategies:

- BY_FIELD: Filter by metadata fields
- BM25: Keyword-based search using the BM25 algorithm
- AUTOCOMPLETE: Autocomplete suggestions
- AUTOSUGGEST: Autosuggest functionality
- SHINGLES: Shingle-based search

Parameters:

Name	Type	Description	Default
`strategy`	`SupportedQueryMethods`	The retrieval strategy to use. Defaults to BY_FIELD.	`BY_FIELD`
`query`	`str \| None`	The query string for text-based strategies. Required for BM25, AUTOCOMPLETE, AUTOSUGGEST, SHINGLES.	`None`
`filters`	`QueryFilter \| None`	Query filters to apply. Defaults to None.	`None`
`options`	`QueryOptions \| None`	Query options (sorting, pagination, etc.). Defaults to None.	`None`
`**kwargs`	`Any`	Additional strategy-specific parameters.	`{}`

Returns:

Type	Description
`list[Chunk] \| list[str]`	Query results. Returns list[Chunk] for BY_FIELD and BM25, and list[str] for AUTOCOMPLETE, AUTOSUGGEST, and SHINGLES.

Example

# Field-based retrieval
results = await data_store.retrieve(
    strategy=SupportedQueryMethods.BY_FIELD,
    filters=QueryFilter(conditions={"category": "AI"})
)

# BM25 keyword search
results = await data_store.retrieve(
    strategy=SupportedQueryMethods.BM25,
    query="machine learning",
    options=QueryOptions(limit=10)
)

# Autocomplete suggestions
suggestions = await data_store.retrieve(
    strategy=SupportedQueryMethods.AUTOCOMPLETE,
    query="artificial",
    field="title"
)
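
The relationship between strategy, `query`, and return type can be pictured with the following sketch. The `Strategy` enum values, the two strategy sets, and `validate_request` are illustrative assumptions rather than the library's code. In particular, raising `ValueError` on a missing query is assumed here, not documented.

```python
from __future__ import annotations

from enum import Enum


class Strategy(str, Enum):
    """Hypothetical mirror of SupportedQueryMethods; the member values are assumed."""

    BY_FIELD = "by_field"
    BM25 = "bm25"
    AUTOCOMPLETE = "autocomplete"
    AUTOSUGGEST = "autosuggest"
    SHINGLES = "shingles"


# Strategies documented as returning suggestion strings rather than Chunk objects.
SUGGESTION_STRATEGIES = frozenset(
    {Strategy.AUTOCOMPLETE, Strategy.AUTOSUGGEST, Strategy.SHINGLES}
)
# Strategies documented as requiring a query string.
TEXT_STRATEGIES = SUGGESTION_STRATEGIES | {Strategy.BM25}


def validate_request(strategy: Strategy, query: str | None) -> str:
    """Check the query requirement and name the expected result type."""
    if strategy in TEXT_STRATEGIES and not query:
        raise ValueError(f"{strategy.value} requires a non-empty query")
    return "list[str]" if strategy in SUGGESTION_STRATEGIES else "list[Chunk]"
```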

`retrieve_autocomplete(query, field, size=20, fuzzy_tolerance=1, min_prefix_length=3, filter_query=None)` `async`

Provides suggestions based on a prefix query for a specific field.

Parameters:

Name	Type	Description	Default
`query`	`str`	The query string.	required
`field`	`str`	The field name for autocomplete.	required
`size`	`int`	The number of suggestions to retrieve. Defaults to 20.	`20`
`fuzzy_tolerance`	`int`	The level of fuzziness for suggestions. Defaults to 1.	`1`
`min_prefix_length`	`int`	The minimum prefix length to trigger fuzzy matching. Defaults to 3.	`3`
`filter_query`	`dict[str, Any] \| None`	The filter query. Defaults to None.	`None`

Returns:

Type	Description
`list[str]`	A list of suggestions.
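
For orientation, the parameters above map naturally onto Elasticsearch's completion suggester, whose `fuzzy` options include `fuzziness` and `min_length`. The sketch below builds such a request body. Whether this class uses the completion suggester internally is an assumption, and `build_autocomplete_body` and the suggester name `"autocomplete"` are hypothetical.

```python
from __future__ import annotations

from typing import Any


def build_autocomplete_body(
    query: str,
    field: str,
    size: int = 20,
    fuzzy_tolerance: int = 1,
    min_prefix_length: int = 3,
) -> dict[str, Any]:
    """Build a completion-suggester request body (assumed mapping of the parameters)."""
    return {
        "suggest": {
            "autocomplete": {  # arbitrary name used to look up results in the response
                "prefix": query,
                "completion": {
                    "field": field,  # must be mapped as a `completion` field
                    "size": size,
                    "fuzzy": {
                        "fuzziness": fuzzy_tolerance,
                        # Fuzzy matching only applies once the input is this long.
                        "min_length": min_prefix_length,
                    },
                },
            }
        }
    }
```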

`retrieve_autosuggest(query, search_fields, autocomplete_field, size=20, min_length=3, filter_query=None)` `async`

Generates suggestions across multiple fields using a multi_match query to broaden the search criteria.

Parameters:

Name	Type	Description	Default
`query`	`str`	The query string.	required
`search_fields`	`list[str]`	The fields to search for.	required
`autocomplete_field`	`str`	The field name for autocomplete.	required
`size`	`int`	The number of suggestions to retrieve. Defaults to 20.	`20`
`min_length`	`int`	The minimum length of the query. Defaults to 3.	`3`
`filter_query`	`dict[str, Any] \| None`	The filter query. Defaults to None.	`None`

Returns:

Type	Description
`list[str]`	A list of suggestions.
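
A `multi_match` request of the kind described above might look like the following sketch. The `bool_prefix` match type, the `_source` restriction, and the short-query guard are assumptions about one plausible implementation; `build_autosuggest_body` is a hypothetical helper, not part of the library.

```python
from __future__ import annotations

from typing import Any


def build_autosuggest_body(
    query: str,
    search_fields: list[str],
    autocomplete_field: str,
    size: int = 20,
    min_length: int = 3,
) -> dict[str, Any] | None:
    """Build a multi_match request body, or None when the query is too short."""
    if len(query) < min_length:
        # Assumed behavior: skip the search entirely for very short queries.
        return None
    return {
        "size": size,
        # Only the field that supplies the suggestion text is fetched.
        "_source": [autocomplete_field],
        "query": {
            "multi_match": {
                "query": query,
                "fields": search_fields,
                # bool_prefix treats the last term as a prefix (search-as-you-type).
                "type": "bool_prefix",
            }
        },
    }
```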

`retrieve_bm25(query, filters=None, options=None, k1=None, b=None)` `async`

Query the Elasticsearch datastore using the BM25 algorithm for keyword-based search.

Parameters:

Name	Type	Description	Default
`query`	`str`	The query string.	required
`filters`	`QueryFilter \| None`	Optional metadata filter to apply to the search. For example, `QueryFilter(conditions={"category": "AI", "source": ["doc1", "doc2"]})`. Defaults to None.	`None`
`options`	`QueryOptions \| None`	Query options including fields, limit, order_by, etc. For example, `QueryOptions(fields=["title", "content"], limit=10, order_by="score", order_desc=True)`. If fields is None, defaults to ["text"]. For multiple fields, uses multi_match query. Defaults to None.	`None`
`k1`	`float \| None`	BM25 parameter controlling term frequency saturation. Higher values mean term frequency has more impact before diminishing returns. Typical values: 1.2-2.0. If None, uses Elasticsearch default (~1.2). Defaults to None.	`None`
`b`	`float \| None`	BM25 parameter controlling document length normalization. 0.0 = no length normalization, 1.0 = full normalization. Typical values: 0.75. If None, uses Elasticsearch default (~0.75). Defaults to None.	`None`
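
To make the effect of `k1` and `b` concrete, here is a small sketch of the textbook BM25 score for a single query term. It is not the library's code, and Lucene's own implementation may differ in constant factors; the function name and arguments are illustrative.

```python
import math


def bm25_term_score(
    tf: float,
    doc_len: float,
    avg_doc_len: float,
    n_docs: int,
    doc_freq: int,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Textbook BM25 contribution of one term to one document's score."""
    # Inverse document frequency: rarer terms weigh more.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: b=0 ignores document length, b=1 applies it fully.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term frequency saturates; a larger k1 delays the saturation.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)
```

With `b=0.0` the score no longer depends on `doc_len`, and for any `k1` the gain from each additional occurrence of the term shrinks as `tf` grows.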

Example

# Basic BM25 query on the 'text' field
results = await data_store.retrieve_bm25("machine learning")

# BM25 query on specific fields with query options
results = await data_store.retrieve_bm25(
    "natural language",
    options=QueryOptions(fields=["title", "abstract"], limit=5)
)

# BM25 query with filters
results = await data_store.retrieve_bm25(
    "deep learning",
    filters=QueryFilter(conditions={"category": "AI", "status": "published"})
)

# BM25 query with custom BM25 parameters for more aggressive term frequency weighting
results = await data_store.retrieve_bm25(
    "artificial intelligence",
    k1=2.0,
    b=0.5
)

# BM25 query with fields, filters, and options
results = await data_store.retrieve_bm25(
    "data science applications",
    filters=QueryFilter(conditions={"author_id": "user123", "publication_year": [2022, 2023]}),
    options=QueryOptions(fields=["content", "tags"], limit=10, order_by="score", order_desc=True),
    k1=1.5,
    b=0.9
)

Returns:

Type	Description
`list[Chunk]`	A list of Chunk objects representing the retrieved documents.

`retrieve_by_field(filters=None, options=None, **kwargs)` `async`

Retrieve records from the datastore based on metadata field filtering.

This method filters and returns stored chunks based on metadata values rather than text content. It is particularly useful for structured lookups, such as retrieving all chunks from a certain source, tagged with a specific label, or authored by a particular user.

Parameters:

Name	Type	Description	Default
`filters`	`QueryFilter \| None`	Query filters to apply. Defaults to None.	`None`
`options`	`QueryOptions \| None`	Query options (sorting, pagination, etc.). Defaults to None.	`None`
`**kwargs`	`Any`	Backend-specific parameters.	`{}`

Returns:

Type	Description
`list[Chunk]`	The filtered results as Chunk objects.
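
One plausible way to turn filter conditions such as `{"category": "AI", "source": ["doc1", "doc2"]}` into an Elasticsearch query is a `bool` filter with `term` and `terms` clauses. The sketch below shows that idea only; `conditions_to_bool_filter` is hypothetical, and the library's actual `QueryFilter` translation (including any field prefixing) may differ.

```python
from __future__ import annotations

from typing import Any


def conditions_to_bool_filter(conditions: dict[str, Any]) -> dict[str, Any]:
    """Translate field/value conditions into a non-scoring bool filter query."""
    clauses: list[dict[str, Any]] = []
    for field_name, value in conditions.items():
        if isinstance(value, (list, tuple, set)):
            # Several accepted values: match documents containing any of them.
            clauses.append({"terms": {field_name: list(value)}})
        else:
            # Single value: exact match on the field.
            clauses.append({"term": {field_name: value}})
    # All clauses must match; filter context skips relevance scoring.
    return {"bool": {"filter": clauses}}
```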

`retrieve_shingles(query, field, size=20, min_length=3, max_length=30, filter_query=None)` `async`

Searches using shingles for prefix and fuzzy matching.

Parameters:

Name	Type	Description	Default
`query`	`str`	The query string.	required
`field`	`str`	The field name for autocomplete.	required
`size`	`int`	The number of suggestions to retrieve. Defaults to 20.	`20`
`min_length`	`int`	The minimum length of the query. Queries shorter than this limit will return an empty list. Defaults to 3.	`3`
`max_length`	`int`	The maximum length of the query. Queries exceeding this limit will return an empty list. Defaults to 30.	`30`
`filter_query`	`dict[str, Any] \| None`	The filter query. Defaults to None.	`None`

Returns:

Type	Description
`list[str]`	A list of suggestions.
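
For readers unfamiliar with the term, a shingle is a run of adjacent words, similar in spirit to the output of Elasticsearch's `shingle` token filter. The sketch below generates word shingles in plain Python and shows the documented length guard. Both helpers are hypothetical, and measuring query length in characters is an assumption.

```python
from __future__ import annotations


def word_shingles(text: str, min_size: int = 2, max_size: int = 3) -> list[str]:
    """Return all runs of min_size..max_size adjacent words, shortest runs first."""
    tokens = text.lower().split()
    shingles = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            shingles.append(" ".join(tokens[start:start + size]))
    return shingles


def query_length_ok(query: str, min_length: int = 3, max_length: int = 30) -> bool:
    """Mirror the documented guard: out-of-range queries yield an empty result."""
    return min_length <= len(query) <= max_length
```

Indexing shingles lets a partially typed phrase such as "deep lea" match the stored phrase "deep learning" as a unit rather than as unrelated words.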

`update(update_values, filters=None)` `async`

Update existing records in the datastore.

Parameters:

Name	Type	Description	Default
`update_values`	`dict`	Values to update.	required
`filters`	`QueryFilter \| None`	Filters to select records to update. Defaults to None.	`None`

`SupportedQueryMethods`

Bases: StrEnum

Supported query methods for Elasticsearch fulltext capability.
