Fulltext
Elasticsearch implementation of fulltext search and CRUD capability.
ElasticsearchFulltextCapability(index_name, client, query_field='text')
Elasticsearch implementation of FulltextCapability protocol.
This class provides document CRUD operations and flexible querying using Elasticsearch.
Attributes:
| Name | Type | Description |
|---|---|---|
| index_name | `str` | The name of the Elasticsearch index. |
| client | `AsyncElasticsearch` | AsyncElasticsearch client. |
| query_field | `str` | The field name to use for text content. |
Initialize the Elasticsearch fulltext capability.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| index_name | `str` | The name of the Elasticsearch index. | required |
| client | `AsyncElasticsearch` | The Elasticsearch client. | required |
| query_field | `str` | The field name to use for text content. Defaults to "text". | `'text'` |
clear()
async
Clear all records from the datastore.
create(data, **kwargs)
async
Create new records in the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | `Chunk \| list[Chunk]` | Data to create (single item or collection). | required |
| **kwargs | `Any` | Backend-specific parameters forwarded to the Elasticsearch bulk API. | `{}` |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the data structure is invalid. |
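To make the single-item-or-collection behaviour concrete, the following is a minimal sketch of how input might be normalized into bulk actions. `SimpleChunk` and `build_bulk_actions` are hypothetical names used only for illustration, and the document layout (`metadata` next to the text field) is an assumption; the action keys follow the dictionary shape accepted by the bulk helpers in `elasticsearch.helpers`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleChunk:
    """Hypothetical stand-in for the library's Chunk type."""

    id: str
    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def build_bulk_actions(
    data: SimpleChunk | list[SimpleChunk],
    index_name: str,
    query_field: str = "text",
) -> list[dict[str, Any]]:
    """Normalize the input to a list and build one 'index' action per chunk."""
    items = [data] if isinstance(data, SimpleChunk) else data
    if not isinstance(items, list) or not all(isinstance(c, SimpleChunk) for c in items):
        raise ValueError("data must be a chunk or a list of chunks")
    return [
        {
            "_op_type": "index",
            "_index": index_name,
            "_id": chunk.id,
            # Assumed layout: text under query_field, metadata alongside it.
            "_source": {query_field: chunk.content, "metadata": chunk.metadata},
        }
        for chunk in items
    ]


actions = build_bulk_actions(SimpleChunk("1", "hello world"), "docs")
```

A list such as `actions` could then be handed to a bulk helper together with any extra keyword arguments.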
delete(filters=None)
async
Deletes records from the data store based on filters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filters | `QueryFilter \| None` | Filters to select records for deletion. Defaults to None. | `None` |
delete_by_id(id_)
async
Deletes records from the data store based on IDs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| id_ | `str \| list[str]` | ID or list of IDs to delete. | required |
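Because `id_` accepts either a single ID or a list, some normalization has to happen before a request is built. The sketch below shows one plausible way to do that and to produce an Elasticsearch `ids` query body; `build_delete_by_id_body` is a hypothetical helper, and whether the library uses a delete-by-query request at all is an assumption.

```python
from __future__ import annotations

from typing import Any


def build_delete_by_id_body(id_: str | list[str]) -> dict[str, Any]:
    """Wrap a single ID in a list, then build an `ids` query body."""
    ids = [id_] if isinstance(id_, str) else list(id_)
    return {"query": {"ids": {"values": ids}}}


single = build_delete_by_id_body("doc-1")
several = build_delete_by_id_body(["doc-1", "doc-2"])
```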
get_size()
async
Returns the total number of documents in the index.
Returns:
| Name | Type | Description |
|---|---|---|
| int | `int` | The total number of documents. |
retrieve(strategy=SupportedQueryMethods.BY_FIELD, query=None, filters=None, options=None, **kwargs)
async
Retrieve records from the datastore using various strategies.
This is the main retrieval method. It supports the following search strategies:
- BY_FIELD: Filter by metadata fields
- BM25: Keyword-based search using the BM25 algorithm
- AUTOCOMPLETE: Autocomplete suggestions
- AUTOSUGGEST: Autosuggest functionality
- SHINGLES: Shingle-based search
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | `SupportedQueryMethods` | The retrieval strategy to use. Defaults to BY_FIELD. | `BY_FIELD` |
| query | `str \| None` | The query string for text-based strategies. Required for BM25, AUTOCOMPLETE, AUTOSUGGEST, and SHINGLES. | `None` |
| filters | `QueryFilter \| None` | Query filters to apply. Defaults to None. | `None` |
| options | `QueryOptions \| None` | Query options (sorting, pagination, etc.). Defaults to None. | `None` |
| **kwargs | `Any` | Additional strategy-specific parameters. | `{}` |
Returns:
| Type | Description |
|---|---|
| `list[Chunk] \| list[str]` | Query results: `list[Chunk]` for BY_FIELD and BM25; `list[str]` for AUTOCOMPLETE, AUTOSUGGEST, and SHINGLES. |
Example

    # Field-based retrieval
    results = await data_store.retrieve(
        strategy=SupportedQueryMethods.BY_FIELD,
        filters=QueryFilter(conditions={"category": "AI"}),
    )

    # BM25 keyword search
    results = await data_store.retrieve(
        strategy=SupportedQueryMethods.BM25,
        query="machine learning",
        options=QueryOptions(limit=10),
    )

    # Autocomplete suggestions
    suggestions = await data_store.retrieve(
        strategy=SupportedQueryMethods.AUTOCOMPLETE,
        query="artificial",
        field="title",
    )
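The strategy-based routing described above can be pictured with a small offline sketch. Everything here is hypothetical: `Strategy` stands in for `SupportedQueryMethods` (its member values are assumed), `FakeStore` returns canned data instead of calling Elasticsearch, and raising `ValueError` for a missing query is an assumption about how the requirement might be enforced.

```python
import asyncio
from enum import Enum


class Strategy(str, Enum):
    """Hypothetical stand-in for SupportedQueryMethods; values are assumed."""

    BY_FIELD = "by_field"
    BM25 = "bm25"
    AUTOCOMPLETE = "autocomplete"


class FakeStore:
    """Toy store with canned results so the dispatch logic runs offline."""

    async def retrieve_by_field(self, filters=None):
        return [{"id": "1", "text": "doc"}]

    async def retrieve_bm25(self, query, filters=None):
        return [{"id": "2", "text": f"match for {query}"}]

    async def retrieve_autocomplete(self, query, field):
        return [f"{query}ly"]

    async def retrieve(self, strategy=Strategy.BY_FIELD, query=None, filters=None, **kwargs):
        # Text-based strategies need a query string; BY_FIELD does not.
        if strategy in {Strategy.BM25, Strategy.AUTOCOMPLETE} and not query:
            raise ValueError(f"query is required for {strategy.value}")
        if strategy is Strategy.BY_FIELD:
            return await self.retrieve_by_field(filters=filters)
        if strategy is Strategy.BM25:
            return await self.retrieve_bm25(query, filters=filters)
        # Strategy-specific options such as `field` arrive through **kwargs.
        return await self.retrieve_autocomplete(query, **kwargs)


suggestions = asyncio.run(
    FakeStore().retrieve(Strategy.AUTOCOMPLETE, query="artificial", field="title")
)
```

The point of the sketch is only that one entry point can return different result shapes depending on the strategy, which is why the return type is a union.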
retrieve_autocomplete(query, field, size=20, fuzzy_tolerance=1, min_prefix_length=3, filter_query=None)
async
Provides suggestions based on a prefix query for a specific field.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | `str` | The query string. | required |
| field | `str` | The field name for autocomplete. | required |
| size | `int` | The number of suggestions to retrieve. Defaults to 20. | `20` |
| fuzzy_tolerance | `int` | The level of fuzziness for suggestions. Defaults to 1. | `1` |
| min_prefix_length | `int` | The minimum prefix length to trigger fuzzy matching. Defaults to 3. | `3` |
| filter_query | `dict[str, Any] \| None` | The filter query. Defaults to None. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[str]` | A list of suggestions. |
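The interplay of `fuzzy_tolerance` and `min_prefix_length` is easier to see on a toy example. The sketch below is an in-memory illustration of the parameter semantics only, not the Elasticsearch query this method issues; `edit_distance` and `suggest` are hypothetical helpers, and comparing the query against an equal-length prefix of each candidate is a deliberate simplification.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def suggest(query, candidates, size=20, fuzzy_tolerance=1, min_prefix_length=3):
    """Return candidates whose prefix matches the query, optionally fuzzily."""
    q = query.lower()
    # Short queries must match exactly; longer ones may differ by a few edits.
    allowed = fuzzy_tolerance if len(q) >= min_prefix_length else 0
    hits = [c for c in candidates if edit_distance(q, c.lower()[: len(q)]) <= allowed]
    return hits[:size]


titles = ["Artificial intelligence", "Artifact storage", "Article index", "Banana"]
fuzzy_hits = suggest("artf", titles)
strict_hits = suggest("ax", titles)
```

With these inputs the typo "artf" still reaches the three titles starting with "arti", while the two-character query "ax" is too short for fuzzy matching and finds nothing.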
retrieve_autosuggest(query, search_fields, autocomplete_field, size=20, min_length=3, filter_query=None)
async
Generates suggestions across multiple fields using a multi_match query to broaden the search criteria.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | `str` | The query string. | required |
| search_fields | `list[str]` | The fields to search. | required |
| autocomplete_field | `str` | The field name for autocomplete. | required |
| size | `int` | The number of suggestions to retrieve. Defaults to 20. | `20` |
| min_length | `int` | The minimum length of the query. Defaults to 3. | `3` |
| filter_query | `dict[str, Any] \| None` | The filter query. Defaults to None. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[str]` | A list of suggestions. |
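As a rough picture of what a `multi_match`-based suggestion request could look like, the sketch below assembles a search body from the documented parameters. `build_autosuggest_body` is a hypothetical helper; the `bool_prefix` match type, returning only `autocomplete_field` via `_source`, and skipping queries shorter than `min_length` are all assumptions rather than confirmed behaviour.

```python
from __future__ import annotations

from typing import Any


def build_autosuggest_body(
    query: str,
    search_fields: list[str],
    autocomplete_field: str,
    size: int = 20,
    min_length: int = 3,
    filter_query: dict[str, Any] | None = None,
) -> dict[str, Any] | None:
    """Build a search body, or return None when the query is too short."""
    if len(query) < min_length:
        return None
    bool_query: dict[str, Any] = {
        "must": [
            {"multi_match": {"query": query, "fields": search_fields, "type": "bool_prefix"}}
        ]
    }
    if filter_query:
        # Filters restrict the candidate set without affecting scoring.
        bool_query["filter"] = [filter_query]
    return {"size": size, "_source": [autocomplete_field], "query": {"bool": bool_query}}


body = build_autosuggest_body(
    "mach",
    search_fields=["title", "abstract"],
    autocomplete_field="title",
    filter_query={"term": {"status": "published"}},
)
```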
retrieve_bm25(query, filters=None, options=None, k1=None, b=None)
async
Queries the Elasticsearch data store using BM25 algorithm for keyword-based search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | `str` | The query string. | required |
| filters | `QueryFilter \| None` | Optional metadata filter to apply to the search. | `None` |
| options | `QueryOptions \| None` | Query options such as fields, limit, and order_by. | `None` |
| k1 | `float \| None` | BM25 parameter controlling term frequency saturation. Higher values mean term frequency has more impact before diminishing returns. Typical values: 1.2-2.0. If None, uses the Elasticsearch default (1.2). Defaults to None. | `None` |
| b | `float \| None` | BM25 parameter controlling document length normalization. 0.0 = no length normalization, 1.0 = full normalization. Typical value: 0.75. If None, uses the Elasticsearch default (0.75). Defaults to None. | `None` |
Example

    # Basic BM25 query on the 'text' field
    results = await data_store.retrieve_bm25("machine learning")

    # BM25 query on specific fields with query options
    results = await data_store.retrieve_bm25(
        "natural language",
        options=QueryOptions(fields=["title", "abstract"], limit=5),
    )

    # BM25 query with filters
    results = await data_store.retrieve_bm25(
        "deep learning",
        filters=QueryFilter(conditions={"category": "AI", "status": "published"}),
    )

    # BM25 query with custom parameters for more aggressive term frequency weighting
    results = await data_store.retrieve_bm25(
        "artificial intelligence",
        k1=2.0,
        b=0.5,
    )

    # BM25 query with fields, filters, and options
    results = await data_store.retrieve_bm25(
        "data science applications",
        filters=QueryFilter(conditions={"author_id": "user123", "publication_year": [2022, 2023]}),
        options=QueryOptions(fields=["content", "tags"], limit=10, order_by="score", order_desc=True),
        k1=1.5,
        b=0.9,
    )
Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | A list of Chunk objects representing the retrieved documents. |
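To see what `k1` and `b` actually change, it can help to compute a single-term BM25 score by hand. The sketch below uses the textbook Okapi BM25 formula; Lucene's implementation differs in constant factors, so treat the numbers as illustrative rather than as scores Elasticsearch would return. `bm25_term_score` is a hypothetical helper, not part of this class.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Score one query term against one document with Okapi BM25."""
    # Rare terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly documents longer than average are penalized.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding score.
    return idf * tf * (k1 + 1) / (tf + length_norm)


def tf_gain(k1):
    """Ratio of the score at tf=10 to the score at tf=1 for an average-length document."""
    common = dict(doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=50, k1=k1)
    return bm25_term_score(tf=10, **common) / bm25_term_score(tf=1, **common)


short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=50)
long_doc = bm25_term_score(tf=2, doc_len=500, avg_doc_len=100, n_docs=1000, doc_freq=50)
```

In this model a larger `k1` widens the gap between one and ten occurrences of a term, and with `b=0.0` the short and long documents above would score identically.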
retrieve_by_field(filters=None, options=None, **kwargs)
async
Retrieve records from the datastore based on metadata field filtering.
This method filters and returns stored chunks based on metadata values rather than text content. It is particularly useful for structured lookups, such as retrieving all chunks from a certain source, tagged with a specific label, or authored by a particular user.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filters | `QueryFilter \| None` | Query filters to apply. Defaults to None. | `None` |
| options | `QueryOptions \| None` | Query options (sorting, pagination, etc.). Defaults to None. | `None` |
| **kwargs | `Any` | Backend-specific parameters. | `{}` |
Returns:
| Type | Description |
|---|---|
| `list[Chunk]` | The filtered results as Chunk objects. |
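Metadata filtering of this kind usually ends up as exact-match clauses in a `bool` filter. The sketch below shows one way a flat conditions dictionary might be translated; `conditions_to_bool_filter` is a hypothetical helper, and both the treatment of list values as "any of" and the use of bare field names (without a metadata prefix) are assumptions.

```python
from __future__ import annotations

from typing import Any


def conditions_to_bool_filter(conditions: dict[str, Any]) -> dict[str, Any]:
    """Turn {field: value} pairs into term/terms clauses inside a bool filter."""
    clauses: list[dict[str, Any]] = []
    for field_name, value in conditions.items():
        if isinstance(value, (list, tuple)):
            # A list of values matches documents containing any of them.
            clauses.append({"terms": {field_name: list(value)}})
        else:
            clauses.append({"term": {field_name: value}})
    return {"query": {"bool": {"filter": clauses}}}


body = conditions_to_bool_filter({"author_id": "user123", "publication_year": [2022, 2023]})
```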
retrieve_shingles(query, field, size=20, min_length=3, max_length=30, filter_query=None)
async
Searches using shingles for prefix and fuzzy matching.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query | `str` | The query string. | required |
| field | `str` | The field name for autocomplete. | required |
| size | `int` | The number of suggestions to retrieve. Defaults to 20. | `20` |
| min_length | `int` | The minimum length of the query. Queries shorter than this limit will return an empty list. Defaults to 3. | `3` |
| max_length | `int` | The maximum length of the query. Queries exceeding this limit will return an empty list. Defaults to 30. | `30` |
| filter_query | `dict[str, Any] \| None` | The filter query. Defaults to None. | `None` |
Returns:
| Type | Description |
|---|---|
| `list[str]` | A list of suggestions. |
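For readers unfamiliar with the term, shingles are word-level n-grams: overlapping runs of adjacent words. The sketch below generates them in plain Python and mirrors the documented length guard; `word_shingles` and `is_query_length_allowed` are hypothetical helpers, and the shingle sizes actually configured in the index are not stated in the source.

```python
def word_shingles(text, min_size=2, max_size=3):
    """Return all runs of min_size..max_size adjacent words, lowercased."""
    words = text.lower().split()
    return [
        " ".join(words[i : i + n])
        for n in range(min_size, max_size + 1)
        for i in range(len(words) - n + 1)
    ]


def is_query_length_allowed(query, min_length=3, max_length=30):
    """Queries outside the bounds would yield an empty result list."""
    return min_length <= len(query) <= max_length


shingles = word_shingles("Deep learning for search")
```

Matching a query against such phrases lets multi-word suggestions surface even when the user has typed only part of the phrase.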
update(update_values, filters=None)
async
Update existing records in the datastore.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| update_values | `dict` | Values to update. | required |
| filters | `QueryFilter \| None` | Filters to select records to update. Defaults to None. | `None` |
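A filtered update in Elasticsearch is commonly expressed as an update-by-query request with a small Painless script. The sketch below builds such a body from a dictionary of new values; `build_update_by_query_body` is a hypothetical helper, and both the script-based approach and the fallback to `match_all` when no query is given are assumptions about how this method might work.

```python
from __future__ import annotations

from typing import Any


def build_update_by_query_body(
    update_values: dict[str, Any],
    query: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Build a script that assigns each value to the matching source field."""
    if not update_values:
        raise ValueError("update_values must not be empty")
    for key in update_values:
        # Keep the generated script simple: only plain identifiers are accepted.
        if not key.isidentifier():
            raise ValueError(f"unsupported field name: {key!r}")
    source = " ".join(f"ctx._source.{key} = params.{key};" for key in update_values)
    return {
        "script": {"lang": "painless", "source": source, "params": dict(update_values)},
        "query": query or {"match_all": {}},
    }


body = build_update_by_query_body(
    {"status": "archived", "reviewed": True},
    query={"term": {"category": "AI"}},
)
```

Passing the values through `params` rather than interpolating them into the script text keeps the script cacheable and avoids quoting problems.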
SupportedQueryMethods
Bases: StrEnum
Supported query methods for Elasticsearch fulltext capability.