Relevance Filter

Modules that filter chunks based on their relevance to a given query.

`LMBasedRelevanceFilter(lm_request_processor, batch_size=DEFAULT_BATCH_SIZE, on_failure_keep_all=True, metadata=None, chunk_format=DEFAULT_CHUNK_TEMPLATE)`

Bases: BaseRelevanceFilter, UsesLM

Relevance filter that uses an LM to determine chunk relevance.

This filter processes chunks in batches, sending each batch to an LM for relevance determination. If LM processing fails for a batch, it applies a simple all-or-nothing strategy controlled by the `on_failure_keep_all` parameter: every chunk in the failed batch is either kept or discarded.

The LM is expected to return one result per chunk, indicating whether that chunk is relevant to the given query.

The expected LM output format is:

    {
        "results": [
            {
                "explanation": str,
                "is_relevant": bool
            },
            ...
        ]
    }

The number of items in "results" should match the number of chunks in the batch sent to the LM.
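The following sketch shows one way a caller might validate an LM response against the format above before acting on it. It is not the library's own parser: the `RelevanceResult` class and `parse_relevance_output` function are hypothetical, and the sketch assumes the response arrives as a JSON string.

```python
import json
from dataclasses import dataclass


@dataclass
class RelevanceResult:
    """One entry of the "results" list (hypothetical helper type)."""

    explanation: str
    is_relevant: bool


def parse_relevance_output(raw: str, expected_count: int) -> list[RelevanceResult]:
    """Parse an LM response in the documented format (assumes a JSON string).

    Raises ValueError on malformed JSON, wrong field types, or a result
    count that differs from the number of chunks in the batch.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LM output is not valid JSON: {exc}") from exc

    results = payload.get("results") if isinstance(payload, dict) else None
    if not isinstance(results, list):
        raise ValueError('LM output must be an object with a "results" list')
    if len(results) != expected_count:
        raise ValueError(f"expected {expected_count} results, got {len(results)}")

    parsed: list[RelevanceResult] = []
    for item in results:
        if not isinstance(item, dict):
            raise ValueError("each result must be an object")
        explanation = item.get("explanation")
        is_relevant = item.get("is_relevant")
        # Require a real bool so that strings like "false" are not treated as truthy.
        if not isinstance(explanation, str) or not isinstance(is_relevant, bool):
            raise ValueError("each result needs a str 'explanation' and a bool 'is_relevant'")
        parsed.append(RelevanceResult(explanation, is_relevant))
    return parsed


raw = (
    '{"results": ['
    '{"explanation": "Mentions the query topic.", "is_relevant": true}, '
    '{"explanation": "Unrelated content.", "is_relevant": false}'
    "]}"
)
print([r.is_relevant for r in parse_relevance_output(raw, expected_count=2)])
# [True, False]
```

Treating a count mismatch as an error, rather than padding or truncating, keeps a misaligned response from silently attaching verdicts to the wrong chunks.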

Attributes:

Name	Type	Description
`lm_request_processor`	`LMRequestProcessor`	The LM request processor used for LM calls.
`batch_size`	`int`	The number of chunks to process in each LM call.
`on_failure_keep_all`	`bool`	If True, keep all chunks when LM processing fails. If False, discard all chunks from the failed batch.
`metadata`	`list[str] \| None`	List of metadata fields to include. If None, no metadata is included.
`chunk_format`	`str \| Callable[[Chunk], str]`	Either a format string or a callable for custom chunk formatting. If using a format string, use {content} for the chunk content, {metadata} for the auto-formatted metadata block, or {field_name} to reference a metadata field directly.

Initialize the LMBasedRelevanceFilter.

Parameters:

Name	Type	Description	Default
`lm_request_processor`	`LMRequestProcessor`	The LM request processor to use for LM calls.	required
`batch_size`	`int`	The number of chunks to process in each LM call. Defaults to DEFAULT_BATCH_SIZE.	`DEFAULT_BATCH_SIZE`
`on_failure_keep_all`	`bool`	If True, keep all chunks when LM processing fails. If False, discard all chunks from the failed batch. Defaults to True.	`True`
`metadata`	`list[str] \| None`	List of metadata fields to include. If None, no metadata is included.	`None`
`chunk_format`	`str \| Callable[[Chunk], str]`	Either a format string or a callable for custom chunk formatting. If using a format string, use {content} for the chunk content, {metadata} for the auto-formatted metadata block, or {field_name} to reference a metadata field directly. Defaults to DEFAULT_CHUNK_TEMPLATE.	`DEFAULT_CHUNK_TEMPLATE`
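To illustrate how the `metadata` selection and the three placeholder styles of `chunk_format` could interact, here is a minimal sketch. The `Chunk` dataclass and `format_chunk` function are hypothetical stand-ins, and the one-`key: value`-per-line layout of the `{metadata}` block is an assumption; the library's actual metadata rendering and its `DEFAULT_CHUNK_TEMPLATE` may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional, Union


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type."""

    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def format_chunk(
    chunk: Chunk,
    chunk_format: Union[str, Callable[[Chunk], str]],
    metadata_fields: Optional[list[str]] = None,
) -> str:
    """Render one chunk for the LM prompt (illustrative sketch, not the library code)."""
    if callable(chunk_format):
        # A callable receives the whole chunk and controls the rendering itself.
        return chunk_format(chunk)

    # Only the requested metadata fields are exposed; None means no metadata.
    selected = {
        key: chunk.metadata[key]
        for key in (metadata_fields or [])
        if key in chunk.metadata
    }
    # Assumed layout for the auto-formatted {metadata} block: one "key: value" per line.
    metadata_block = "\n".join(f"{key}: {value}" for key, value in selected.items())
    # {content} and {metadata} take precedence over metadata fields with the same name.
    # A placeholder naming a field that was not selected raises KeyError.
    return chunk_format.format(**{**selected, "content": chunk.content, "metadata": metadata_block})


chunk = Chunk("Paris is the capital of France.", {"source": "wiki", "page": 3})
print(format_chunk(chunk, "[{source}] {content}", metadata_fields=["source"]))
# [wiki] Paris is the capital of France.
```

In this sketch a field referenced directly as {field_name} must also appear in the selected metadata list, which keeps unselected metadata out of the prompt.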

`filter(chunks, query)` `async`

Filter the given chunks based on their relevance to the query using an LM.

This method processes chunks in batches, sending each batch to the LM for relevance determination. If LM processing fails for a batch, the behavior is determined by the `on_failure_keep_all` attribute.
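A simplified sketch of the batching and per-batch failure fallback described above. `classify_batch` is a hypothetical async hook standing in for the LM request processor, and batches run sequentially here for clarity; whether the library processes them sequentially or concurrently is not stated.

```python
import asyncio
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")

# Hypothetical LM hook: returns one relevance flag per item, or raises on failure.
ClassifyBatch = Callable[[str, list[Any]], Awaitable[list[bool]]]


async def filter_in_batches(
    items: list[T],
    query: str,
    classify_batch: ClassifyBatch,
    batch_size: int = 4,
    on_failure_keep_all: bool = True,
) -> list[T]:
    """Keep items the LM marks relevant; on a failed batch, keep or drop it whole."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    kept: list[T] = []
    # Batches run one after another here; the library may schedule them differently.
    for start in range(0, len(items), batch_size):
        batch = items[start : start + batch_size]
        try:
            flags = await classify_batch(query, batch)
            # A result-count mismatch is treated as a failed batch as well.
            if len(flags) != len(batch):
                raise ValueError("LM returned the wrong number of results")
        except Exception:
            # All-or-nothing fallback for this batch, mirroring on_failure_keep_all.
            if on_failure_keep_all:
                kept.extend(batch)
            continue
        kept.extend(item for item, is_relevant in zip(batch, flags) if is_relevant)
    return kept


async def fake_lm(query: str, batch: list[Any]) -> list[bool]:
    """Toy classifier: fails on the text "boom", otherwise checks substring match."""
    if "boom" in batch:
        raise RuntimeError("simulated LM failure")
    return [query in text for text in batch]


chunks = ["cats purr", "dogs bark", "boom", "cats nap"]
print(asyncio.run(filter_in_batches(chunks, "cats", fake_lm, batch_size=2)))
# ['cats purr', 'boom', 'cats nap']
print(asyncio.run(filter_in_batches(chunks, "cats", fake_lm, batch_size=2, on_failure_keep_all=False)))
# ['cats purr']
```

With `on_failure_keep_all=True` the failed second batch passes through untouched (including the irrelevant "boom"), trading precision for recall; with `False` the whole batch is lost, including the relevant "cats nap".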

Parameters:

Name	Type	Description	Default
`chunks`	`list[Chunk]`	The list of chunks to filter.	required
`query`	`str`	The query to compare chunks against.	required

Returns:

Type	Description
`list[Chunk]`	A list of chunks deemed relevant by the LM. Chunks from a failed batch are all kept or all dropped, depending on `on_failure_keep_all`.

`SimilarityBasedRelevanceFilter(em_invoker, threshold=0.5)`

Bases: BaseRelevanceFilter

Relevance filter that uses semantic similarity to determine chunk relevance.

Attributes:

Name	Type	Description
`em_invoker`	`BaseEMInvoker`	The embedding model invoker to use for vectorization.
`threshold`	`float`	The similarity threshold for relevance (0 to 1). Defaults to 0.5.

Initialize the SimilarityBasedRelevanceFilter.

Parameters:

Name	Type	Description	Default
`em_invoker`	`BaseEMInvoker`	The embedding model invoker to use for vectorization.	required
`threshold`	`float`	The similarity threshold for relevance (0 to 1). Defaults to 0.5.	`0.5`

Raises:

Type	Description
`ValueError`	If the threshold is not between 0 and 1.

`filter(chunks, query)` `async`

Filter the given chunks based on their semantic similarity to the query.

This method computes the semantic similarity between the query and each text chunk and keeps the chunks that satisfy the configured `threshold`. Non-text chunks are currently excluded from both processing and similarity calculation.
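A minimal synchronous sketch of threshold-based similarity filtering, including the constructor's documented range check. It assumes cosine similarity and an inclusive `>=` comparison, neither of which is specified above. `Chunk`, `toy_embed`, and `similarity_filter` are hypothetical, and a real `BaseEMInvoker` would be called asynchronously.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Chunk:
    """Hypothetical chunk: text chunks hold a str, other chunks hold anything else."""

    content: object


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_filter(
    chunks: list[Chunk],
    query: str,
    embed: Callable[[list[str]], list[list[float]]],
    threshold: float = 0.5,
) -> list[Chunk]:
    """Keep text chunks whose cosine similarity to the query is >= threshold (assumed)."""
    # Same range check the constructor documents (raises ValueError outside [0, 1]).
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")

    # Non-text chunks are skipped entirely: never embedded, never returned.
    text_chunks = [c for c in chunks if isinstance(c.content, str)]
    if not text_chunks:
        return []

    # Embed the query together with the chunk texts in one call.
    vectors = embed([query] + [c.content for c in text_chunks])
    query_vec, chunk_vecs = vectors[0], vectors[1:]
    return [
        chunk
        for chunk, vec in zip(text_chunks, chunk_vecs)
        if cosine_similarity(query_vec, vec) >= threshold
    ]


def toy_embed(texts: list[str]) -> list[list[float]]:
    """Hypothetical 2-D 'embedding': one dimension per keyword."""
    return [[float("cat" in t), float("dog" in t)] for t in texts]


chunks = [Chunk("cat food"), Chunk("dog toy"), Chunk("cat and dog"), Chunk(b"\x89PNG")]
print([c.content for c in similarity_filter(chunks, "cat", toy_embed)])
# ['cat food', 'cat and dog']
```

Here "cat and dog" scores about 0.71 against the query, so it survives the default threshold of 0.5 but would be dropped at 0.8; the binary image chunk never reaches the embedder.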

Parameters:

Name	Type	Description	Default
`chunks`	`list[Chunk]`	The list of chunks to filter.	required
`query`	`str`	The query to compare chunks against.	required

Returns:

Type	Description
`list[Chunk]`	A list of relevant text chunks. Non-text chunks are never included in the result.
