Reference Formatter
Modules concerning the reference formatters used in Gen AI applications.
LMBasedReferenceFormatter(lm_request_processor, batch_size=DEFAULT_BATCH_SIZE, reference_metadata=DEFAULT_REFERENCE_METADATA, stringify=True, format_references_func=None, format_chunk_func=None, format_chunk_metadata_map=None, streamable=True)
Bases: BaseReferenceFormatter, UsesLM
A reference formatter that utilizes a language model to filter the candidate chunks.
This class extends the BaseReferenceFormatter and uses a language model to process and filter a list of
candidate chunks. The language model filters the chunks based on the content and selected metadata of the chunks.
It then formats the relevant chunks into a reference string or a list of chunks, depending on the stringify attribute.
Attributes:
Name | Type | Description |
---|---|---|
lm_request_processor | LMRequestProcessor | The request processor used to handle the candidate chunks filtering. |
batch_size | int | The number of chunks to process in each batch. |
stringify | bool | Whether to format the references as a string. If False, the references will be returned as a list of chunks. |
format_references_func | Callable[[list[str]], str] | A function that formats a list of formatted chunks as a string of references. |
format_chunk_func | Callable[[Chunk], str] | A function that formats a chunk as a reference. |
streamable | bool | A flag to indicate whether the formatted references will be streamed if an event emitter is provided. |
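To illustrate what batch_size controls, here is a minimal sketch of splitting candidate chunks into fixed-size batches. It is not the library's implementation: the helper name `iter_batches` and the plain-string chunks are assumptions made purely for illustration.

```python
from collections.abc import Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Plain strings stand in for Chunk objects here.
candidate_chunks = ["chunk-1", "chunk-2", "chunk-3", "chunk-4", "chunk-5"]
batches = [list(batch) for batch in iter_batches(candidate_chunks, batch_size=2)]
```

With five chunks and a batch size of 2, the last batch holds the single remaining chunk. A smaller batch size generally means shorter prompts per language model call, at the cost of more calls.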
LM request processor configuration
The LM request processor for the LMBasedReferenceFormatter must be configured as follows:
Input
The LM request processor must take the following variables as its input:
1. response: The synthesized response.
2. context: The formatted context string of the candidate chunks in the following format:
```
<Chunk>
file_name: <file_name_2>
content:
<content_2>
</Chunk>
```
The file_name is the metadata of the chunk that will be referenced by the language model in the filtering process, and can be customized via the reference_metadata parameter.
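The context format above could be produced along these lines. This is a sketch under stated assumptions, not the library's own formatting code: the `Chunk` dataclass is a hypothetical stand-in for the library's Chunk type, `format_context` is an illustrative helper, and `"file_name"` is assumed as the reference metadata key.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type."""
    content: str
    metadata: dict[str, str] = field(default_factory=dict)


def format_context(chunks: list[Chunk], reference_metadata: str = "file_name") -> str:
    """Render each chunk as a <Chunk> block keyed by the chosen metadata field."""
    blocks = []
    for chunk in chunks:
        # Missing metadata is rendered as an empty value (an assumption of this sketch).
        value = chunk.metadata.get(reference_metadata, "")
        blocks.append(
            f"<Chunk>\n{reference_metadata}: {value}\ncontent:\n{chunk.content}\n</Chunk>"
        )
    return "\n".join(blocks)


context = format_context([
    Chunk("Refunds are processed within 14 days.", {"file_name": "policy.pdf"}),
    Chunk("Contact support via email.", {"file_name": "faq.md"}),
])
```

Passing a different `reference_metadata` value (for example a hypothetical `"title"` key) changes the label line of every block, which mirrors how the reference_metadata parameter customizes what the language model is asked to return.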
Output
The LM request processor must output a list of the reference metadata of the chunks deemed relevant to the synthesized response. This can be achieved either by using the JSONOutputParser or by using the structured_output feature of the LMOutput object.
Using JSONOutputParser
Usage example:
```
lm_request_processor = build_lm_request_processor(
    model="...",
    system_template="...",
    output_parser=JSONOutputParser()
)
```
Output example:
```
{
    "file_name": ["<file_name_1>", "<file_name_2>"]
}
```
Once again, the file_name can be customized via the reference_metadata parameter.
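As an illustration of how a parsed output of this shape could be used to select the relevant chunks, consider the following sketch. The `Chunk` dataclass and the `filter_chunks` helper are hypothetical and are not part of the library; the actual filtering logic inside LMBasedReferenceFormatter may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type."""
    content: str
    metadata: dict[str, str] = field(default_factory=dict)


def filter_chunks(
    chunks: list[Chunk],
    lm_output: dict[str, list[str]],
    reference_metadata: str = "file_name",
) -> list[Chunk]:
    """Keep chunks whose reference metadata value appears in the LM output."""
    relevant = set(lm_output.get(reference_metadata, []))
    return [c for c in chunks if c.metadata.get(reference_metadata) in relevant]


chunks = [
    Chunk("Refunds are processed within 14 days.", {"file_name": "policy.pdf"}),
    Chunk("Office hours are 9 to 5.", {"file_name": "hours.txt"}),
    Chunk("Contact support via email.", {"file_name": "faq.md"}),
]
relevant_chunks = filter_chunks(chunks, {"file_name": ["policy.pdf", "faq.md"]})
```

Chunks keep their original order, and names returned by the language model that match no candidate chunk are simply ignored in this sketch.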
Using structured_output
Usage example:
```
# Assumes BaseModel is pydantic's BaseModel.
from pydantic import BaseModel


class RelevantChunks(BaseModel):
    file_name: list[str]


lm_request_processor = build_lm_request_processor(
    model="...",
    system_template="...",
    config={"response_schema": RelevantChunks}
)
```
Output example:
```
LMOutput(structured_output=RelevantChunks(file_name=["<file_name_1>", "<file_name_2>"]), ...)
```
Once again, the file_name can be customized via the reference_metadata parameter.
Initializes a new instance of the LMBasedReferenceFormatter class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lm_request_processor | LMRequestProcessor | The request processor used to handle the candidate chunks filtering. | required |
batch_size | int | The number of chunks to process in each batch. Defaults to DEFAULT_BATCH_SIZE. | DEFAULT_BATCH_SIZE |
reference_metadata | str | The metadata of the chunk to be referenced by the language model in the filtering process. Defaults to DEFAULT_REFERENCE_METADATA. | DEFAULT_REFERENCE_METADATA |
stringify | bool | Whether to format the references as a string. If False, the references will be returned as a list of chunks. Defaults to True. | True |
format_references_func | Callable[[list[str]], str] \| None | A function that formats a list of formatted chunks as a string of references. Defaults to None, in which case … | None |
format_chunk_func | Callable[[Chunk], str] \| None | A function that formats a chunk as a reference. Defaults to None, in which case … | None |
format_chunk_metadata_map | dict[str, str] \| None | A dictionary mapping the metadata keys needed by the … | None |
streamable | bool | A flag to indicate whether the formatted references will be streamed if an event emitter is provided. Defaults to True. | True |
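The format_chunk_func and format_references_func parameters accept plain callables with the signatures listed above. The following sketch shows one possible pair. The `Chunk` dataclass is a hypothetical stand-in for the library's Chunk type, and the `file_name` and `page` metadata keys are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type."""
    content: str
    metadata: dict[str, str] = field(default_factory=dict)


def format_chunk(chunk: Chunk) -> str:
    """Format one chunk as a single reference line (matches Callable[[Chunk], str])."""
    name = chunk.metadata.get("file_name", "unknown")
    page = chunk.metadata.get("page", "?")
    return f"{name} (page {page})"


def format_references(formatted_chunks: list[str]) -> str:
    """Join formatted chunks into a numbered list (matches Callable[[list[str]], str])."""
    unique = list(dict.fromkeys(formatted_chunks))  # drop duplicates, keep order
    lines = [f"{i}. {ref}" for i, ref in enumerate(unique, start=1)]
    return "References:\n" + "\n".join(lines)


chunks = [
    Chunk("Refunds are processed within 14 days.", {"file_name": "policy.pdf", "page": "3"}),
    Chunk("Refund requests need a receipt.", {"file_name": "policy.pdf", "page": "3"}),
    Chunk("Contact support via email.", {"file_name": "faq.md", "page": "1"}),
]
references = format_references([format_chunk(c) for c in chunks])

# The callables would then be passed to the constructor, for example:
# formatter = LMBasedReferenceFormatter(
#     lm_request_processor,
#     format_chunk_func=format_chunk,
#     format_references_func=format_references,
# )
```

Deduplicating in `format_references` is a design choice of this sketch, useful when several chunks come from the same page of the same file.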
SimilarityBasedReferenceFormatter(em_invoker, threshold=DEFAULT_THRESHOLD, stringify=True, format_references_func=None, format_chunk_func=None, format_chunk_metadata_map=None, streamable=True)
Bases: BaseReferenceFormatter
A reference formatter that filters the candidate chunks based on their relevance to the synthesized response.
This class extends the BaseReferenceFormatter and uses an embedding model to compute the similarity between
the synthesized response and the chunk contents. It filters out chunks whose similarity score falls below
a specified threshold. Then, it formats the relevant chunks into a reference string or a list of chunks,
depending on the stringify attribute.
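The threshold-based filtering described above can be sketched as follows. This is a minimal illustration, not the library's implementation: cosine similarity is assumed as the similarity measure (the source does not name one), the embeddings are hand-written toy vectors rather than output from an embedding model, and both helper functions are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_similarity(
    response_embedding: list[float],
    chunk_embeddings: dict[str, list[float]],
    threshold: float,
) -> list[str]:
    """Return the names of chunks whose similarity to the response meets the threshold."""
    return [
        name
        for name, embedding in chunk_embeddings.items()
        if cosine_similarity(response_embedding, embedding) >= threshold
    ]


response_embedding = [1.0, 0.0]
chunk_embeddings = {
    "policy.pdf": [0.9, 0.1],   # nearly parallel to the response
    "hours.txt": [0.0, 1.0],    # orthogonal to the response
    "faq.md": [0.6, 0.8],       # similarity of about 0.6
}
kept = filter_by_similarity(response_embedding, chunk_embeddings, threshold=0.7)
```

Raising the threshold keeps fewer, more closely related chunks; lowering it to 0.5 in this toy example would also keep `faq.md`.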
Attributes:
Name | Type | Description |
---|---|---|
em_invoker | BaseEMInvoker | The embedding model invoker for generating embeddings for the response and chunk contents. |
threshold | float | The similarity threshold for filtering chunks. |
stringify | bool | Whether to format the references as a string. If False, the references will be returned as a list of chunks. |
format_references_func | Callable[[list[str]], str] | A function that formats a list of formatted chunks as a string of references. |
format_chunk_func | Callable[[Chunk], str] | A function that formats a chunk as a reference. |
streamable | bool | A flag to indicate whether the formatted references will be streamed if an event emitter is provided. |
Initializes a new instance of the SimilarityBasedReferenceFormatter class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
em_invoker | BaseEMInvoker | The embedding model invoker for generating embeddings for the response and chunk contents. | required |
threshold | float | The similarity threshold for filtering chunks. Defaults to DEFAULT_THRESHOLD. | DEFAULT_THRESHOLD |
stringify | bool | Whether to format the references as a string. If False, the references will be returned as a list of chunks. Defaults to True. | True |
format_references_func | Callable[[list[str]], str] \| None | A function that formats a list of formatted chunks as a string of references. Defaults to None, in which case … | None |
format_chunk_func | Callable[[Chunk], str] \| None | A function that formats a chunk as a reference. Defaults to None, in which case … | None |
format_chunk_metadata_map | dict[str, str] \| None | A dictionary mapping the metadata keys needed by the … | None |
streamable | bool | A flag to indicate whether the formatted references will be streamed if an event emitter is provided. Defaults to True. | True |