Analyzer

BaseTextAnalyzer class to analyze the text and extract the PII entities.

BaseTextAnalyzer

Bases: Component, ABC

Abstract base class for analyzers that inspect text and extract PII entities.

analyze(text, language, entities=None, score_threshold=None, allow_list=None, allow_list_match='exact', regex_flags=re.DOTALL | re.MULTILINE | re.IGNORECASE, **kwargs) abstractmethod

Analyze the text and extract the PII entities.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| text | str | The text to be analyzed. | required |
| language | str | The language of the text. | required |
| entities | list \| None | The list of entities to be extracted. | None |
| score_threshold | float \| None | The threshold score for the extracted entities. | None |
| allow_list | list \| None | List of words that the user defines as being allowed to keep in the text. | None |
| allow_list_match | str \| None | The matching strategy for the allow list. | 'exact' |
| regex_flags | int \| None | The regex flags for the text analysis. | re.DOTALL \| re.MULTILINE \| re.IGNORECASE |
| **kwargs | Any | Additional keyword arguments that may be needed for the text analysis process. | {} |

Returns:

| Type | Description |
| --- | --- |
| list[RecognizerResult] | The list of extracted entities. |
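As an illustration of how a concrete subclass might honor this signature, here is a minimal, self-contained sketch. It does not import the real library: `RecognizerResult` and `BaseTextAnalyzer` below are simplified stand-ins (the real base class also derives from `Component`), and `RegexEmailAnalyzer`, its pattern, the `EMAIL_ADDRESS` label, and the fixed score are all hypothetical. Only the "exact" allow-list strategy is sketched.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RecognizerResult:
    """Hypothetical stand-in for the library's RecognizerResult."""
    entity_type: str
    start: int
    end: int
    score: float


class BaseTextAnalyzer(ABC):
    """Simplified stand-in; the real class also derives from Component."""

    @abstractmethod
    def analyze(self, text, language, entities=None, score_threshold=None,
                allow_list=None, allow_list_match="exact",
                regex_flags=re.DOTALL | re.MULTILINE | re.IGNORECASE, **kwargs):
        ...


class RegexEmailAnalyzer(BaseTextAnalyzer):
    """Hypothetical analyzer that detects e-mail addresses with one regex."""

    PATTERN = r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"
    SCORE = 0.9  # fixed, purely illustrative confidence

    def analyze(self, text, language, entities=None, score_threshold=None,
                allow_list=None, allow_list_match="exact",
                regex_flags=re.DOTALL | re.MULTILINE | re.IGNORECASE, **kwargs):
        # Nothing to do if the caller asked only for other entity types.
        if entities is not None and "EMAIL_ADDRESS" not in entities:
            return []
        # Every match carries the same score, so the threshold is all-or-nothing.
        if score_threshold is not None and self.SCORE < score_threshold:
            return []
        results = []
        for match in re.finditer(self.PATTERN, text, flags=regex_flags or 0):
            # Skip matches the user explicitly allowed to stay in the text.
            if allow_list and allow_list_match == "exact" and match.group() in allow_list:
                continue
            results.append(
                RecognizerResult("EMAIL_ADDRESS", match.start(), match.end(), self.SCORE)
            )
        return results


analyzer = RegexEmailAnalyzer()
found = analyzer.analyze(
    "Write to alice@example.com or support@example.com.",
    language="en",
    allow_list=["support@example.com"],
)
print(found)
```

In this sketch the allow-listed address is dropped from the results, so only the first address is reported. A real analyzer would typically assign per-match scores and apply `score_threshold` to each result individually.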

execute(**kwargs) async

Execute the analysis process.

This method is called internally to perform the text analysis operation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| **kwargs | Any | Additional keyword arguments that may be needed for the text analysis process. | {} |

Returns:

| Type | Description |
| --- | --- |
| list[RecognizerResult] | The list of extracted RecognizerResult entities. |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the input kwargs do not contain the required keys, or the keys have invalid values. |
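The validate-then-analyze flow described for `execute` could look roughly like the sketch below. This is not the library's implementation: `SketchAnalyzer`, its `REQUIRED_KEYS` tuple (assumed from the two required parameters of `analyze`), the string type check, and the placeholder digit-matching `analyze` are all hypothetical, and plain `(start, end)` tuples stand in for `RecognizerResult` objects.

```python
import asyncio
import re


class SketchAnalyzer:
    """Hypothetical class; only the validate-then-analyze flow is illustrated."""

    # Assumption: these mirror the required parameters of analyze().
    REQUIRED_KEYS = ("text", "language")

    def analyze(self, text, language, **kwargs):
        # Placeholder analysis: report every run of digits as a (start, end) span.
        return [(m.start(), m.end()) for m in re.finditer(r"\d+", text)]

    async def execute(self, **kwargs):
        # Reject calls that omit a required key.
        missing = [key for key in self.REQUIRED_KEYS if key not in kwargs]
        if missing:
            raise ValueError(f"missing required keys: {missing}")
        # Reject calls whose required values have the wrong type.
        for key in self.REQUIRED_KEYS:
            if not isinstance(kwargs[key], str):
                raise ValueError(f"{key!r} must be a string")
        return self.analyze(**kwargs)


async def main():
    analyzer = SketchAnalyzer()
    spans = await analyzer.execute(text="Call 555 0100", language="en")
    try:
        await analyzer.execute(text="no language given")
        error = None
    except ValueError as exc:
        error = str(exc)
    return spans, error


spans, error = asyncio.run(main())
print(spans, error)
```

Validating before delegating keeps `analyze` free of defensive checks and surfaces malformed input as a single, predictable `ValueError`.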