Analyzer

BaseTextAnalyzer class to analyze the text and extract the PII entities.

Authors

Muhammad Afif Al Hawari (muhammad.a.a.hawari@gdplabs.id)

BaseTextAnalyzer

Bases: Component, ABC

Abstract base class for analyzers that scan text and extract PII entities.

analyze(text, language, entities=None, score_threshold=None, allow_list=None, allow_list_match='exact', regex_flags=re.DOTALL | re.MULTILINE | re.IGNORECASE, **kwargs) abstractmethod

Analyze the text and extract the PII entities.

Parameters:

text (str): The text to be analyzed. Required.

language (str): The language of the text. Required.

entities (list | None): The list of entities to be extracted. Default: None.

score_threshold (float | None): The threshold score for the extracted entities. Default: None.

allow_list (list | None): List of words that the user allows to remain in the text. Default: None.

allow_list_match (str | None): The matching strategy for the allow list. Default: 'exact'.

regex_flags (int | None): The regex flags for the text analysis. Default: re.DOTALL | re.MULTILINE | re.IGNORECASE.

**kwargs (Any): Additional keyword arguments that may be needed for the text analysis process. Default: {}.

Returns:

list[RecognizerResult]: The list of extracted entities.
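
Example

Because analyze is abstract, a concrete analyzer has to subclass BaseTextAnalyzer and implement it. The sketch below is only an illustration of that pattern under several assumptions: BaseTextAnalyzer and RecognizerResult are redefined here as local stand-ins so the snippet runs on its own (the real class also extends Component, and the real result type may carry different fields), EmailAnalyzer is a hypothetical subclass, and the handling of score_threshold and of allow_list_match="exact" is one plausible reading of the parameter descriptions, not confirmed behavior.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class RecognizerResult:
    """Local stand-in for the real result type; these fields are assumed."""
    entity_type: str
    start: int
    end: int
    score: float


class BaseTextAnalyzer(ABC):
    """Local stand-in that mirrors the documented analyze() signature."""

    @abstractmethod
    def analyze(self, text, language, entities=None, score_threshold=None,
                allow_list=None, allow_list_match="exact",
                regex_flags=re.DOTALL | re.MULTILINE | re.IGNORECASE, **kwargs):
        ...


class EmailAnalyzer(BaseTextAnalyzer):
    """Hypothetical analyzer that finds e-mail addresses with a single regex."""

    PATTERN = r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"
    SCORE = 0.9  # fixed confidence, chosen arbitrarily for this sketch

    def analyze(self, text, language, entities=None, score_threshold=None,
                allow_list=None, allow_list_match="exact",
                regex_flags=re.DOTALL | re.MULTILINE | re.IGNORECASE, **kwargs):
        # Return nothing if the caller restricted extraction to other entity types.
        if entities is not None and "EMAIL_ADDRESS" not in entities:
            return []
        # Assumed semantics: results scoring below the threshold are dropped.
        if score_threshold is not None and self.SCORE < score_threshold:
            return []
        results = []
        for match in re.finditer(self.PATTERN, text, flags=regex_flags or 0):
            # Assumed semantics of "exact": skip matches equal to an allow-list entry.
            if allow_list and allow_list_match == "exact" and match.group() in allow_list:
                continue
            results.append(
                RecognizerResult("EMAIL_ADDRESS", match.start(), match.end(), self.SCORE)
            )
        return results


analyzer = EmailAnalyzer()
found = analyzer.analyze(
    "Contact Alice@Example.com or support@example.com.",
    language="en",
    allow_list=["support@example.com"],
)
print(found)
```

With the default regex_flags, re.IGNORECASE lets the lowercase pattern match Alice@Example.com, while the allow-listed support address is left out of the results.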