Deanonymizer mapping
Deanonymizer mapping utilities.
This module provides utilities to create and update a mapping used to deanonymize a text. The mapping is used to replace anonymized entities with their original values.
The DeanonymizerMapping class is used to store the mapping, while the create_anonymizer_mapping function is used to create or update the mapping based on the results of the analysis and anonymization processes.
References
- https://python.langchain.com/v0.1/docs/guides/productionization/safety/presidio_data_anonymization/
DeanonymizerMapping(mapping=lambda: defaultdict(lambda: defaultdict(str))(), skip_format_duplicates=False)
dataclass
Class to store the deanonymizer mapping.
The mapping is used to replace anonymized entities with their original values.
Attributes:
Name | Type | Description |
---|---|---|
mapping | MappingDataType | The deanonymizer mapping. |
data | MappingDataType | The current deanonymizer mapping. |
skip_format_duplicates | bool | Whether to skip formatting duplicated operators. |
data: MappingDataType
property
Return the deanonymizer mapping.
Returns:
Name | Type | Description |
---|---|---|
MappingDataType | MappingDataType | The current deanonymizer mapping. |
update(new_mapping)
Update the deanonymizer mapping with new values.
Duplicated values will not be added. If there are multiple entities of the same type, the mapping will include a count to differentiate them. For example, if there are two names in the input text, the mapping will include NAME_1 and NAME_2.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
new_mapping | MappingDataType | The new mapping to be added to the existing deanonymizer mapping. | required |
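The merge rule described above (skip duplicated values, number entities of the same type) can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name merge_mapping, the `_<count>` suffix format, and the choice to number every entry are assumptions made to reproduce the NAME_1 / NAME_2 example.

```python
# Assumed shape: entity type -> {anonymized placeholder: original value}
MappingDataType = dict[str, dict[str, str]]


def merge_mapping(current: MappingDataType, new_mapping: MappingDataType) -> None:
    """Merge new_mapping into current in place (hypothetical helper)."""
    for entity_type, entries in new_mapping.items():
        known = current.setdefault(entity_type, {})
        for placeholder, original in entries.items():
            if original in known.values():
                continue  # duplicated value: already mapped, so it is not added again
            # Append a running count so entities of the same type stay distinct.
            known[f"{placeholder}_{len(known) + 1}"] = original


mapping: MappingDataType = {}
merge_mapping(mapping, {"NAME": {"NAME": "Alice"}})
merge_mapping(mapping, {"NAME": {"NAME": "Bob"}})
merge_mapping(mapping, {"NAME": {"NAME": "Alice"}})  # duplicate, ignored
print(mapping)  # {'NAME': {'NAME_1': 'Alice', 'NAME_2': 'Bob'}}
```

The real update method may format keys differently, for example when skip_format_duplicates is set; check the source before relying on a specific key format.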
create_anonymizer_mapping(original_text, analyzer_results, anonymizer_results, is_reversed=False, skip_format_duplicates=False)
Create or update the mapping used to anonymize and/or deanonymize a text.
This function uses the results returned by the analysis and anonymization processes and matches entities case-insensitively while building the mapping.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
original_text | str | The original text to be anonymized or deanonymized. | required |
analyzer_results | list[RecognizerResult] | The results from the text analysis. | required |
anonymizer_results | EngineResult | The results from the text anonymization. | required |
is_reversed | bool | If True, constructs a mapping from each original entity to its anonymized value. If False, constructs a mapping from each anonymized entity back to its original text value. Defaults to False. | False |
skip_format_duplicates | bool | If True, skip formatting duplicated operators. Defaults to False. | False |
Returns:
Name | Type | Description |
---|---|---|
MappingDataType | MappingDataType | The mapping used to anonymize and/or deanonymize the text. This mapping is constructed with case-insensitive checks but preserves the original case in the stored values. |
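To make the effect of is_reversed concrete, the sketch below shows the two orientations of the same mapping using plain dictionaries. The placeholder format, the sample value, and the invert_mapping helper are hypothetical and are not taken from the library; only the direction of each mapping follows the parameter description above.

```python
# Assumed shape: entity type -> {key: value}
MappingDataType = dict[str, dict[str, str]]


def invert_mapping(mapping: MappingDataType) -> MappingDataType:
    """Swap keys and values inside each entity type (hypothetical helper)."""
    return {
        entity_type: {value: key for key, value in entries.items()}
        for entity_type, entries in mapping.items()
    }


# is_reversed=False: anonymized entity -> original text (used to deanonymize).
deanonymize_map: MappingDataType = {"PERSON": {"<PERSON_1>": "Jane Doe"}}

# is_reversed=True: original entity -> anonymized value (used to anonymize).
anonymize_map = invert_mapping(deanonymize_map)
print(anonymize_map)  # {'PERSON': {'Jane Doe': '<PERSON_1>'}}
```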
Example
{
"PERSON": {
"
format_duplicated_operator(operator_name, count)
Format the operator name with the count.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
operator_name | str | The operator name. | required |
count | int | The count of the operator. | required |
get_dict_diff(first_mapping, second_mapping)
Compare two dictionaries and generate the differences.
This function compares two nested dictionaries and identifies the differences between them. If a key-value pair in first_mapping is different from second_mapping, the function records the difference and stores it as a dictionary containing the PII type, anonymized value, and the corresponding PII value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
first_mapping | MappingDataType | The original dictionary to compare. | required |
second_mapping | MappingDataType | The new dictionary to compare against the original. | required |
Returns:
Type | Description |
---|---|
list[dict[str, Any]] | A list of dictionaries representing the differences, with keys "pii_type", "anonymized_value", and "pii_value". |
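A comparison of this kind can be sketched as below. The function name dict_diff_sketch and the sample data are hypothetical, and the direction of the comparison is an assumption: this version reports entries of second_mapping that are missing from, or differ in, first_mapping. The actual get_dict_diff may iterate in the other direction.

```python
from typing import Any

# Assumed shape: PII type -> {anonymized value: original PII value}
MappingDataType = dict[str, dict[str, str]]


def dict_diff_sketch(
    first_mapping: MappingDataType, second_mapping: MappingDataType
) -> list[dict[str, Any]]:
    """Collect entries of second_mapping that first_mapping lacks or maps differently."""
    differences: list[dict[str, Any]] = []
    for pii_type, entries in second_mapping.items():
        known = first_mapping.get(pii_type, {})
        for anonymized_value, pii_value in entries.items():
            if known.get(anonymized_value) != pii_value:
                differences.append(
                    {
                        "pii_type": pii_type,
                        "anonymized_value": anonymized_value,
                        "pii_value": pii_value,
                    }
                )
    return differences


old = {"PERSON": {"<PERSON_1>": "Jane Doe"}}
new = {
    "PERSON": {"<PERSON_1>": "Jane Doe", "<PERSON_2>": "John Roe"},
    "EMAIL": {"<EMAIL_1>": "jane@example.com"},
}
diff = dict_diff_sketch(old, new)
```

Here diff contains one record for "<PERSON_2>" and one for "<EMAIL_1>", while the unchanged "<PERSON_1>" entry is omitted.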