Deanonymizer mapping

Deanonymizer mapping utilities.

This module provides utilities to create and update a mapping used to deanonymize a text. The mapping is used to replace anonymized entities with their original values.

The DeanonymizerMapping class is used to store the mapping, while the create_anonymizer_mapping function is used to create or update the mapping based on the results of the analysis and anonymization processes.
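
To illustrate how such a mapping can be applied, here is a minimal sketch; it is not the library's implementation. It assumes the mapping is a nested dictionary of the form {entity_type: {anonymized_value: original_value}}. The deanonymize helper and the <PERSON_1>-style placeholders are hypothetical names chosen for the example.

```python
def deanonymize(text: str, mapping: dict[str, dict[str, str]]) -> str:
    """Replace every anonymized value found in `text` with its original value."""
    # Flatten {entity_type: {anonymized: original}} into (anonymized, original) pairs.
    pairs = [
        (anonymized, original)
        for values in mapping.values()
        for anonymized, original in values.items()
    ]
    # Replace longer placeholders first so a short one never clobbers part of a longer one.
    for anonymized, original in sorted(pairs, key=lambda pair: len(pair[0]), reverse=True):
        text = text.replace(anonymized, original)
    return text


# Hypothetical mapping and text, for illustration only.
mapping = {
    "PERSON": {"<PERSON_1>": "John Doe", "<PERSON_2>": "Jane Roe"},
    "PHONE_NUMBER": {"<PHONE_NUMBER_1>": "111-111-1111"},
}
restored = deanonymize("Call <PERSON_1> or <PERSON_2> at <PHONE_NUMBER_1>.", mapping)
print(restored)
```

Plain string replacement is enough for a sketch; a real deanonymizer may need fuzzier matching if the anonymized text was altered after anonymization.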

Authors

Muhammad Hakim Asy'ari (muhammad.h.asyari@gdplabs.id)
Immanuel Rhesa (immanuel.rhesa@gdplabs.id)

References
  1. https://python.langchain.com/v0.1/docs/guides/productionization/safety/presidio_data_anonymization/

DeanonymizerMapping(mapping=lambda: defaultdict(lambda: defaultdict(str))(), skip_format_duplicates=False) dataclass

Class to store the deanonymizer mapping.

The mapping is used to replace anonymized entities with their original values.

Attributes:

mapping (MappingDataType): The deanonymizer mapping.
data (MappingDataType): The current deanonymizer mapping.
skip_format_duplicates (bool): Whether to skip formatting duplicated operators.

data: MappingDataType property

Return the deanonymizer mapping.

Returns:

MappingDataType: The current deanonymizer mapping.

update(new_mapping)

Update the deanonymizer mapping with new values.

Duplicated values will not be added. If there are multiple entities of the same type, the mapping will include a count to differentiate them. For example, if there are two names in the input text, the mapping will include NAME_1 and NAME_2.

Parameters:

new_mapping (MappingDataType, required): The new mapping to be added to the existing deanonymizer mapping.
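
The update rule described above can be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: update_mapping is a hypothetical free function, the stored mapping is assumed to have the shape {entity_type: {key: value}}, and the numbering scheme (every distinct value gets the next counter for its entity type) is chosen only to reproduce the NAME_1 / NAME_2 example. The skip_format_duplicates option is not modeled.

```python
import re
from collections import defaultdict


def update_mapping(current, new_mapping):
    """Merge `new_mapping` into `current` in place, skipping duplicated values."""
    for entity_type, values in new_mapping.items():
        stored = current[entity_type]
        for key, value in values.items():
            if value in stored.values():
                continue  # duplicated value: do not add it again
            # Drop any counter already on the key, then append the next one
            # (assumed numbering scheme, see the note above).
            base = re.sub(r"_\d+$", "", key)
            stored[f"{base}_{len(stored) + 1}"] = value


current = defaultdict(dict)
update_mapping(current, {"NAME": {"NAME": "John Doe"}})
update_mapping(current, {"NAME": {"NAME": "Jane Roe"}})
update_mapping(current, {"NAME": {"NAME": "John Doe"}})  # duplicate, ignored
print(dict(current))
```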

create_anonymizer_mapping(original_text, analyzer_results, anonymizer_results, is_reversed=False, skip_format_duplicates=False)

Create or update the mapping used to anonymize and/or deanonymize a text.

This function uses the results returned by the analysis and anonymization processes, and matches entities case-insensitively while building the mapping.

Parameters:

original_text (str, required): The original text to be anonymized or deanonymized.
analyzer_results (list[RecognizerResult], required): The results from the text analysis.
anonymizer_results (EngineResult, required): The results from the text anonymization.
is_reversed (bool, default False): If True, constructs a mapping from each original entity to its anonymized value. If False, constructs a mapping from each anonymized entity back to its original text value.
skip_format_duplicates (bool, default False): If True, skip formatting duplicated operators.

Returns:

MappingDataType: The mapping used to anonymize and/or deanonymize the text. This mapping is constructed with case-insensitive checks but preserves the original case in the stored values.

Example

{
    "PERSON": {
        "<original>": "<anonymized>",
        "John Doe": "Slim Shady"
    },
    "PHONE_NUMBER": {
        "111-111-1111": "555-555-5555"
    }
    ...
}
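
The sketch below illustrates the idea of pairing detected entities with their replacements, applying a case-insensitive check while keeping the first-seen casing. It is a simplified stand-in, not the library function: Span, find_spans, and build_mapping are hypothetical names, and plain (entity_type, start, end) spans are used instead of the RecognizerResult and EngineResult objects the real function receives. The is_reversed behavior follows the parameter description above.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for one analyzer or anonymizer result item."""
    entity_type: str
    start: int
    end: int


def find_spans(text: str, needle: str, entity_type: str) -> list[Span]:
    """Locate every case-insensitive occurrence of `needle` (demo helper only)."""
    return [
        Span(entity_type, match.start(), match.end())
        for match in re.finditer(re.escape(needle), text, flags=re.IGNORECASE)
    ]


def build_mapping(original_text, anonymized_text, original_spans,
                  anonymized_spans, is_reversed=False):
    mapping = defaultdict(dict)
    seen = defaultdict(set)  # lower-cased originals already recorded, per entity type
    # Pair the n-th detected entity with the n-th replacement, in text order.
    pairs = zip(
        sorted(original_spans, key=lambda span: span.start),
        sorted(anonymized_spans, key=lambda span: span.start),
    )
    for original_span, anonymized_span in pairs:
        original = original_text[original_span.start:original_span.end]
        anonymized = anonymized_text[anonymized_span.start:anonymized_span.end]
        entity_type = original_span.entity_type
        if original.lower() in seen[entity_type]:
            continue  # same entity in a different case: keep the first-seen form
        seen[entity_type].add(original.lower())
        if is_reversed:
            mapping[entity_type][original] = anonymized
        else:
            mapping[entity_type][anonymized] = original
    return {entity_type: dict(values) for entity_type, values in mapping.items()}


original_text = "John Doe called. Later, JOHN DOE called again."
anonymized_text = "Slim Shady called. Later, Slim Shady called again."
original_spans = find_spans(original_text, "John Doe", "PERSON")
anonymized_spans = find_spans(anonymized_text, "Slim Shady", "PERSON")

deanonymizer = build_mapping(original_text, anonymized_text,
                             original_spans, anonymized_spans)
anonymizer = build_mapping(original_text, anonymized_text,
                           original_spans, anonymized_spans, is_reversed=True)
print(deanonymizer, anonymizer)
```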

format_duplicated_operator(operator_name, count)

Format the operator name with the count.

Parameters:

operator_name (str, required): The operator name.
count (int, required): The count of the operator.
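
As a rough illustration of what such formatting might look like, the sketch below appends the count to the operator name. The exact output format is an assumption: format_with_count is a hypothetical helper, and both the removal of an existing counter and the handling of angle brackets are guesses rather than documented behavior.

```python
import re


def format_with_count(operator_name: str, count: int) -> str:
    """Return the operator name with `count` appended, e.g. NAME -> NAME_2."""
    # Assumption: placeholders may be wrapped in angle brackets, which are kept.
    has_brackets = operator_name.startswith("<") and operator_name.endswith(">")
    # Assumption: a counter already on the name is replaced, not stacked.
    base = re.sub(r"_\d+$", "", operator_name.strip("<>"))
    formatted = f"{base}_{count}"
    return f"<{formatted}>" if has_brackets else formatted


print(format_with_count("NAME", 2))
print(format_with_count("<PERSON>", 3))
```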

get_dict_diff(first_mapping, second_mapping)

Compare two dictionaries and generate the differences.

This function compares two nested dictionaries and identifies the differences between them. If a key-value pair in first_mapping is different from second_mapping, the function records the difference and stores it as a dictionary containing the PII type, anonymized value, and the corresponding PII value.

Parameters:

first_mapping (MappingDataType, required): The original dictionary to compare.
second_mapping (MappingDataType, required): The new dictionary to compare against the original.

Returns:

list[dict[str, Any]]: A list of dictionaries representing the differences, with keys "pii_type", "anonymized_value", and "pii_value".
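
A comparison of this kind could be sketched as below. This is an illustration, not the library's code: dict_diff is a hypothetical name, the mappings are assumed to have the shape {pii_type: {anonymized_value: pii_value}}, and the sketch assumes a difference means an entry of second_mapping that is missing from, or different in, first_mapping.

```python
from typing import Any


def dict_diff(first_mapping: dict, second_mapping: dict) -> list[dict[str, Any]]:
    """List entries of `second_mapping` that are new or changed versus `first_mapping`."""
    differences = []
    for pii_type, values in second_mapping.items():
        known = first_mapping.get(pii_type, {})
        for anonymized_value, pii_value in values.items():
            if anonymized_value not in known or known[anonymized_value] != pii_value:
                differences.append({
                    "pii_type": pii_type,
                    "anonymized_value": anonymized_value,
                    "pii_value": pii_value,
                })
    return differences


# Hypothetical mappings, for illustration only.
first = {"PERSON": {"<PERSON_1>": "John Doe"}}
second = {
    "PERSON": {"<PERSON_1>": "John Doe", "<PERSON_2>": "Jane Roe"},
    "PHONE_NUMBER": {"<PHONE_NUMBER_1>": "111-111-1111"},
}
diff = dict_diff(first, second)
print(diff)
```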