Base

Base PII Resolver interface for handling Personally Identifiable Information.

This module provides the abstract base class for PII resolvers that handle anonymization and deanonymization of queries and document chunks.

Authors

Dimitrij Ray (dimitrij.ray@gdplabs.id)

AnonymizationResult(anonymized_query, pii_entities) dataclass

Result of processing a query for PII anonymization.

Contains the anonymized query and the list of PII entities that were identified and processed.

Attributes:

Name Type Description
anonymized_query str

Query with PII replaced by tokens.

pii_entities list[str]

List of PII entities that were identified.
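
As an illustration of the two attributes above, the following is a minimal sketch that uses a local stand-in dataclass rather than the library's own class. The token `<PERSON_1>` is a made-up placeholder; the actual token format is not specified here.

```python
from dataclasses import dataclass


@dataclass
class AnonymizationResult:
    """Stand-in mirroring the documented fields; not the library's own class."""

    anonymized_query: str  # query with PII replaced by tokens
    pii_entities: list[str]  # PII entities that were identified


# "<PERSON_1>" is a hypothetical token chosen only for this example.
result = AnonymizationResult(
    anonymized_query="What did <PERSON_1> say?",
    pii_entities=["Alice"],
)
print(result.anonymized_query)
print(result.pii_entities)
```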

BasePIIResolver

Bases: ABC

Abstract base class for PII resolvers that handle anonymization and deanonymization.

A resolver is responsible for PII resolution in queries and chunks. It can be configured to support both raw queries (which require anonymization) and pre-masked queries (which require entity extraction).

deanonymize_chunk(chunk) abstractmethod async

Restore PII in the chunk content.

Parameters:

Name Type Description Default
chunk Chunk

The chunk containing anonymized text. The PII mapping in its metadata is expected in token: entity format (e.g., {"": "Alice", "": "alice@company.com"}).

required

Returns:

Name Type Description
Chunk Chunk

The chunk with de-anonymized content.

Raises:

Type Description
NotImplementedError

If the method is not implemented by the subclass.
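
One way a subclass might restore PII is plain string replacement driven by the token-to-entity mapping. The sketch below is an assumption-laden illustration, not the library's implementation: `Chunk` is a local stand-in with `content` and `metadata` fields, the metadata key `pii_mapping` is hypothetical, and the tokens `<PERSON_1>` and `<EMAIL_1>` are invented for the example.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Chunk:
    """Hypothetical stand-in for the library's Chunk type."""

    content: str
    metadata: dict[str, Any] = field(default_factory=dict)


async def deanonymize_chunk(chunk: Chunk) -> Chunk:
    """Replace each token in the chunk content with its original entity."""
    # "pii_mapping" is an assumed metadata key holding token -> entity pairs.
    mapping: dict[str, str] = chunk.metadata.get("pii_mapping", {})
    content = chunk.content
    # Handle longer tokens first so a token that is a prefix of another
    # token cannot partially overwrite it.
    for token in sorted(mapping, key=len, reverse=True):
        content = content.replace(token, mapping[token])
    return Chunk(content=content, metadata=chunk.metadata)


chunk = Chunk(
    content="<PERSON_1> can be reached at <EMAIL_1>.",
    metadata={
        "pii_mapping": {
            "<PERSON_1>": "Alice",
            "<EMAIL_1>": "alice@company.com",
        }
    },
)
restored = asyncio.run(deanonymize_chunk(chunk))
print(restored.content)
```

Returning a new chunk instead of mutating the input keeps the anonymized original available to callers; whether the real resolvers do this is not stated in the documentation.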

process_query(query, pii_map, is_masked=False) abstractmethod async

Process the query to ensure it is anonymized and return relevant PII entities.

Parameters:

Name Type Description Default
query str

The user query to be anonymized.

required
pii_map dict[str, str]

A dictionary mapping PII entities to tokens, in entity: token format (e.g., {"Alice": "", "alice@company.com": ""}).

required
is_masked bool

Flag indicating whether the query is already masked. If True, the method acts as a pass-through. Defaults to False.

False

Returns:

Name Type Description
AnonymizationResult AnonymizationResult

A result containing:

1. anonymized_query (str): Query with PII replaced (e.g., "What did say?"), or the original query if is_masked is True.
2. pii_entities (list[str]): List of PII entities from the provided map (e.g., ["Alice", "alice@company.com"]), or an empty list if is_masked is True.

Raises:

Type Description
NotImplementedError

If the method is not implemented by the subclass.
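
The documented behavior of process_query can be sketched with a small dictionary-based subclass. Everything here is illustrative: `QueryResolver` is a reduced stand-in for the abstract base class, `DictPIIResolver` is a hypothetical name, the tokens are invented, and returning every key of `pii_map` as `pii_entities` is one reading of "entities from the provided map".

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class AnonymizationResult:
    """Stand-in mirroring the documented result fields."""

    anonymized_query: str
    pii_entities: list[str]


class QueryResolver(ABC):
    """Reduced stand-in for the abstract base class (process_query only)."""

    @abstractmethod
    async def process_query(
        self, query: str, pii_map: dict[str, str], is_masked: bool = False
    ) -> AnonymizationResult:
        raise NotImplementedError


class DictPIIResolver(QueryResolver):
    """Hypothetical resolver that anonymizes by simple string replacement."""

    async def process_query(
        self, query: str, pii_map: dict[str, str], is_masked: bool = False
    ) -> AnonymizationResult:
        if is_masked:
            # Pass-through: the query is returned unchanged with no entities.
            return AnonymizationResult(anonymized_query=query, pii_entities=[])
        anonymized = query
        # Replace longer entities first so overlapping entities stay intact.
        for entity in sorted(pii_map, key=len, reverse=True):
            anonymized = anonymized.replace(entity, pii_map[entity])
        # Assumption: report every entity in the provided map.
        return AnonymizationResult(
            anonymized_query=anonymized, pii_entities=list(pii_map)
        )


resolver = DictPIIResolver()
# "<PERSON_1>" and "<EMAIL_1>" are made-up tokens for this example.
pii_map = {"Alice": "<PERSON_1>", "alice@company.com": "<EMAIL_1>"}

raw = asyncio.run(resolver.process_query("What did Alice say?", pii_map))
masked = asyncio.run(
    resolver.process_query("What did <PERSON_1> say?", pii_map, is_masked=True)
)
print(raw)
print(masked)
```

Because both methods are declared with `abstractmethod`, Python refuses to instantiate a subclass that omits one of them and raises `TypeError` at construction time, before any `NotImplementedError` could be reached.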