Anonymization Architecture

Architecture of the Anonymization feature.

Protegrity Anonymization processes datasets through generalization so that the risk of re-identification stays within tolerable thresholds. Anonymization reduces data utility, but Protegrity Anonymization optimizes this fundamental privacy-utility trade-off to retain as much data quality as the privacy goals allow.

Protegrity Anonymization uses Kubernetes to anonymize data at scale, and provides instructions and support for deployment and usage on AWS EKS and Microsoft Azure AKS.

An overview of the communication is shown in the following figure.

Anonymization Components

Architecture

Protegrity Anonymization runs as several pods on Kubernetes. The Protegrity Anonymization Web Server processes requests and stores the data securely in an internal Database Server.

A request flows through the components as follows. The Nginx-Ingress component receives the Protegrity Anonymization request and forwards it to the Anon-App. The Anon-App processes the request, submits the tasks to the cluster, and stores the metadata about the job in the Anon-DB container. The scheduler then schedules the tasks on the workers. The workers read, write, and process the data that is stored in the Anon-Storage, the request stream, or the Cloud storage. The Anon-Storage uses an S3 bucket for storing data.

The scheduler manages its communication with the workers, which run on random ports.

The user accesses Protegrity Anonymization using HTTPS over port 443. The user requests are directed to an Ingress Controller, and the controller in turn communicates with the required pods using the following ports:

8090: Ingress controller and the Protegrity Anonymization API Web Service
8786: Ingress controller
8100: Ingress controller and S3 bucket
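
The port assignments above can be captured in a small lookup table, for example when reviewing firewall or network policy rules. The following is a minimal sketch: the port numbers and descriptions are taken from the list above, while the names INGRESS_PORTS, EXTERNAL_HTTPS_PORT, and describe_port are hypothetical and not part of Protegrity Anonymization.

```python
# Illustrative only: port numbers and descriptions come from the list above.
# The dictionary and helper names are hypothetical.
INGRESS_PORTS = {
    8090: "Ingress controller and the Protegrity Anonymization API Web Service",
    8786: "Ingress controller",
    8100: "Ingress controller and S3 bucket",
}

EXTERNAL_HTTPS_PORT = 443  # the user reaches the service over HTTPS


def describe_port(port: int) -> str:
    """Return the documented use of a port, or flag it as undocumented."""
    if port == EXTERNAL_HTTPS_PORT:
        return "User access over HTTPS"
    return INGRESS_PORTS.get(port, "not documented for Protegrity Anonymization")


if __name__ == "__main__":
    for port in (443, 8090, 8786, 8100):
        print(f"{port}: {describe_port(port)}")
```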

Components

Protegrity Anonymization is composed of the following main components:

Protegrity Anonymization REST Server: This core component exposes a REST interface through which clients can interact with the Protegrity Anonymization service. It uses an in-memory task queue and stores anonymized datasets and their metadata on persistent storage. Protegrity Anonymization tasks are submitted to the queue and are handled in first-in, first-out (FIFO) order.

Note: Only one anonymization task is executed at a time in Protegrity Anonymization.
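
The queueing behavior described above, where tasks wait in an in-memory queue and only one runs at a time, can be illustrated with a generic sketch built on the Python standard library. This is not the REST server's actual implementation; the function run_fifo and the job names are hypothetical and only demonstrate the first-in, first-out, single-worker pattern.

```python
import queue
import threading


def run_fifo(tasks):
    """Run (name, callable) tasks one at a time, in submission order.

    Returns a list of (name, result) pairs in the order the tasks completed.
    """
    task_queue = queue.Queue()  # in-memory FIFO queue
    results = []

    def worker():
        while True:
            item = task_queue.get()
            if item is None:  # sentinel: no more tasks to process
                break
            name, func = item
            # A single worker thread means only one task runs at any time.
            results.append((name, func()))

    thread = threading.Thread(target=worker)
    thread.start()
    for task in tasks:
        task_queue.put(task)
    task_queue.put(None)
    thread.join()
    return results


if __name__ == "__main__":
    jobs = [(f"job-{i}", lambda i=i: f"finished {i}") for i in range(1, 4)]
    for name, result in run_fifo(jobs):
        print(name, result)
```

Because there is a single worker, a long-running task delays every task submitted after it, which is the practical consequence of the note above.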

REST Client: The client connects to the Protegrity Anonymization REST Server using an API tool, such as Postman, to create, send, and receive the Protegrity Anonymization request. It also provides a Swagger interface detailing the APIs available. The Swagger interface can also be used as a REST client for raising API requests.
Python SDK: It is the Python programmatic interface used to communicate with the REST server.
Anon-Storage*: It is used to read data from and write data to the storage. It uses the S3 bucket framework to perform file operations.
Anon-DB: It is a PostgreSQL database that is used to store metadata related to Protegrity Anonymization jobs.
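
As an alternative to an API tool such as Postman, a REST client can also be scripted. The following minimal sketch only builds a JSON request over HTTPS and does not send it. The host name, the endpoint path, and the payload fields are placeholders and are not real Protegrity Anonymization values; take the actual endpoints and request bodies from the Swagger interface of your deployment.

```python
import json
import urllib.request

# Hypothetical values: replace them with the host and endpoint listed in the
# Swagger interface of your deployment.
BASE_URL = "https://anon.example.com"     # HTTPS uses port 443 by default
JOB_ENDPOINT = "/hypothetical/anonymize"  # placeholder path, not a real API


def build_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the REST server."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + JOB_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_request({"example_field": "example_value"})
    print(request.get_method(), request.full_url)
    # To send the request: urllib.request.urlopen(request)
```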

Last modified : June 19, 2026