Anonymization helps organizations protect sensitive data while keeping it useful for analysis and development. Using AI, Anonymization transforms sensitive data into anonymized data that preserves its analytical value while supporting privacy and regulatory compliance.
Anonymization
- 1: Anonymization Architecture
- 2: Prerequisites for Anonymization
- 3: Setting up Anonymization
- 4: Running the Anonymization samples
- 5: Using the Anonymization APIs
- 6: Uninstalling Anonymization
1 - Anonymization Architecture
Protegrity Anonymization processes datasets through generalization to keep the risk of re-identification within tolerable thresholds. Anonymization reduces data utility, but Protegrity Anonymization optimizes this fundamental privacy-utility trade-off to maximize data quality within the privacy goals.
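As a rough illustration of generalization (not Protegrity Anonymization's own algorithm), the following Python sketch coarsens two quasi-identifiers, age into 10-year bands and ZIP code into a 3-digit prefix, and reports the size of the smallest equivalence class, which is the k in k-anonymity. The column names, band width, and sample records are hypothetical.

```python
from collections import Counter


def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zipcode: str, keep: int = 3) -> str:
    """Keep the first `keep` digits and mask the rest with '*'."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)


def min_class_size(records: list[dict], qis: list[str]) -> int:
    """Smallest number of records sharing the same QI values (the dataset's k)."""
    counts = Counter(tuple(r[q] for q in qis) for r in records)
    return min(counts.values())


# Hypothetical sample records.
records = [
    {"age": 31, "zip": "02139", "diagnosis": "flu"},
    {"age": 35, "zip": "02141", "diagnosis": "asthma"},
    {"age": 38, "zip": "02138", "diagnosis": "flu"},
    {"age": 42, "zip": "02144", "diagnosis": "diabetes"},
    {"age": 47, "zip": "02145", "diagnosis": "flu"},
    {"age": 44, "zip": "02143", "diagnosis": "asthma"},
]

generalized = [
    {**r, "age": generalize_age(r["age"]), "zip": generalize_zip(r["zip"])}
    for r in records
]

print(min_class_size(records, ["age", "zip"]))      # 1: every record is unique
print(min_class_size(generalized, ["age", "zip"]))  # 3: two classes of three
```

Coarser generalization raises k but discards more detail, which is the privacy-utility trade-off that the optimization balances.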
Protegrity Anonymization leverages Kubernetes for data anonymization at scale and it provides instructions and support for deployment and usage on AWS EKS and Microsoft Azure AKS.
An overview of the communication is shown in the following figure.

Architecture
Protegrity Anonymization uses several pods on Kubernetes. The Protegrity Anonymization Web Server processes requests and stores the data securely in an internal Database Server. A request flows through the components as follows: the Nginx-Ingress component receives the Protegrity Anonymization request and forwards it to the Anon-App. The Anon-App processes the request, submits the tasks to the cluster, and stores metadata about the job in the Anon-DB container. The scheduler schedules the tasks on the workers, which then read, write, and process the data stored in the Anon-Storage, the request stream, or cloud storage. The Anon-Storage uses an S3 bucket for storing data. The scheduler handles the communication between itself and the workers, which run on random ports.
Users access Protegrity Anonymization over HTTPS on port 443. User requests are directed to an Ingress Controller, which in turn communicates with the required pods using the following ports:
- 8090: Ingress controller and the Protegrity Anonymization API Web Service
- 8786: Ingress controller
- 8100: Ingress controller and S3 bucket
Components
Protegrity Anonymization is composed of the following main components:
- Protegrity Anonymization REST Server: This core component exposes a REST interface through which clients interact with the Protegrity Anonymization service. It uses an in-memory task queue and stores anonymized datasets and their metadata on persistent storage. Protegrity Anonymization tasks are submitted to the queue and handled in first-in, first-out (FIFO) order.
Note: Only one anonymization task is executed at a time in Protegrity Anonymization.
- REST Client: The client connects to the Protegrity Anonymization REST Server through an API tool, such as Postman, to create and send Protegrity Anonymization requests and to receive the responses. A Swagger interface detailing the available APIs is also provided, and it can be used as a REST client for raising API requests.
- Python SDK: The Python programmatic interface for communicating with the REST server.
- Anon-Storage: It is used to read data from and write data to the storage. It uses an S3 bucket to perform file operations.
- Anon-DB: It is a PostgreSQL database that is used to store metadata related to Protegrity Anonymization jobs.
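The queueing behavior described for the REST Server (FIFO order, only one task executing at a time) can be modeled with a single worker thread draining a `queue.Queue`. This is a minimal sketch of that behavior only; the task names are hypothetical and the code does not reflect how the server is actually implemented.

```python
import queue
import threading


def run_fifo(task_names: list[str]) -> list[str]:
    """Process tasks one at a time, in submission order, and return the completion order."""
    tasks: queue.Queue = queue.Queue()
    completed: list[str] = []

    def worker() -> None:
        while True:
            name = tasks.get()
            if name is None:          # sentinel: no more tasks
                tasks.task_done()
                break
            completed.append(name)    # stand-in for running one anonymization task
            tasks.task_done()

    # A single worker thread means at most one task runs at any moment.
    thread = threading.Thread(target=worker)
    thread.start()
    for name in task_names:
        tasks.put(name)
    tasks.put(None)
    thread.join()
    return completed


print(run_fifo(["job-a", "job-b", "job-c"]))  # ['job-a', 'job-b', 'job-c']
```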
2 - Prerequisites for Anonymization
Ensure that the following prerequisites are met before running the Anonymization samples:
- Docker CLI, Docker Compose, and Python are installed. For more information, refer to AI Developer Edition, Pre-requisites Guide.
- For shell samples: Bash version greater than or equal to 5.1.8 and curl version greater than or equal to 7.76.1.
- For notebook samples: JupyterLab version greater than or equal to 4.5.6.
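The following sketch checks the Bash and curl minimums listed above by parsing each tool's `--version` output. It assumes the first dotted number in that output is the tool's version, which holds for the usual GNU Bash and curl banners but is not guaranteed for every build.

```python
import re
import shutil
import subprocess

# Minimum versions from the prerequisites list.
MINIMUMS = {"bash": (5, 1, 8), "curl": (7, 76, 1)}


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number, e.g. '5.1.8' -> (5, 1, 8)."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(found: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Tuples compare element by element, so (5, 2, 0) >= (5, 1, 8)."""
    return found >= minimum


def check_tool(tool: str) -> bool:
    """Run `<tool> --version` and compare it with the minimum; False if missing."""
    if shutil.which(tool) is None:
        return False
    result = subprocess.run([tool, "--version"], capture_output=True, text=True)
    try:
        found = parse_version(result.stdout)
    except ValueError:
        return False
    return meets_minimum(found, MINIMUMS[tool])


if __name__ == "__main__":
    for tool in MINIMUMS:
        print(tool, "OK" if check_tool(tool) else "missing or too old")
```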
3 - Setting up Anonymization
Use the containers to set up the Anonymization feature for anonymizing sensitive data.
1. Open a command prompt.
2. Navigate to the location of the cloned protegrity-ai-developer-edition repository.
3. Run the following commands to download and start the containers. The dependent containers are large, so downloading and deploying them might take some time depending on your network connection.
   cd anonymization
   docker compose up -d
   Depending on your configuration, use the docker-compose up -d command instead.
   Note: By default, images are obtained from ghcr.io. To obtain images from public.ecr.aws instead, copy the .env.example file in the anonymization directory to .env. Open the .env file, uncomment the REGISTRY=public.ecr.aws/protegrity-ai-developer-edition line, save the file, and run the docker compose up -d command to download and start the containers.
4. Verify that the containers started successfully.
   docker compose logs
5. From the cloned protegrity-ai-developer-edition repository location, set up the Jupyter environment for working with the provided notebooks.
   pip install -r shared/requirements.txt
6. Install the Anonymization SDK package.
   pip install protegrity-anonymization-sdk
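If you prefer to script the registry change described in the note above, the following sketch copies `.env.example` to `.env` and uncomments the `REGISTRY` line. It assumes the line appears in `.env.example` prefixed with `#` (with or without a space); check the actual file before relying on it. The function name is hypothetical.

```python
from pathlib import Path

REGISTRY_LINE = "REGISTRY=public.ecr.aws/protegrity-ai-developer-edition"


def enable_ecr_registry(anonymization_dir: str) -> Path:
    """Create .env from .env.example with the REGISTRY line uncommented."""
    directory = Path(anonymization_dir)
    lines = (directory / ".env.example").read_text().splitlines()
    # Assumption: the line appears as '#REGISTRY=...' or '# REGISTRY=...'.
    updated = [
        REGISTRY_LINE if line.lstrip("# ").strip() == REGISTRY_LINE else line
        for line in lines
    ]
    env_path = directory / ".env"
    env_path.write_text("\n".join(updated) + "\n")
    return env_path


# Usage, from the root of the cloned repository:
# enable_ecr_registry("anonymization")
```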
4 - Running the Anonymization samples
The example scripts under the anonymization/ folder demonstrate the usage of the Anonymization APIs. For more information about the Anonymization APIs, refer to the section Using the Anonymization APIs.
Note: A dedicated anonymization/docker-compose.yml file is provided to start the Anonymization services.
1. Open a command prompt.
2. Navigate to the directory where AI Developer Edition is cloned.
3. Run the following command to start JupyterLab.
   jupyter lab
4. Copy the URL displayed and open it in a web browser. Ensure that localhost in the URL is replaced with the IP address of the system where AI Developer Edition is set up.
5. In the left pane of JupyterLab, navigate to anonymization/samples/python/sample-app-anonymization.
6. Open the anonymization.ipynb file.
7. Click the Play icon and follow the prompts in JupyterLab.
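Replacing `localhost` in the JupyterLab URL can be done with the standard library, as in this sketch. The token and IP address in the example are placeholders, and `replace_host` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def replace_host(url: str, host: str) -> str:
    """Swap the hostname in a URL, keeping the scheme, port, path, and query (token)."""
    parts = urlsplit(url)
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(replace_host("http://localhost:8888/lab?token=abc123", "192.168.1.20"))
# http://192.168.1.20:8888/lab?token=abc123
```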
5 - Using the Anonymization APIs
client
Anonymization SDK Client.
Provides synchronous (AnonymizationClient) and asynchronous (AsyncAnonymizationClient) Python clients for the Anonymization API.
Public models, enums, and exceptions are re-exported here for backward compatibility, so that from anonymization_sdk.client import X continues to work.
AnonymizationClient
class AnonymizationClient()
Synchronous client for the Anonymization API.
Arguments:
- base_url: Base URL of the Anonymization API (default: http://localhost:8000)
- timeout: Request timeout in seconds (default: 30)
- headers: Additional headers to include in requests
__init__
def __init__(base_url: str = DEFAULT_BASE_URL,
timeout: float = DEFAULT_TIMEOUT,
headers: dict[str, str] | None = None,
mlops_config: dict[str, Any] | None = None)
Initialize the Anonymization client.
Arguments:
- base_url: Base URL of the Anonymization API
- timeout: Request timeout in seconds
- headers: Additional HTTP headers to include in requests
- mlops_config: Default MLOps tracking configuration applied to every anonymize, auto_anonymize, apply_anon, and calculate_risk call. Can be overridden per call by passing mlops_config explicitly.
close
def close() -> None
Close the HTTP client.
is_healthy
def is_healthy() -> bool
Check if the API is healthy and responding.
Returns:
True if the API is reachable and healthy, False otherwise.
get_health
def get_health() -> dict[str, Any]
Get detailed health information from the API.
Returns:
Dictionary with health status, version, and component states.
Raises:
- APIError: If the API returns an error status.
detect_qi
def detect_qi(data: DataInputType,
*,
mode: DetectionMode | str = DetectionMode.AUTO,
sampling_method: SamplingMethod | str = SamplingMethod.FAST,
cumulative_importance_threshold: float = 0.8,
max_quasi_identifiers: int = 10,
uniqueness_threshold: float = 0.95,
known_identifiers: list[str] | None = None,
known_sensitive: list[str] | None = None,
ignore_columns: list[str] | None = None) -> DetectionResult
Detect quasi-identifiers in a dataset.
Arguments:
- data: Inline records (List[Dict]), a local file path or file:// URI, or a cloud URI (s3://, gs://, azure://, etc.). Local paths are read and encoded automatically.
- mode: Detection algorithm (“auto”, “ml”, “heuristic”).
- sampling_method: Sampling strategy (“fast”, “full”, “adaptive”).
- cumulative_importance_threshold: Stop adding QIs at this cumulative importance threshold (0.0–1.0, default 0.8).
- max_quasi_identifiers: Maximum QIs to return (default 10).
- uniqueness_threshold: Columns above this uniqueness ratio are flagged as direct identifiers (0.0–1.0, default 0.95).
- known_identifiers: Columns you know are direct identifiers.
- known_sensitive: Columns you know are sensitive.
- ignore_columns: Columns to skip during detection.
Returns:
DetectionResult with quasi_identifiers, direct_identifiers, sensitive_attributes, attributes, and optional model_metrics.
Raises:
- APIError: If the API returns an error.
- ValidationError: If the request is invalid.
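The uniqueness_threshold, cumulative_importance_threshold, and max_quasi_identifiers parameters can be read as in the following sketch. It assumes the uniqueness ratio is the number of distinct non-null values divided by the number of non-null values, and that importance scores are turned into shares of their total; the service's actual scoring may differ, and the helper names are hypothetical.

```python
def uniqueness_ratio(records: list[dict], column: str) -> float:
    """Distinct non-null values divided by non-null values in one column."""
    values = [r[column] for r in records if r.get(column) is not None]
    return len(set(values)) / len(values) if values else 0.0


def flag_direct_identifiers(records: list[dict], threshold: float = 0.95) -> list[str]:
    """Columns whose uniqueness ratio is above the threshold."""
    columns = list(records[0]) if records else []
    return [c for c in columns if uniqueness_ratio(records, c) > threshold]


def select_by_importance(importances: dict[str, float],
                         threshold: float = 0.8,
                         max_qis: int = 10) -> list[str]:
    """Add columns in descending importance until their cumulative share reaches threshold."""
    total = sum(importances.values())
    if total <= 0:
        return []
    selected: list[str] = []
    cumulative = 0.0
    for column, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
        if cumulative >= threshold or len(selected) >= max_qis:
            break
        selected.append(column)
        cumulative += score / total
    return selected


records = [
    {"email": "a@example.com", "age": 30, "zip": "02139"},
    {"email": "b@example.com", "age": 30, "zip": "02139"},
    {"email": "c@example.com", "age": 41, "zip": "02141"},
    {"email": "d@example.com", "age": 41, "zip": "02139"},
]
print(flag_direct_identifiers(records))  # ['email']
print(select_by_importance({"age": 0.5, "zip": 0.25, "gender": 0.125, "city": 0.125}))
# ['age', 'zip', 'gender']
```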
generate_config
def generate_config(data: DataInputType,
*,
privacy_model: PrivacyModel
| str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
mode: DetectionMode | str = DetectionMode.AUTO,
**kwargs) -> AutoConfigResult
Generate anonymization configuration automatically.
Arguments:
- data: Inline records (List[Dict]), local file path, or cloud URI.
- privacy_model: Privacy model (“k-anonymity”, “l-diversity”, “t-closeness”).
- k: K value (default 5).
- l: L value for l-diversity.
- t: T threshold for t-closeness.
- mode: Detection algorithm (“auto”, “ml”, “heuristic”).
- **kwargs: Additional options: max_suppression, diversity_type, distance_metric, sampling_method.
Returns:
AutoConfigResult with detection results and a ready-to-use anonymize_request configuration dict.
calculate_risk
def calculate_risk(data: DataInputType,
quasi_identifiers: list[str] | None = None,
*,
risk_threshold: float = 0.2,
suppress_value: str = "*",
include_prosecutor: bool = True,
include_journalist: bool = True,
include_marketer: bool = True,
mlops_config: dict[str, Any] | None = None) -> RiskResult
Calculate re-identification risk metrics.
Arguments:
- data: Inline records (List[Dict]), local file path, or cloud URI.
- quasi_identifiers: QI column names to consider for risk.
- risk_threshold: Records above this threshold are “at risk” (default 0.2).
- suppress_value: Value marking suppressed records (default “*”).
- include_prosecutor: Calculate prosecutor risk (default True).
- include_journalist: Calculate journalist risk (default True).
- include_marketer: Calculate marketer risk (default True).
- mlops_config: MLOps config override.
Returns:
RiskResult with the prosecutor, journalist, and marketer risk models, plus k_anonymity, highest_risk_level, and equivalence class statistics.
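To clarify what these risk metrics measure, this sketch computes prosecutor-style risk from equivalence classes: each record's risk is 1 divided by the size of its class, k is the smallest class, and marketer risk is the number of classes divided by the number of records, which assumes the dataset is the whole population. Journalist risk needs population data and is omitted. This is a textbook formulation, not necessarily how calculate_risk computes its values; `risk_summary` is a hypothetical name.

```python
from collections import Counter


def risk_summary(records: list[dict], qis: list[str], risk_threshold: float = 0.2) -> dict:
    """Prosecutor-style risk metrics from equivalence classes over the quasi-identifiers."""
    keys = [tuple(r[q] for q in qis) for r in records]
    sizes = Counter(keys)
    # A record in a class of size f can be singled out with probability 1 / f.
    per_record = [1 / sizes[key] for key in keys]
    return {
        "k_anonymity": min(sizes.values()),              # smallest equivalence class
        "highest_risk": max(per_record),                 # equals 1 / k
        "records_at_risk": sum(p > risk_threshold for p in per_record),
        "marketer_risk": len(sizes) / len(records),      # classes / records
    }


records = [{"zip": z} for z in ["021**", "021**", "021**", "022**", "022**", "023**"]]
print(risk_summary(records, ["zip"], risk_threshold=0.4))
# {'k_anonymity': 1, 'highest_risk': 1.0, 'records_at_risk': 3, 'marketer_risk': 0.5}
```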
anonymize
def anonymize(data: DataInputType,
*,
privacy_model: PrivacyModel | str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
attributes: list[dict[str, Any]] | None = None,
max_suppression: float = 0.0,
output_uri: str | None = None,
output_format: str = "csv",
mlops_config: dict[str, Any] | None = None,
**kwargs) -> AnonymizeResult
Anonymize data synchronously using the specified privacy model.
Arguments:
- data: Inline records (List[Dict]), a local file path or file:// URI, or a cloud URI (s3://, gs://, azure://, etc.). Local paths are read and encoded automatically.
- privacy_model: Privacy model (“k-anonymity”, “l-diversity”, “t-closeness”).
- k: K value for k-anonymity (default 5).
- l: L value for l-diversity.
- t: T threshold for t-closeness (0.0–1.0).
- attributes: Attribute configurations, as a list of dicts with name, type (“quasi_identifier”, “sensitive”, “identifier”, “insensitive”), and optional hierarchy.
- max_suppression: Maximum fraction of records to suppress (0.0–1.0).
- output_uri: Cloud URI to write results to instead of returning them inline (e.g. "s3://bucket/output.csv"). When set, result_path is populated in the response instead of data.
- output_format: Format for cloud output (“csv”, “parquet”, “json”).
- mlops_config: MLOps tracking configuration.
- **kwargs: Additional options such as diversity_type, distance_metric, and use_lattice_search.
Returns:
AnonymizeResult with data (inline output) or result_path (cloud output), plus row_count, suppressed_count, and metrics.
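The attributes argument is a list of dicts with name and type keys. The following hypothetical helper builds that list from column names grouped by type and rejects unknown types or duplicate columns. It leaves out the optional hierarchy key because its format is not described here.

```python
from typing import Any

# Attribute types listed in the anonymize() documentation.
ATTRIBUTE_TYPES = ("quasi_identifier", "sensitive", "identifier", "insensitive")


def build_attributes(columns_by_type: dict[str, list[str]]) -> list[dict[str, Any]]:
    """Turn {type: [column, ...]} into the [{"name": ..., "type": ...}, ...] shape."""
    attributes: list[dict[str, Any]] = []
    seen: set[str] = set()
    for attr_type, columns in columns_by_type.items():
        if attr_type not in ATTRIBUTE_TYPES:
            raise ValueError(f"unknown attribute type: {attr_type!r}")
        for name in columns:
            if name in seen:
                raise ValueError(f"column {name!r} is listed more than once")
            seen.add(name)
            attributes.append({"name": name, "type": attr_type})
    return attributes


attributes = build_attributes({
    "quasi_identifier": ["age", "zip"],
    "sensitive": ["diagnosis"],
    "identifier": ["email"],
})
print(attributes[0])  # {'name': 'age', 'type': 'quasi_identifier'}
```

The returned list can then be passed as the attributes argument of anonymize() or submit_job().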
submit_job
def submit_job(data: DataInputType,
*,
privacy_model: PrivacyModel | str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
attributes: list[dict[str, Any]] | None = None,
max_suppression: float = 0.0,
**kwargs) -> JobResponse
Submit an anonymization job for asynchronous processing.
Arguments:
- data: Inline records (List[Dict]), local file path, or cloud URI.
- privacy_model: Privacy model (“k-anonymity”, “l-diversity”, “t-closeness”).
- k: K value for k-anonymity (default 5).
- l: L value for l-diversity.
- t: T threshold for t-closeness.
- attributes: Attribute configurations.
- max_suppression: Maximum suppression rate (0.0–1.0).
- **kwargs: Additional parameters (diversity_type, distance_metric).
Returns:
JobResponse with job_id, status, message, and created_at timestamp.
get_job_status
def get_job_status(job_id: str) -> JobStatusResponse
Get the status of an anonymization job.
Poll this method to track progress of jobs submitted via submit_job(). The response includes progress percentage, status, timestamps, and any error messages if the job failed.
Arguments:
- job_id: Unique job identifier returned by submit_job()
Returns:
JobStatusResponse with:
- job_id: Job identifier
- status: Current status (pending, running, completed, failed, cancelled)
- progress: Progress percentage (0-100)
- message: Status message
- created_at: Job creation timestamp
- updated_at: Last update timestamp
- completed_at: Completion timestamp (if completed)
- result_path: Path to result file (if completed)
- error: Error message (if failed)
Raises:
- APIError: If the job is not found or the API call fails
cancel_job
def cancel_job(job_id: str) -> None
Cancel a pending or running anonymization job.
Cancels a job that was submitted via submit_job(). Only jobs with status PENDING or RUNNING can be cancelled. Completed, failed, or already cancelled jobs cannot be cancelled.
Arguments:
- job_id: Unique job identifier returned by submit_job()
Raises:
- APIError: If the job is not found or cannot be cancelled
apply_anon
def apply_anon(job_id: str,
data: DataInputType,
*,
mlops_config: dict[str, Any] | None = None) -> "ApplyResult"
Apply a saved anonymization solution to new data.
Re-uses the generalization levels computed during a prior anonymize() call identified by job_id. The lattice is not recomputed.
Arguments:
- job_id: Solution identifier returned in AnonymizeResult.job_id.
- data: Inline records (List[Dict]), local file path, or cloud URI.
- mlops_config: Optional per-request MLOps tracking configuration.
Returns:
ApplyResult with anonymized data, row/suppressed counts, source_job_id, and privacy_model.
list_models
def list_models(*,
model_type: str | None = None,
all_metrics: bool = False) -> dict[str, Any]
List tracked anonymization models in Production.
Arguments:
- model_type: Optional filter by privacy model type (e.g. “k-anonymity”).
- all_metrics: If True, return all metrics instead of only the promotion metric.
Returns:
Raw response dict with ‘models’ list and ‘count’.
list_jobs
def list_jobs(*,
status: JobStatus | str | None = None,
limit: int = 100,
offset: int = 0) -> "JobListResult"
List all jobs, with an optional status filter and pagination.
Returns newest jobs first.
Arguments:
- status: Optional filter (e.g. JobStatus.COMPLETED or “failed”)
- limit: Page size (1–1000, default 100)
- offset: Page offset (default 0)
Returns:
JobListResult with jobs list, total count, limit, and offset.
Raises:
- APIError: If the API call fails.
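list_jobs() returns one page at a time. The following sketch walks all pages using limit and offset. `fetch` is a hypothetical callable that returns (items, total) for one page; for example, it could wrap client.list_jobs(limit=limit, offset=offset) and return the result's job list and total count, whose attribute names are assumptions here.

```python
from typing import Any, Callable, Iterator, Sequence


def iter_pages(fetch: Callable[[int, int], tuple[Sequence[Any], int]],
               limit: int = 100) -> Iterator[Any]:
    """Yield every item from an offset-paginated listing, one page at a time.

    fetch(limit, offset) must return (items_on_this_page, total_count).
    """
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    offset = 0
    while True:
        items, total = fetch(limit, offset)
        yield from items
        offset += len(items)
        # Stop on an empty page or once every item has been seen.
        if not items or offset >= total:
            break


# Stand-in for list_jobs(): 250 fake job IDs served in pages.
fake_jobs = [f"job-{i}" for i in range(250)]
pages = iter_pages(lambda limit, offset: (fake_jobs[offset:offset + limit], len(fake_jobs)))
print(len(list(pages)))  # 250
```

Because jobs are returned newest first, jobs submitted while you page through the results shift later pages, so the same job can appear twice; deduplicate by job_id if that matters.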
get_job_history
def get_job_history(job_id: str) -> list["JobHistoryEntry"]
Get the full state-transition audit trail for a job.
Each create/update call on the server appends an entry with the status, step, progress, and timestamp at that point.
Arguments:
- job_id: Unique job identifier.
Returns:
List of JobHistoryEntry ordered by sequence.
Raises:
- APIError: If the job is not found or the API call fails.
wait_for_job
def wait_for_job(job_id: str,
*,
poll_interval: float = 2.0,
timeout: float = 600.0,
callback: Any | None = None) -> JobStatusResponse
Poll a job until it reaches a terminal state and return its status.
Arguments:
- job_id: Unique job identifier returned by submit_job().
- poll_interval: Seconds between status polls (default 2s).
- timeout: Maximum seconds to wait (default 600s / 10 min).
- callback: Optional callable (JobStatusResponse) -> None, invoked after each poll.
Returns:
JobStatusResponse at the terminal state. The anonymization result (if completed) is available in status.context["result"].
Raises:
- APIError: If the job ends in a failed state.
- TimeoutError: If the job does not complete within timeout.
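wait_for_job() already handles polling. If you need a custom loop, for example one that drives your own progress display, the same pattern looks like the following sketch. `get_status` is a hypothetical zero-argument callable (for instance, one returning get_job_status(job_id).status as a lower-case string), and the terminal states are taken from the get_job_status() documentation. Unlike wait_for_job(), this sketch returns "failed" instead of raising APIError.

```python
import time
from typing import Callable, Optional

# Terminal states from the get_job_status() documentation.
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_terminal(get_status: Callable[[], str],
                        poll_interval: float = 2.0,
                        timeout: float = 600.0,
                        callback: Optional[Callable[[str], None]] = None) -> str:
    """Call get_status() until it returns a terminal state; raise TimeoutError on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if callback is not None:
            callback(status)
        if status in TERMINAL_STATES:
            return status
        # Give up if sleeping again would overshoot the deadline.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        time.sleep(poll_interval)


statuses = iter(["pending", "running", "completed"])
print(poll_until_terminal(lambda: next(statuses), poll_interval=0.0))  # completed
```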
auto_anonymize
def auto_anonymize(data: DataInputType,
*,
privacy_model: PrivacyModel
| str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
mode: DetectionMode | str = DetectionMode.AUTO,
mlops_config: dict[str, Any] | None = None,
**kwargs) -> AutoAnonymizeResult
Automatically detect QIs and anonymize in one step.
Arguments:
- data: Inline records (List[Dict]), local file path, or cloud URI.
- privacy_model: Privacy model (“k-anonymity”, “l-diversity”, “t-closeness”).
- k: K value (default 5).
- l: L value for l-diversity.
- t: T threshold for t-closeness.
- mode: Detection algorithm (“auto”, “ml”, “heuristic”).
- mlops_config: MLOps tracking configuration.
- **kwargs: Additional options such as max_suppression, sampling_method, and use_lattice_search.
Returns:
AutoAnonymizeResult with detection results and anonymized data.
validate
def validate(
data: DataInputType,
quasi_identifiers: list[str] | None = None,
*,
privacy_model: PrivacyModel | str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
sensitive_attributes: list[str] | None = None) -> ValidationResult
Validate that data meets privacy requirements.
Arguments:
- data: Inline records (List[Dict]), local file path, or cloud URI.
- quasi_identifiers: QI column names to check.
- privacy_model: Privacy model to validate against.
- k: Required k for k-anonymity (default 5).
- l: Required l for l-diversity.
- t: Required t for t-closeness.
- sensitive_attributes: Sensitive columns (required for l-diversity/t-closeness).
Returns:
ValidationResult with is_valid, model_type, violations, statistics.
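The kind of check validate() performs for k-anonymity and l-diversity can be illustrated with the following sketch, which groups records by their quasi-identifier values and reports classes that are too small or have too few distinct sensitive values. It implements distinct l-diversity, the simplest variant; the service's diversity_type option suggests it may support others. The function name, columns, and records are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def check_k_and_l(records: list[dict], qis: list[str], sensitive: str,
                  k: int = 5, l: Optional[int] = None) -> dict:
    """Report equivalence classes that break k-anonymity or distinct l-diversity."""
    classes: dict[tuple, list] = defaultdict(list)
    for r in records:
        classes[tuple(r[q] for q in qis)].append(r[sensitive])
    violations = []
    for key, values in classes.items():
        if len(values) < k:
            violations.append((key, f"class size {len(values)} < k={k}"))
        distinct = len(set(values))
        if l is not None and distinct < l:
            violations.append((key, f"{distinct} distinct sensitive values < l={l}"))
    return {"is_valid": not violations, "violations": violations}


records = [
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "021**", "diagnosis": "asthma"},
    {"age": "30-39", "zip": "021**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "diagnosis": "diabetes"},
    {"age": "40-49", "zip": "021**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "021**", "diagnosis": "asthma"},
]
print(check_k_and_l(records, ["age", "zip"], "diagnosis", k=3, l=2)["is_valid"])  # True
print(check_k_and_l(records, ["age", "zip"], "diagnosis", k=3, l=3)["violations"])
# [(('30-39', '021**'), '2 distinct sensitive values < l=3')]
```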
measure
def measure(original_data: DataInputType,
anonymized_data: DataInputType,
quasi_identifiers: list[str] | None = None) -> MetricsResult
Measure anonymization quality metrics.
Arguments:
- original_data: Original dataset (inline records, local path, or cloud URI).
- anonymized_data: Anonymized dataset (inline records, local path, or cloud URI).
- quasi_identifiers: QI column names that were generalized.
Returns:
MetricsResult with information_loss and detailed metrics.
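As one example of an information-loss measure (not necessarily the one measure() reports), the normalized certainty penalty scores a numeric value published as the interval [low, high] as (high - low) divided by the width of the column's original domain: 0 for an exact value and 1 for a fully generalized one. The helper names and values below are hypothetical.

```python
def ncp(low: float, high: float, domain_min: float, domain_max: float) -> float:
    """Normalized certainty penalty of publishing [low, high] instead of an exact value."""
    if domain_max == domain_min:
        return 0.0
    return (high - low) / (domain_max - domain_min)


def column_information_loss(intervals: list[tuple[float, float]],
                            domain_min: float, domain_max: float) -> float:
    """Average penalty over a column; an ungeneralized value v is the interval (v, v)."""
    return sum(ncp(lo, hi, domain_min, domain_max) for lo, hi in intervals) / len(intervals)


# Ages from an original column spanning 0..100, published as two 10-year bands,
# one exact value, and one fully generalized value.
published = [(30, 39), (40, 49), (27, 27), (0, 100)]
print(column_information_loss(published, 0, 100))
```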
create_pattern
def create_pattern(name: str,
classification: str,
column_patterns: list[str],
*,
priority: int = 50,
value_patterns: list[str] | None = None,
min_match_ratio: float = 0.8,
description: str | None = None) -> Pattern
Create a custom detection pattern.
Patterns are used during QI detection to automatically classify columns based on their names and values. Custom patterns take precedence over built-in patterns.
Arguments:
- name: Unique name for the pattern (e.g., ‘customer_id’)
- classification: Classification type, one of:
  - “DI”: Direct Identifier (e.g., SSN, email)
  - “QI”: Quasi-Identifier (e.g., age, zipcode)
  - “SI”: Sensitive Identifier (e.g., salary, diagnosis)
  - “NSI”: Non-Sensitive Identifier (safe to publish)
- column_patterns: List of column name patterns to match. Case-insensitive. Use ‘*’ as a wildcard (e.g., [‘*_id’, ‘user*’])
- priority: Priority level (1–1000, lower = checked first). Default: 50
- value_patterns: Optional list of regex patterns for value validation
- min_match_ratio: Minimum ratio of values that must match (0–1). Default: 0.8
- description: Optional description of what this pattern detects
Returns:
Pattern object with assigned ID and metadata
Raises:
- APIError: If creation fails (e.g., duplicate name)
- ValidationError: If parameters are invalid
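The following sketch shows one way the documented pattern fields could combine during detection: case-insensitive wildcard matching on column names, an optional regex check on values that must reach min_match_ratio, and lower priority numbers checked first. `SketchPattern` is a hypothetical local class, and full-match regex semantics are an assumption, not the service's documented behavior.

```python
import fnmatch
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchPattern:
    """Hypothetical local stand-in for the fields passed to create_pattern()."""
    name: str
    classification: str                      # "DI", "QI", "SI", or "NSI"
    column_patterns: list[str]
    priority: int = 50
    value_patterns: list[str] = field(default_factory=list)
    min_match_ratio: float = 0.8


def matches(pattern: SketchPattern, column: str, values: list[str]) -> bool:
    # Column names: case-insensitive, '*' matches any run of characters.
    if not any(fnmatch.fnmatchcase(column.lower(), p.lower()) for p in pattern.column_patterns):
        return False
    # Without value patterns (or sample values), the name match alone decides.
    if not pattern.value_patterns or not values:
        return True
    hits = sum(any(re.fullmatch(rx, v) for rx in pattern.value_patterns) for v in values)
    return hits / len(values) >= pattern.min_match_ratio


def classify(column: str, values: list[str], patterns: list[SketchPattern]) -> Optional[str]:
    # Lower priority numbers are checked first; the first matching pattern wins.
    for pattern in sorted(patterns, key=lambda p: p.priority):
        if matches(pattern, column, values):
            return pattern.classification
    return None


patterns = [
    SketchPattern("customer_id", "DI", ["*_id", "user*"], priority=10),
    SketchPattern("zipcode", "QI", ["zip*"], value_patterns=[r"\d{5}"]),
]
print(classify("Customer_ID", [], patterns))                 # DI
print(classify("zipcode", ["02139", "02141"], patterns))     # QI
```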
list_patterns
def list_patterns(classification: str | None = None) -> PatternListResult
List all custom detection patterns.
Arguments:
- classification: Optional filter by classification (DI, QI, SI, NSI)
Returns:
PatternListResult containing list of patterns and total count
get_pattern
def get_pattern(pattern_id: str) -> Pattern
Get a specific pattern by ID.
Arguments:
- pattern_id: The pattern ID to retrieve
Returns:
Pattern object
Raises:
- APIError: If the pattern is not found (404)
update_pattern
def update_pattern(pattern_id: str,
*,
name: str | None = None,
classification: str | None = None,
column_patterns: list[str] | None = None,
priority: int | None = None,
value_patterns: list[str] | None = None,
min_match_ratio: float | None = None,
description: str | None = None) -> Pattern
Update an existing pattern.
Only provided fields will be updated; others remain unchanged.
Arguments:
- pattern_id: The pattern ID to update
- name: New name for the pattern
- classification: New classification (DI, QI, SI, NSI)
- column_patterns: New column name patterns
- priority: New priority (1–1000)
- value_patterns: New value regex patterns
- min_match_ratio: New minimum match ratio (0–1)
- description: New description
Returns:
Updated Pattern object
Raises:
- APIError: If the pattern is not found or the update fails
- ValidationError: If parameters are invalid
delete_pattern
def delete_pattern(pattern_id: str) -> dict[str, Any]
Delete a pattern by ID.
Arguments:
- pattern_id: The pattern ID to delete
Returns:
Dictionary with confirmation message
Raises:
- APIError: If the pattern is not found (404)
delete_all_patterns
def delete_all_patterns() -> dict[str, Any]
Delete all custom patterns.
WARNING: This removes all customer-defined patterns. Built-in patterns from the YAML config are not affected.
Returns:
Dictionary with count of deleted patterns
reload_patterns
def reload_patterns() -> dict[str, Any]
Reload patterns from the storage file.
Use this method to synchronize after manual edits to the file.
Returns:
Dictionary with count of reloaded patterns
dp_compute
def dp_compute(data: DataInputType,
*,
mechanism: DPMechanismType | str = DPMechanismType.MEAN,
column: str | None = None,
columns: list[str] | None = None,
group_by: str | None = None,
epsilon: float = 1.0,
delta: float = 0.0,
noise_type: DPNoiseType | str = DPNoiseType.LAPLACE,
bounds: tuple | None = None,
bins: int | None = None,
histogram_range: tuple | None = None,
session_id: str | None = None,
predicate: str | None = None,
candidates: list | None = None,
utility_scores: list[float] | None = None,
sensitivity: float | None = None,
epsilon_map: dict[str, float] | None = None,
min_group_size: int | None = None) -> DPComputeResult
Compute a differentially private statistic on a data column.
Arguments:
- data: Inline records (List[Dict]), local file path, or cloud URI.
- mechanism: DP mechanism (“mean”, “sum”, “variance”, “histogram”, “count”, “exponential”).
- column: Column name for single-column queries.
- columns: Column names for multi-column queries.
- group_by: Categorical column to group by.
- epsilon: Privacy parameter epsilon (>0).
- delta: Privacy parameter delta (>=0, <1).
- noise_type: “laplace” or “gaussian”.
- bounds: (lower, upper) clipping bounds. Required for mean/sum/variance.
- bins: Number of histogram bins (histogram only).
- histogram_range: (min, max) range for histogram bins.
- session_id: Budget session ID for cumulative tracking.
- predicate: Filter expression (e.g., “> 50”, “<= 100”).
- candidates: Candidate outputs (exponential mechanism only).
- utility_scores: Utility scores for candidates (exponential only).
- sensitivity: Utility function sensitivity (exponential only).
- epsilon_map: Per-column or per-group epsilon overrides.
- min_group_size: Minimum rows per group (default 5).
Returns:
DPComputeResult with private_value (single) or results dict (multi/group).
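For mechanism="mean" with bounds, the classic Laplace mechanism clips each value to the bounds, computes the mean, and adds Laplace noise with scale sensitivity / epsilon, where the sensitivity of a mean of n clipped values is (upper - lower) / n. The sketch below assumes n is public and is a textbook illustration, not the dp_compute implementation; the function names are hypothetical.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: the difference of two exponential draws with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_mean(values: list[float], bounds: tuple[float, float], epsilon: float,
            rng: Optional[random.Random] = None) -> float:
    """Epsilon-DP mean of clipped values via the Laplace mechanism (n treated as public)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    if not values:
        raise ValueError("values must not be empty")
    lower, upper = bounds
    if not lower < upper:
        raise ValueError("bounds must satisfy lower < upper")
    rng = rng if rng is not None else random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    # Changing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / len(clipped)
    true_mean = sum(clipped) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


# The outlier 1000 is clipped to 50, so the noiseless mean is 30.
print(dp_mean([10, 20, 30, 40, 1000], bounds=(0, 50), epsilon=1.0, rng=random.Random(7)))
```

Smaller epsilon or wider bounds increase the noise scale, which is why tight, data-independent bounds matter for utility.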
dp_stream_update
def dp_stream_update(session_id: str | None = None,
data: DataInputType | None = None,
*,
column: str | None = None,
columns: list[str] | None = None,
group_by: str | None = None,
mechanism: DPStreamMechanismType | str | None = None,
epsilon: float | None = None,
delta: float | None = None,
noise_type: DPNoiseType | str | None = None,
bounds: tuple | None = None,
get_result: bool = False,
window_size: int | None = None,
epsilon_map: dict[str, float] | None = None,
min_group_size: int | None = None,
budget_session_id: str | None = None) -> DPStreamResult
Feed data into a streaming DP session.
On the first call for a session_id, provide mechanism, epsilon, and bounds. Subsequent calls only need session_id, data, and column.
Arguments:
- session_id: Unique session identifier.
- data: Batch of records. Mutually exclusive with data_path.
- data_path: Cloud/local URI for data batch.
- column: Column name for single-column streaming.
- columns: Column names for multi-column streaming.
- group_by: Categorical column to group by.
- mechanism: Streaming mechanism. Required on first call.
- epsilon: Privacy epsilon. Required on first call.
- delta: Privacy delta.
- noise_type: Noise mechanism.
- bounds: Clipping bounds. Required on first call (except for count).
- get_result: If True, also return the current private result.
- window_size: Window size for sliding/tumbling window mechanisms.
- epsilon_map: Per-column or per-group epsilon overrides.
- min_group_size: Minimum rows per group (default 5).
- budget_session_id: Link to a budget session for automatic deduction.
Returns:
DPStreamResult with session status and optional results.
dp_stream_delete
def dp_stream_delete(session_id: str) -> None
Delete a streaming DP session.
Arguments:
- session_id: Session to delete.
dp_stream_list_sessions
def dp_stream_list_sessions() -> list
List all active streaming DP sessions.
Returns:
List of dicts with session_id, mechanism, column, batches_processed, total_count.
dp_budget_create
def dp_budget_create(session_id: str,
epsilon_budget: float,
delta_budget: float = 0.0,
composition: str = "basic") -> DPBudgetStatus
Create a privacy budget session.
Arguments:
- session_id: Unique session identifier.
- epsilon_budget: Total epsilon budget.
- delta_budget: Total delta budget.
- composition: Composition mode (“basic” or “rdp”). RDP requires delta_budget > 0 and yields tighter privacy accounting.
Returns:
DPBudgetStatus with initial budget state.
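Under basic composition, the epsilon and delta of successive queries simply add up, so a budget session only has to subtract each query's cost and refuse queries that would overspend. The following hypothetical `BasicBudget` class sketches that bookkeeping, with an even per-query split as a helper. It does not model the "rdp" mode, which the documentation notes gives tighter accounting.

```python
class BasicBudget:
    """Privacy budget under basic (sequential) composition: costs simply add up."""

    def __init__(self, epsilon_budget: float, delta_budget: float = 0.0) -> None:
        self.epsilon_budget = epsilon_budget
        self.delta_budget = delta_budget
        self.epsilon_spent = 0.0
        self.delta_spent = 0.0

    def spend(self, epsilon: float, delta: float = 0.0) -> None:
        """Deduct one query's cost, refusing it if either budget would be exceeded."""
        if (self.epsilon_spent + epsilon > self.epsilon_budget
                or self.delta_spent + delta > self.delta_budget):
            raise RuntimeError("privacy budget exhausted")
        self.epsilon_spent += epsilon
        self.delta_spent += delta

    @property
    def epsilon_remaining(self) -> float:
        return self.epsilon_budget - self.epsilon_spent


def per_query_epsilon_basic(epsilon_budget: float, num_queries: int) -> float:
    """Even split under basic composition: each query gets epsilon_budget / num_queries."""
    return epsilon_budget / num_queries


budget = BasicBudget(epsilon_budget=1.0)
for _ in range(4):
    budget.spend(per_query_epsilon_basic(1.0, 4))  # 0.25 each
print(budget.epsilon_remaining)  # 0.0
```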
dp_budget_status
def dp_budget_status(session_id: str) -> DPBudgetStatus
Get privacy budget status for a session.
Arguments:
- session_id: Session to query.
Returns:
DPBudgetStatus with current spend and remaining budget.
dp_budget_delete
def dp_budget_delete(session_id: str) -> None
Delete a privacy budget session.
Arguments:
- session_id: Session to delete.
dp_advise_composition
def dp_advise_composition(epsilon_budget: float,
num_queries: int,
delta_budget: float = 0.0,
delta_per_query: float = 0.0) -> dict
Get composition advice for planned queries.
Returns optimal per-query epsilon under basic and RDP composition with a recommendation.
Arguments:
- epsilon_budget: Total epsilon budget available.
- num_queries: Number of planned queries.
- delta_budget: Total delta budget (required for RDP comparison).
- delta_per_query: Delta per query for Gaussian noise; 0 means Laplace noise.
Returns:
Dict with basic/rdp analysis, recommendation, and savings_pct.
audit_list
def audit_list(*,
operation: str | None = None,
status: str | None = None,
limit: int = 50,
offset: int = 0) -> list[AuditEntry]
List audit log entries.
Arguments:
- operation: Filter by operation (dp_compute, anonymize_sync, …).
- status: Filter by outcome (‘success’ or ‘error’).
- limit: Maximum number of entries to return (1–500).
- offset: Pagination offset.
Returns:
List of AuditEntry objects.
audit_get
def audit_get(entry_id: str) -> AuditEntry
Get a single audit entry.
Arguments:
- entry_id: Audit entry ID.
Returns:
AuditEntry with full details.
Raises:
- APIError: If the entry is not found (404).
AsyncAnonymizationClient
class AsyncAnonymizationClient()
Asynchronous client for the Anonymization API.
Same interface as AnonymizationClient but with async/await support.
__init__
def __init__(base_url: str = DEFAULT_BASE_URL,
timeout: float = DEFAULT_TIMEOUT,
headers: dict[str, str] | None = None,
mlops_config: dict[str, Any] | None = None)
Initialize the async Anonymization client.
Arguments:
- base_url: Base URL of the Anonymization API
- timeout: Request timeout in seconds
- headers: Additional HTTP headers to include in requests
- mlops_config: Default MLOps tracking configuration applied to every anonymize, auto_anonymize, apply_anon, and calculate_risk call. Can be overridden per call.
close
async def close() -> None
Close the HTTP client.
is_healthy
async def is_healthy() -> bool
Check if the API is healthy and responding.
get_health
async def get_health() -> dict[str, Any]
Get detailed health information.
detect_qi
async def detect_qi(
data: DataInputType,
*,
mode: DetectionMode | str = DetectionMode.AUTO,
sampling_method: SamplingMethod | str = SamplingMethod.FAST,
cumulative_importance_threshold: float = 0.8,
max_quasi_identifiers: int = 10,
uniqueness_threshold: float = 0.95,
known_identifiers: list[str] | None = None,
known_sensitive: list[str] | None = None,
ignore_columns: list[str] | None = None) -> DetectionResult
Detect quasi-identifiers (async version).
Refer to synchronous detect_qi() for full documentation.
generate_config
async def generate_config(data: DataInputType,
*,
privacy_model: PrivacyModel
| str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
mode: DetectionMode | str = DetectionMode.AUTO,
**kwargs) -> AutoConfigResult
Generate anonymization configuration automatically (async version).
calculate_risk
async def calculate_risk(
data: DataInputType,
quasi_identifiers: list[str] | None = None,
*,
risk_threshold: float = 0.2,
suppress_value: str = "*",
include_prosecutor: bool = True,
include_journalist: bool = True,
include_marketer: bool = True,
mlops_config: dict[str, Any] | None = None) -> RiskResult
Calculate re-identification risk metrics (async version).
anonymize
async def anonymize(data: DataInputType,
*,
privacy_model: PrivacyModel
| str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
attributes: list[dict[str, Any]] | None = None,
max_suppression: float = 0.0,
output_uri: str | None = None,
output_format: str = "csv",
mlops_config: dict[str, Any] | None = None,
**kwargs) -> AnonymizeResult
Anonymize data (async version). Refer to synchronous anonymize() for full documentation.
submit_job
async def submit_job(data: DataInputType,
*,
privacy_model: PrivacyModel
| str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
attributes: list[dict[str, Any]] | None = None,
max_suppression: float = 0.0,
**kwargs) -> JobResponse
Submit anonymization job (async version).
Refer to synchronous submit_job() for full documentation.
get_job_status
async def get_job_status(job_id: str) -> JobStatusResponse
Get job status (async version).
Refer to synchronous get_job_status() for full documentation.
cancel_job
async def cancel_job(job_id: str) -> None
Cancel job (async version). Refer to synchronous cancel_job() for full documentation.
apply_anon
async def apply_anon(
job_id: str,
data: DataInputType,
*,
mlops_config: dict[str, Any] | None = None) -> "ApplyResult"
Apply saved anonymization (async). Refer to synchronous apply_anon() for full docs.
list_models
async def list_models(*,
model_type: str | None = None,
all_metrics: bool = False) -> dict[str, Any]
List tracked anonymization models (async).
Refer to synchronous list_models() for full docs.
list_jobs
async def list_jobs(*,
status: JobStatus | str | None = None,
limit: int = 100,
offset: int = 0) -> "JobListResult"
List jobs (async version). Refer to synchronous list_jobs() for full documentation.
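Because list_jobs() pages through results with `limit` and `offset`, a caller that needs every job typically loops until a short page comes back. The sketch below wraps that loop in an async generator. `fetch_page` is a hypothetical stand-in for the client's list_jobs(), and it returns a plain list because the fields of JobListResult are not documented here.

```python
import asyncio


async def iter_all_jobs(fetch_page, *, page_size=100):
    """Yield every job by walking limit/offset pages until a short page appears."""
    offset = 0
    while True:
        page = await fetch_page(limit=page_size, offset=offset)
        for job in page:
            yield job
        if len(page) < page_size:
            return
        offset += page_size


# Hypothetical in-memory stand-in for the server, for demonstration only.
_JOBS = [f"job-{i}" for i in range(250)]


async def fake_fetch_page(*, limit, offset):
    await asyncio.sleep(0)  # simulate a network await point
    return _JOBS[offset:offset + limit]


async def main():
    return [job async for job in iter_all_jobs(fake_fetch_page)]


all_jobs = asyncio.run(main())
```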
get_job_history
async def get_job_history(job_id: str) -> list["JobHistoryEntry"]
Get job history (async version).
Refer to synchronous get_job_history() for full documentation.
wait_for_job
async def wait_for_job(job_id: str,
*,
poll_interval: float = 2.0,
timeout: float = 600.0,
callback: Any | None = None) -> JobStatusResponse
Async version of wait_for_job().
Refer to synchronous wait_for_job() for full documentation.
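A polling loop with the same parameters as wait_for_job() is sketched below. It checks the status every `poll_interval` seconds, passes each status to an optional `callback`, and gives up after `timeout` seconds. The terminal status strings and the `get_status` stand-in are assumptions. The real JobStatus values and the SDK's internal behavior may differ.

```python
import asyncio
import time

# Assumed terminal states; check the JobStatus enum for the real values.
TERMINAL = {"completed", "failed", "cancelled"}


async def poll_until_done(get_status, job_id, *, poll_interval=2.0,
                          timeout=600.0, callback=None):
    deadline = time.monotonic() + timeout
    while True:
        status = await get_status(job_id)
        if callback is not None:
            callback(status)
        if status in TERMINAL:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        await asyncio.sleep(poll_interval)


# Hypothetical stand-in: the job reports "running" twice, then "completed".
_states = iter(["running", "running", "completed"])


async def fake_get_status(job_id):
    return next(_states)


seen = []
final = asyncio.run(poll_until_done(fake_get_status, "job-1",
                                    poll_interval=0.01, callback=seen.append))
```

Using `time.monotonic()` for the deadline keeps the timeout immune to wall-clock adjustments.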
auto_anonymize
async def auto_anonymize(data: DataInputType,
*,
privacy_model: PrivacyModel | str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
mode: DetectionMode | str = DetectionMode.AUTO,
mlops_config: dict[str, Any] | None = None,
**kwargs) -> AutoAnonymizeResult
Auto-detect and anonymize (async version).
Refer to synchronous auto_anonymize() for full docs.
validate
async def validate(
data: DataInputType,
quasi_identifiers: list[str] | None = None,
*,
privacy_model: PrivacyModel | str = PrivacyModel.K_ANONYMITY,
k: int = 5,
l: int | None = None,
t: float | None = None,
sensitive_attributes: list[str] | None = None) -> ValidationResult
Validate privacy requirements (async version).
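For reference, the checks behind k-anonymity and the simplest form of l-diversity fit in a few lines. The sketch assumes `l` means distinct l-diversity: every equivalence class must contain at least `l` different values of each sensitive attribute. The server may use another variant, such as entropy or recursive l-diversity, and t-closeness is not covered. `check_privacy` is a hypothetical helper.

```python
from collections import defaultdict


def check_privacy(rows, quasi_identifiers, *, k=5, l=None, sensitive_attributes=()):
    """Hypothetical validator: k-anonymity plus optional distinct l-diversity."""
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[q] for q in quasi_identifiers)].append(row)
    # k-anonymity: every equivalence class has at least k records.
    if not all(len(members) >= k for members in classes.values()):
        return False
    if l is not None:
        for attr in sensitive_attributes:
            # Distinct l-diversity: at least l different sensitive values per class.
            if any(len({m[attr] for m in members}) < l for members in classes.values()):
                return False
    return True


rows = [
    {"age": "30-39", "zip": "123**", "disease": "flu"},
    {"age": "30-39", "zip": "123**", "disease": "cold"},
    {"age": "30-39", "zip": "123**", "disease": "flu"},
]
```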
measure
async def measure(original_data: DataInputType,
anonymized_data: DataInputType,
quasi_identifiers: list[str] | None = None) -> MetricsResult
Measure anonymization quality metrics (async version).
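As an example of what a quality metric can capture, the sketch computes two common utility measures on an anonymized table: the discernibility metric (the sum of squared equivalence-class sizes) and the average equivalence class size. Lower values of either mean less information loss. Both only need the anonymized data. Whether MetricsResult reports these particular metrics is not stated here, so treat them as illustrative, and `utility_metrics` as hypothetical.

```python
from collections import Counter


def utility_metrics(anonymized_rows, quasi_identifiers):
    """Illustrative utility metrics; lower values mean less information loss."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in anonymized_rows)
    return {
        # Each record is charged the size of the class it cannot be told apart from.
        "discernibility": sum(s * s for s in sizes.values()),
        "avg_class_size": len(anonymized_rows) / len(sizes),
    }


anonymized = [
    {"age": "30-39", "zip": "123**"},
    {"age": "30-39", "zip": "123**"},
    {"age": "30-39", "zip": "123**"},
    {"age": "40-49", "zip": "124**"},
    {"age": "40-49", "zip": "124**"},
]
metrics = utility_metrics(anonymized, ["age", "zip"])
```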
create_pattern
async def create_pattern(name: str,
classification: str,
column_patterns: list[str],
*,
priority: int = 50,
value_patterns: list[str] | None = None,
min_match_ratio: float = 0.8,
description: str | None = None) -> Pattern
Create a custom detection pattern (async version).
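The parameters of create_pattern() suggest a two-stage match. First, a column name must match one of `column_patterns`. Then, if `value_patterns` are given, at least `min_match_ratio` of the column's values must match one of them, and `priority` decides between several matching patterns. The sketch below encodes that reading. It is an assumption about the semantics, and the `PatternSketch` class, the `classify` helper, and the "higher priority wins" rule are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PatternSketch:
    name: str
    classification: str
    column_patterns: list[str]
    priority: int = 50
    value_patterns: list[str] = field(default_factory=list)
    min_match_ratio: float = 0.8

    def matches(self, column, values):
        # Stage 1: the column name must match at least one column pattern.
        if not any(re.search(p, column, re.IGNORECASE) for p in self.column_patterns):
            return False
        if not self.value_patterns or not values:
            return True
        # Stage 2: enough sample values must fully match a value pattern.
        hits = sum(any(re.fullmatch(p, str(v)) for p in self.value_patterns) for v in values)
        return hits / len(values) >= self.min_match_ratio


def classify(column, values, patterns):
    # Assumed rule: among matching patterns, the highest priority wins.
    candidates = [p for p in patterns if p.matches(column, values)]
    return max(candidates, key=lambda p: p.priority).classification if candidates else None


zip_pattern = PatternSketch("us_zip", "quasi_identifier", [r"zip|postal"],
                            priority=70, value_patterns=[r"\d{5}"])
generic_code = PatternSketch("generic_code", "identifier", [r"code"], priority=10)
```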
list_patterns
async def list_patterns(
classification: str | None = None) -> PatternListResult
List all custom detection patterns (async version).
get_pattern
async def get_pattern(pattern_id: str) -> Pattern
Get a specific pattern by ID (async version).
update_pattern
async def update_pattern(pattern_id: str,
*,
name: str | None = None,
classification: str | None = None,
column_patterns: list[str] | None = None,
priority: int | None = None,
value_patterns: list[str] | None = None,
min_match_ratio: float | None = None,
description: str | None = None) -> Pattern
Update an existing pattern (async version).
delete_pattern
async def delete_pattern(pattern_id: str) -> dict[str, Any]
Delete a pattern by ID (async version).
delete_all_patterns
async def delete_all_patterns() -> dict[str, Any]
Delete all custom patterns (async version).
reload_patterns
async def reload_patterns() -> dict[str, Any]
Reload patterns from storage file (async version).
audit_list
async def audit_list(*,
operation: str | None = None,
status: str | None = None,
limit: int = 50,
offset: int = 0) -> list[AuditEntry]
List audit log entries (async version).
audit_get
async def audit_get(entry_id: str) -> AuditEntry
Get a single audit entry (async version).
exceptions
Anonymization SDK Exceptions.
Custom exception hierarchy for the Anonymization SDK client library. All SDK exceptions inherit from AnonymizationClientError.
AnonymizationClientError
class AnonymizationClientError(Exception)
Base exception for all SDK errors.
ValidationError
class ValidationError(AnonymizationClientError)
Request validation failed (422 from server or client-side validation).
APIError
class APIError(AnonymizationClientError)
API returned an error response (4xx or 5xx status code).
AnonymizationConnectionError
class AnonymizationConnectionError(AnonymizationClientError)
Failed to connect to the API (network/timeout error).
TierRestrictionError
class TierRestrictionError(AnonymizationClientError)
Feature not available in the current server tier (403 from server).
The server returned a tier-restriction error indicating the requested feature requires a higher tier. Inspect the structured fields for details.
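Because every SDK error derives from AnonymizationClientError, handlers can go from specific to general, with the base class last. The sketch below mirrors the documented hierarchy with local stand-in classes so that it runs on its own. In real code these names would be imported from the SDK's exceptions module, whose import path is not given here. `call_with_retry`, `handle`, and the policy of retrying only connection errors are hypothetical.

```python
# Local stand-ins mirroring the documented hierarchy; import the real classes from the SDK instead.
class AnonymizationClientError(Exception): ...
class ValidationError(AnonymizationClientError): ...
class APIError(AnonymizationClientError): ...
class AnonymizationConnectionError(AnonymizationClientError): ...
class TierRestrictionError(AnonymizationClientError): ...


def call_with_retry(operation, retries=2):
    """Retry transient connection failures only; other SDK errors propagate at once."""
    for attempt in range(retries + 1):
        try:
            return operation()
        except AnonymizationConnectionError:
            if attempt == retries:
                raise


def handle(operation):
    # Specific handlers first: the base class would otherwise catch everything.
    try:
        return call_with_retry(operation)
    except ValidationError as exc:  # 422 or client-side validation: fix the request
        return f"invalid request: {exc}"
    except TierRestrictionError as exc:  # 403: feature needs a higher server tier
        return f"tier restriction: {exc}"
    except AnonymizationClientError as exc:  # APIError, exhausted retries, ...
        return f"sdk error: {exc}"


attempts = []


def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise AnonymizationConnectionError("connection reset")
    return "ok"


result = handle(flaky)
```

Retrying a ValidationError or TierRestrictionError would not help, since the same request fails the same way, so only connection errors are retried here.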
models
Anonymization SDK Response Models and Enums.
Contains all enums (PrivacyModel, DetectionMode, etc.) and response dataclasses (DetectionResult, RiskResult, AnonymizeResult, etc.) used by both the synchronous and asynchronous Anonymization clients.
PrivacyModel
class PrivacyModel(StrEnum)
Supported privacy models.
DetectionMode
class DetectionMode(StrEnum)
Quasi-identifier (QI) detection algorithm modes.
SamplingMethod
class SamplingMethod(StrEnum)
Sampling methods for detection.
RiskLevel
class RiskLevel(StrEnum)
Risk level classifications.
JobStatus
class JobStatus(StrEnum)
Job execution status.
AttributeClassification
@dataclass
class AttributeClassification()
Classification result for a single attribute.
ModelMetrics
@dataclass
class ModelMetrics()
ML model performance metrics.
DetectionResult
@dataclass
class DetectionResult()
Result of QI detection.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "DetectionResult"
Create from API response dict.
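Each response dataclass exposes a from_dict() class method that builds the object from the API's JSON response. A common way to write such a constructor is to copy only the keys that match declared fields, so that new server-side keys do not break older clients. The sketch shows this for a hypothetical `DetectionResultSketch` whose field names are invented for illustration. Whether the SDK ignores unknown keys this way is an assumption.

```python
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class DetectionResultSketch:
    quasi_identifiers: list[str] = field(default_factory=list)
    sensitive_attributes: list[str] = field(default_factory=list)
    row_count: int = 0

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "DetectionResultSketch":
        known = {f.name for f in fields(cls)}
        # Ignore keys the client does not know about (forward compatibility).
        return cls(**{k: v for k, v in data.items() if k in known})


result = DetectionResultSketch.from_dict(
    {"quasi_identifiers": ["age", "zip"], "row_count": 1000, "extra_field": True}
)
```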
ProsecutorRisk
@dataclass
class ProsecutorRisk(_BaseAttackerRisk)
Prosecutor risk model result.
JournalistRisk
@dataclass
class JournalistRisk(_BaseAttackerRisk)
Journalist risk model result.
MarketerRisk
@dataclass
class MarketerRisk()
Marketer risk model result.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "MarketerRisk"
Create from API response dict.
RiskResult
@dataclass
class RiskResult()
Complete risk metrics result.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "RiskResult"
Create from API response dict.
is_k_anonymous
def is_k_anonymous(k: int) -> bool
Check if data satisfies k-anonymity.
MetricsResult
@dataclass
class MetricsResult()
Anonymization quality metrics.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "MetricsResult"
Create from API response dict.
AnonymizeResult
@dataclass
class AnonymizeResult()
Result of anonymization operation.
result_path
Cloud storage URI of the result, if it was saved to cloud storage.
job_id
Solution identifier to pass to apply_anon().
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AnonymizeResult"
Create from API response dict.
ApplyResult
@dataclass
class ApplyResult()
Result of applying a saved anonymization solution to new data.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ApplyResult"
Create from API response dict.
ValidationResult
@dataclass
class ValidationResult()
Result of privacy validation.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "ValidationResult"
Create from API response dict.
AutoConfigResult
@dataclass
class AutoConfigResult()
Result of auto-configuration generation.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AutoConfigResult"
Create from API response dict.
AutoAnonymizeResult
@dataclass
class AutoAnonymizeResult()
Result of combined QI detection and anonymization.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AutoAnonymizeResult"
Create from API response dict.
Pattern
@dataclass
class Pattern()
Detection pattern for automatic QI classification.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "Pattern"
Create from API response dict.
PatternListResult
@dataclass
class PatternListResult()
Result of pattern list operation.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "PatternListResult"
Create from API response dict.
JobResponse
@dataclass
class JobResponse()
Response for job submission.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "JobResponse"
Create from API response dict.
JobStatusResponse
@dataclass
class JobStatusResponse()
Response for job status query.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "JobStatusResponse"
Create from API response dict.
JobHistoryEntry
@dataclass
class JobHistoryEntry()
A single point-in-time snapshot from the job audit trail.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "JobHistoryEntry"
Create from API response dict.
JobListResult
@dataclass
class JobListResult()
Paginated list of jobs.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "JobListResult"
Create from API response dict.
DPMechanismType
class DPMechanismType(StrEnum)
Supported batch differential privacy (DP) mechanisms.
DPStreamMechanismType
class DPStreamMechanismType(StrEnum)
Supported streaming DP mechanisms.
DPNoiseType
class DPNoiseType(StrEnum)
Supported noise mechanisms.
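A standard example of a DP noise mechanism is the Laplace mechanism. It adds noise drawn from Laplace(0, sensitivity / epsilon) to a numeric query result, and a count has sensitivity 1. Whether DPNoiseType includes Laplace, and how the service calibrates it, is not stated here. The helpers `laplace_noise` and `laplace_count` are hypothetical sketches using only the standard library.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) by the inverse-CDF transform of a uniform draw."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, which would lead to log(0)
        u = rng.random()
    u -= 0.5  # now uniform on (-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


def laplace_count(values, predicate, epsilon, rng=None):
    """Epsilon-DP count: a count has sensitivity 1, so the noise scale is 1/epsilon."""
    rng = rng if rng is not None else random.Random()
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


noisy = laplace_count(range(10), lambda v: v % 2 == 0, epsilon=1.0, rng=random.Random(42))
```

Smaller epsilon means a larger noise scale and stronger privacy, at the cost of accuracy.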
DPComputeResult
@dataclass
class DPComputeResult()
Result of a batch DP computation.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "DPComputeResult"
Create from API response dict.
DPStreamResult
@dataclass
class DPStreamResult()
Result of a streaming DP update.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "DPStreamResult"
Create from API response dict.
DPBudgetStatus
@dataclass
class DPBudgetStatus()
Privacy budget status for a session.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "DPBudgetStatus"
Create from API response dict.
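A per-session privacy budget is commonly tracked with sequential composition: the epsilons of successive queries add up, and a query is refused once the total would exceed the session budget. The `BudgetSketch` class below is a hypothetical illustration of that bookkeeping. The fields of DPBudgetStatus and the server's composition rule are not specified here.

```python
from dataclasses import dataclass


@dataclass
class BudgetSketch:
    total_epsilon: float
    spent_epsilon: float = 0.0

    @property
    def remaining(self) -> float:
        return self.total_epsilon - self.spent_epsilon

    def charge(self, epsilon: float) -> None:
        """Sequential composition: the epsilons of successive queries simply add."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so that float rounding does not reject an exact fit.
        if epsilon > self.remaining + 1e-12:
            raise RuntimeError(f"budget exhausted: {self.remaining:.3g} left, {epsilon} requested")
        self.spent_epsilon += epsilon


budget = BudgetSketch(total_epsilon=1.0)
budget.charge(0.3)
budget.charge(0.5)
```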
AuditEntry
@dataclass
class AuditEntry()
A single audit log entry.
from_dict
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "AuditEntry"
Create from API response dict.
6 - Uninstalling Anonymization
Open a command prompt.
Navigate to the cloned repository location.
Navigate to the anonymization directory.
cd anonymization
Run the following command to remove the containers and images.
docker compose down --rmi all