AI Developer Edition

Protegrity’s AI Developer Edition is a self-contained experimentation platform that showcases the capabilities of Protegrity’s products.

1: Introduction to Protegrity AI Developer Edition

2: AI Developer Edition Architecture

3: Setting up AI Developer Edition

3.1: Prerequisites
3.2: Optional - Obtaining access to the AI Developer Edition API Service
3.3: Setting up the packages
3.4: Verifying the files in the Protegrity AI Developer Edition package

4: Running the sample application

4.1: Data Discovery API

4.2: Application Protector Python APIs

5: Customizing the sample application

6: Building the Python module

7: Appendix

7.1: Input Sanitization
7.2: Working with the Data Discovery containers

7.2.1: Understanding the Docker Compose File
7.2.2: Deploying the Application

7.3: Supported Sensitive Entity Types
7.4: Data Security Policy
7.5: Removing AI Developer Edition
7.6: Known Issues

1 - Introduction to Protegrity AI Developer Edition

Overview of the product.

Protegrity AI Developer Edition is a lightweight, containerized sandbox. It lets developers and data scientists quickly prototype, test, and integrate data protection and discovery into their workflows. It does not require setting up a complex infrastructure and managing its operational overhead.

It is a self-contained, Docker-based environment designed to help developers, data scientists, and architects quickly explore and prototype data protection and discovery workflows. It enables hands-on experimentation without the need for enterprise infrastructure. With a modular architecture, built-in sample data, and a developer-first experience, AI Developer Edition is ideal for evaluating Protegrity’s capabilities in a fast, flexible, and frictionless way.

What is Protegrity AI Developer Edition?

Protegrity AI Developer Edition is designed to help a developer move quickly from idea to implementation, using familiar tools, sample apps, and open APIs.

It provides a streamlined environment to:

Discover and redact sensitive data using APIs and sample apps.
Discover and protect sensitive data using APIs and sample apps.
Perform message and conversation level risk scoring.
Scan personally identifiable information (PII) in GenAI flows.
Test real-world use cases with sample datasets and guided walkthroughs.

AI Developer Edition runs entirely on Docker, making it easy to spin up, tear down, and iterate quickly. It helps the user build a proof of concept, validate integration points, and get familiar with Protegrity’s core concepts. This edition provides the tools to set up the product fast and independently.

This product is not meant for production use, but it is the perfect launchpad for innovation.

Key Features

AI Developer Edition is purpose-built for fast, frictionless exploration of Protegrity’s core capabilities.

The following features make it ideal for prototyping and integration:

Modular, Containerized Architecture: AI Developer Edition runs on Docker, making it easy to test, isolate, and iterate.
Sample Apps and Data: Jumpstart evaluation with ready-to-run sample apps that demonstrate real-world use cases, such as finding sensitive data in unstructured text or finding and redacting sensitive data.
Python Module: This version includes an open-source Python module to use Protegrity in the development environment.
Lightweight: No Enterprise Security Administrator (ESA). No orchestration overhead. Just deploy the container and use the sample application.
Data Discovery: This container identifies, classifies, masks, redacts, or protects sensitive data. It uses built-in and custom classifiers to detect sensitive data with confidence scoring.
Semantic Guardrails: This container is used to analyze conversational data and apply privacy and appropriateness filters. This feature helps enforce content boundaries and detect PII using Protegrity’s Data Discovery engine.
AI Developer Edition API Service: A service hosted by Protegrity that allows developers to interact with Protegrity’s protection and discovery services through intuitive endpoints. It supports protection and unprotection of sensitive data, enabling rapid prototyping and testing of data protection scenarios without needing full-scale infrastructure. Registration is required for this service. The credentials can be obtained for free.

This product is continuously improving. The features mentioned here are either already available or will be available shortly.

Protegrity AI Developer Edition Personas

The primary personas who benefit most from AI Developer Edition.

Persona	Role Description	Goals	Typical Activities
Application Developer	Builds and integrates applications that handle sensitive data.	- Embed protection APIs. - Prototype quickly. - Validate integration points.	- Run sample apps.
Data Scientist or ML Engineer	Works with sensitive datasets in analytics and machine learning workflows.	- Discover and classify PII. - Protect training data. - Ensure compliance.	- Use discovery APIs. - Integrate with Jupyter notebooks. - Test module.
Solution Architect	Designs end-to-end data protection strategies across systems and teams.	- Evaluate platform fit. - Define architecture. - Guide implementation.	- Review sample apps. - Test modular deployment. - Assess performance.
Security or Privacy Lead	Ensures data protection aligns with compliance and governance requirements.	- Understand protection methods. - Validate policy behavior. - Review audit paths.	- Inspect logs. - Simulate policy scenarios. - Review discovery results.

Use Cases

A range of use cases is supported across data protection, security, and emerging GenAI-driven applications.

Data Protection and Security Use Cases

These use cases focus on helping developers and data scientists secure sensitive data in conventional applications, services, and pipelines.

Use Case	Description
Find and Redact	Discover sensitive data using the Data Discovery API and redact or mask it.
Find and Protect	Discover sensitive data using the Data Discovery API and protect it through tokenization.
Sample App Prototyping	Use prebuilt apps to simulate real-world scenarios, such as protecting PII in unstructured text. Helps accelerate evaluation and integration.
Python Module Integration	Integrate protection APIs into Python using lightweight modules. Useful for embedding Protegrity into existing development pipelines.
API Evaluation	Directly test protection and discovery APIs using tools like Postman or curl. Enables low-friction exploration of Protegrity’s core capabilities.

GenAI Use Cases

AI Developer Edition supports emerging GenAI workflows where sensitive data may be used in prompts, training datasets, or inference pipelines. These use cases help developers and data scientists ensure privacy and compliance when working with large language models (LLMs) and AI-driven applications.

The Semantic Guardrail feature and samples are provided with AI Developer Edition. The use cases listed here are potential applications that users can develop using the feature.

Use Case	Description
Chatbot Input Protection	Protect sensitive user inputs, such as names, emails, IDs, before passing them to GenAI models. Ensures privacy compliance in conversational AI workflows.
Prompt Sanitization	Automatically detect and mask PII in prompts used for LLM-based applications. Helps reduce risk in prompt engineering and inference.
Training Data Anonymization	Discover and redact sensitive fields in datasets used to train GenAI models. Supports responsible AI development practices.
Notebook-Based Experimentation	Use Jupyter notebooks to test protection and discovery workflows in GenAI pipelines. Ideal for data scientists working with unstructured or semi-structured data.

These use cases are especially relevant for teams building AI-powered tools that interact with real-world user data, where privacy and data protection are critical.

2 - AI Developer Edition Architecture

An explanation of the architecture and components of AI Developer Edition.

A high-level architecture of AI Developer Edition is provided in the following image.

This release of AI Developer Edition consists of sample applications that use and showcase the capabilities of Data Discovery, Semantic Guardrail, protection, and unprotection using simple Python modules. The Data Discovery component identifies sensitive data. After identification, the Python module redacts, masks, or protects the sensitive information. Protection is performed using the AI Developer Edition API Service.

Data Discovery

Data Discovery is a developer-friendly product designed specifically to address the challenge of identifying sensitive data in plain text and free-text inputs.

For more information, refer to the Data Discovery documentation.

Overview

The Data Discovery Text Classification service advances data discovery and classification, specializing in the detection of Personally Identifiable Information (PII), Protected Health Information (PHI), and Payment Card Information (PCI) within plain text and free-text inputs. Unlike traditional structured data tools, it excels in dynamic, unstructured environments such as chatbot conversations, call transcripts, and Generative AI (GenAI) outputs.

Architecture

Data Discovery consists of three containers that are hosted on Docker: the Classification container, the Presidio provider container, and the RoBERTa provider container. The general architecture is illustrated in the following figure.

Step	Description
1	The user sends the text to be classified for sensitive data as the request body to the Classification service.
2	The Classification service distributes the request to the Presidio and RoBERTa service providers to process the data.
3	The Presidio and RoBERTa providers process the data based on their own logic and return their classifications to the Classification service.
4	The Classification service aggregates the responses from the service providers and sends the result to the user.
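The request flow in the table can be sketched as a fan-out and aggregate pattern. The following is a minimal illustration only, not the Classification service's implementation: the provider functions, the finding fields (entity, start, end, score), and the rule of keeping the highest score for a duplicated span are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def classify(text, providers):
    """Send the text to every provider in parallel and merge the findings."""
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        results = list(pool.map(lambda provider: provider(text), providers))

    merged = {}
    for findings in results:
        for finding in findings:
            key = (finding["entity"], finding["start"], finding["end"])
            # Assumed merge rule: keep the highest score for a duplicated span.
            if key not in merged or finding["score"] > merged[key]["score"]:
                merged[key] = finding
    return sorted(merged.values(), key=lambda f: (f["start"], f["end"], f["entity"]))


def pattern_provider(text):
    # Hypothetical rule-based provider that flags one fixed phone number.
    start = text.find("555-0100")
    if start == -1:
        return []
    return [{"entity": "PHONE", "start": start, "end": start + 8, "score": 0.6}]


def model_provider(text):
    # Hypothetical model-based provider that reports the same span with a higher score.
    start = text.find("555-0100")
    if start == -1:
        return []
    return [{"entity": "PHONE", "start": start, "end": start + 8, "score": 0.85}]


result = classify("Call 555-0100 today", [pattern_provider, model_provider])
print(result)
```

Both hypothetical providers report the same span, so the aggregated response contains a single finding.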

Semantic Guardrail

Protegrity’s GenAI Security - Semantic Guardrail solution is a security guardrail engine for AI systems. It evaluates risks in GenAI chatbots, workflows, and agents through advanced semantic analytics and intent classification to detect potentially malicious messages. PII detection can also be leveraged for comprehensive security coverage.

For more information, refer to the Semantic Guardrail documentation.

Overview

The current implementation is trained on synthetic customer-service AI chatbot datasets. The system performs best when analyzing conversations expected to match the training domain, that is, English-language based customer service interactions involving orders, tickets, and purchases.

For domain-specific and user-specific applications requiring high detection accuracy, fine-tuning is necessary to completely leverage the model’s ability. This helps the model to learn from expected conversation patterns and message structures in both the inputs and outputs of protected GenAI systems.

The system operates by analyzing conversations between participants. These participants are users and AI systems, such as LLMs, agents, or contextual information sources. Furthermore, if Protegrity’s Data Discovery is present in the same network environment, the system uses its PII detection in its internal decision algorithm.

The solution provides individual message risk scores and classifications, and cumulative conversation risk scores and classifications. This dual-scoring approach ensures that while individual messages may appear benign, potentially risky cumulative conversation patterns are identified. This significantly enhances detection of sophisticated attack vectors, including LLM jailbreaks and prompt injection attempts.
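The value of scoring both messages and the whole conversation can be shown with a small example. The combination rule below (a noisy-OR over per-message scores) and the 0.5 threshold are assumptions chosen for illustration; the source does not describe how Semantic Guardrail computes its cumulative score.

```python
def conversation_risk(message_scores):
    """Combine per-message risk scores (0.0 to 1.0) into one conversation score.

    Illustrative noisy-OR rule: the conversation counts as safe only if every
    message is safe, so the combined risk grows as messages accumulate.
    """
    safe_probability = 1.0
    for score in message_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be between 0.0 and 1.0")
        safe_probability *= 1.0 - score
    return 1.0 - safe_probability


THRESHOLD = 0.5  # hypothetical cut-off for flagging
scores = [0.2, 0.2, 0.2, 0.2]  # four messages that each look benign

flagged_messages = [s for s in scores if s >= THRESHOLD]
conversation_score = conversation_risk(scores)

print("Flagged messages:", flagged_messages)
print("Conversation flagged:", conversation_score >= THRESHOLD)
```

In this sketch no single message crosses the threshold, but the conversation as a whole does, which is the pattern the dual-scoring approach is meant to catch.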

Architecture

The diagram shows how client applications integrate with Semantic Guardrail, and how Data Discovery PII can be integrated as a PII detector provider.

Component	Description
External AI System	AI system, such as AI chatbot or Agent, that responds to a user, using LLM and data, which is integrated with the Semantic Guardrail solution.
External LLM	LLM employed as reasoning engine by the external AI system.
External Data Sources	Data sources used by an external AI system.
Semantic Guardrail	The core application operates as a containerized Docker service. It processes conversation data through HTTP requests and performs comprehensive security risk analysis, applying guardrails including Semantic Guardrail.
Data Discovery	For PII detection capabilities, Semantic Guardrail can leverage Protegrity’s Data Discovery solution. This solution operates as specialized Docker containers within the same environment.

AI Developer Edition API Service

Protegrity AI Developer Edition API Service offers functionality derived from the original suite of Protegrity products in the form of API calls. The API endpoints are easy to use and require minimal configuration. Registration is required to send API requests to the service for protecting and unprotecting data. A set of predefined users and roles is provided. Based on the role used, different scenarios can be tried and tested.

Sample Applications

Protegrity AI Developer Edition provides Python modules that showcase the features of Protegrity products.

sample-app-find module

The sample-app-find module is a Python library that processes the source data and identifies sensitive data.

The module can be customized to do the following functions:

Specify a file name and output location for the source data. Only raw file formats are supported for Data Discovery. Multipart formats are not supported; only binary files are accepted.

sample-app-find-and-redact module

The sample-app-find-and-redact module is a Python library that processes the identified data and redacts or masks the information.

The module can be customized to do the following functions:

Specify the items that must be identified.
Specify the operation to be performed on the data, which is redact or mask.
Specify a file name and output location for the source data. Only raw file formats are supported for Data Discovery. Multipart formats are not supported; only binary files are accepted.
Specify a file name and output location for the transformed data.
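The difference between the redact and mask operations can be sketched in a few lines. This is an illustration under stated assumptions, not the module's code: the span format (start, end, label), the bracketed label used for redaction, and the "#" mask character are hypothetical.

```python
def transform(text, spans, operation="redact", mask_char="#"):
    """Replace each detected span with a label (redact) or mask characters (mask).

    spans is a list of (start, end, entity_label) tuples with non-overlapping
    character offsets; this shape is an assumption for illustration.
    """
    if operation not in ("redact", "mask"):
        raise ValueError("operation must be 'redact' or 'mask'")
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, label in sorted(spans, reverse=True):
        if operation == "redact":
            replacement = f"[{label}]"
        else:
            replacement = mask_char * (end - start)
        text = text[:start] + replacement + text[end:]
    return text


sample = "Contact Jane Doe at jane@example.com"
detected = [(8, 16, "PERSON"), (20, 36, "EMAIL")]  # hypothetical discovery output

redacted = transform(sample, detected, "redact")
masked = transform(sample, detected, "mask")
print(redacted)
print(masked)
```

Redaction changes the length of the text, while masking in this sketch preserves it, which can matter when downstream tools rely on character offsets.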

sample-guardrail-python module

The sample-guardrail-python module is a Python library that submits a request to Semantic Guardrail for analysis.

The module can be customized to do the following functions:

Specify the data that must be processed.
Specify the operation that must be performed, that is, semantic processor for messages and pii processor for AI.

sample-app-find-and-protect module

The sample-app-find-and-protect module is a Python library that processes the identified data and protects the information. Calls are made to the AI Developer Edition API Service for performing tokenization.

The module can be customized to do the following functions:

Specify the items that must be identified.
Specify a file name and output location for the source data. Only raw file formats are supported for Data Discovery. Multipart formats are not supported; only binary files are accepted.
Specify a file name and output location for the transformed data.

sample-app-find-and-unprotect module

The sample-app-find-and-unprotect module is a Python library that unprotects the information protected by the sample-app-find-and-protect module. Calls are made to the AI Developer Edition API Service for performing detokenization.

The module can be customized to do the following functions:

Specify a file name and output location for the source data. Only data protected by the sample-app-find-and-protect module can be unprotected.
Specify a file name and output location for the transformed data.

sample-app-protection module

The sample-app-protection module is a Python library that protects and unprotects data. Calls are made to the AI Developer Edition API Service for performing tokenization. The Data Discovery and Semantic Guardrail containers are not required to be running for the sample-app-protection module.

The module can be customized to do the following functions:

Specify the items that must be protected, data element name, and user.
Specify the operation that must be performed, which is protect or unprotect.

3 - Setting up AI Developer Edition

The steps to set up the product.

Complete the prerequisites, optionally register for access to AI Developer Edition API Service, set up, verify, and run the required files for using Protegrity AI Developer Edition.

3.1 - Prerequisites

Ensure that the following prerequisites are met.

AP Python

Hardware requirements

For the local Docker deployment mode, a machine with the following specifications will enable you to experiment with the main features:

Linux and Windows:

RAM: 16 GB
CPU: 8 core
Hard Disk: 30 GB available

macOS:

RAM: 16 GB
CPU: 4 core
Hard Disk: 30 GB available

Software requirements

Python v3.12.11 or later is installed. For more information about installing Python, refer to the Python website. Ensure that the python command points to a supported python3 version, for example, Python 3.12.11. Verify using the python --version command.
pip is installed for installing packages.
A Python virtual environment is available.
On Linux and Windows, Docker CLI is installed to manage Docker containers. On macOS, Docker Desktop or Colima is installed.
Docker Compose is installed for local containerized deployments. This application supports Docker Compose V2. Ensure that your installation supports this version.
Git is installed for cloning the repository.
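The software checks above can also be run from Python. The following is a small sketch that uses only the standard library; the list of tools is taken from the requirements above, and the helper names are made up for this example.

```python
import shutil
import sys

MIN_PYTHON = (3, 12, 11)  # minimum version stated in the requirements


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True when the interpreter meets the minimum version."""
    return tuple(version_info[:3]) >= minimum


def missing_tools(tools=("docker", "git")):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("Missing tools:", missing_tools() or "none")
```

This only confirms that the executables are on PATH. It does not verify that Docker is running or that Docker Compose V2 is available.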

Additional settings for macOS

macOS requires additional steps for Docker and for systems with Apple Silicon chips. Complete the following steps before using AI Developer Edition.

Complete one of the following options to apply the settings.
- For Colima:
  1. Open a command prompt.
  2. Run the following command.
```
colima start --vm-type vz --vz-rosetta --memory 4
```
- For Docker Desktop:
  1. Open Docker Desktop.
  2. Go to Settings > General.
  3. Enable the following check boxes:
    - Use Virtualization framework
    - Use Rosetta for x86_64/amd64 emulation on Apple Silicon
  4. Click Apply & restart.
Complete one of the following options to resolve certificate-related errors.
- For Colima:
  1. Open a command prompt.
  2. Navigate and open the following file.
```
~/.colima/default/colima.yaml
```
  3. Update the following configuration in colima.yaml to add the path for obtaining the required images.
    Before update:
```
docker: {}
```
    After update:
```
docker:
    insecure-registries:
        - ghcr.io
```
  4. Save and close the file.
  5. Stop colima.
```
colima stop
```
  6. Close and reopen the command prompt.
  7. Start colima.
```
colima start --vm-type vz --vz-rosetta --memory 4
```
- For Docker Desktop:
  1. Open Docker Desktop.
  2. Click the gear or settings icon.
  3. Click Docker Engine from the sidebar. The editor opens the current Docker daemon configuration daemon.json.
  4. Locate and add the insecure-registries key in the root JSON object. Ensure that a comma is added after the last value in the existing configuration.
    After update:
```
{
    .
    .
    <existing configuration>,
    "insecure-registries": [
        "ghcr.io",
        "githubusercontent.com"
    ]
}
```
  5. Click Apply & Restart to save the changes and restart Docker Desktop.
  6. Verify: After Docker restarts, run docker info in your terminal and confirm that the required registry is listed under Insecure Registries.
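When editing daemon.json by hand, it is easy to drop a comma or overwrite an existing key. The sketch below shows the intended result of the Docker Desktop step as a JSON merge: existing keys are kept and the registries are added once. It works on a string rather than on a file, and the example existing configuration is made up for illustration.

```python
import json


def add_insecure_registries(config_text, registries):
    """Return daemon.json text with the registries added, keeping other keys."""
    config = json.loads(config_text) if config_text.strip() else {}
    current = config.get("insecure-registries", [])
    for registry in registries:
        if registry not in current:
            current.append(registry)
    config["insecure-registries"] = current
    return json.dumps(config, indent=4)


# Example existing content; your daemon.json will differ.
existing = '{"builder": {"gc": {"enabled": true}}}'
updated = add_insecure_registries(existing, ["ghcr.io", "githubusercontent.com"])
print(updated)
```

Paste the printed JSON into the Docker Engine editor only after checking that it still contains all of your existing settings.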
Optional: Complete the following steps if this error is displayed: The requested image’s platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested.
1. Start a command prompt.
2. Navigate and open the following file.
```
~/.docker/config.json
```
3. Add the following parameter.
```
"default-platform": "linux/amd64"
```
4. Save and close the file.
5. If the protegrity-developer-edition repository is already cloned, run docker compose up -d from its directory. Otherwise, continue with the setup.

3.2 - Optional - Obtaining access to the AI Developer Edition API Service

Creating a user account and completing the registration.

Registration is required only for running the APIs that protect, unprotect, and reprotect data. The find and redact feature, which uses Data Discovery, and the Semantic Guardrail feature can be used without registration. Skip this section if the find and protect feature, which uses tokenization and encryption, is not required.

Registering for access

Open a web browser.
Navigate to https://www.protegrity.com/developers/get-api-credentials.
Specify the following details:
- First Name
- Last Name
- Work Email
- Job Title
- Company Name
- Country
Click the Terms & Conditions link and read the terms and conditions.
Select the check box to accept the terms and conditions.
Click Get Started.

The request is analyzed. After the request is approved, a password and an API key to access the AI Developer Edition API Service are sent to the Work Email specified. If the account already exists, then the details are re-sent to the email address. The email takes a minute or two to arrive. If you do not see the email in your inbox, check your spam or junk folder before retrying.

Specifying the authentication information

Add the login information provided by Protegrity to the environment to access the AI Developer Edition API Service.

It is recommended to add the details to the environment variables to avoid specifying the information every time the environment is initialized.

Open a command prompt.
Initialize a Python virtual environment.
Add the email address of the user.

Linux and macOS (Bash or zsh):

export DEV_EDITION_EMAIL='<Email_used_for_registration>'

Windows (PowerShell):

$env:DEV_EDITION_EMAIL = '<Email_used_for_registration>'

Specify the password provided in the registration email.

Linux and macOS (Bash or zsh):

export DEV_EDITION_PASSWORD='<Password_provided_in_email>'

Windows (PowerShell):

$env:DEV_EDITION_PASSWORD = '<Password_provided_in_email>'

Specify the API key for accessing the AI Developer Edition API Service.

Linux and macOS (Bash or zsh):

export DEV_EDITION_API_KEY='<API_key_provided_in_email>'

Windows (PowerShell):

$env:DEV_EDITION_API_KEY = '<API_key_provided_in_email>'

Verify that the variables are set.

Linux and macOS (Bash or zsh):

test -n "$DEV_EDITION_EMAIL" && echo "EMAIL $DEV_EDITION_EMAIL set" || echo "EMAIL missing"
test -n "$DEV_EDITION_PASSWORD" && echo "PASSWORD $DEV_EDITION_PASSWORD set" || echo "PASSWORD missing"
test -n "$DEV_EDITION_API_KEY" && echo "API KEY $DEV_EDITION_API_KEY set" || echo "API KEY missing"

Windows (PowerShell):

if ($env:DEV_EDITION_EMAIL) { Write-Output "EMAIL $env:DEV_EDITION_EMAIL set" } else { Write-Output "EMAIL missing" }
if ($env:DEV_EDITION_PASSWORD) { Write-Output "PASSWORD $env:DEV_EDITION_PASSWORD set" } else { Write-Output "PASSWORD missing" }
if ($env:DEV_EDITION_API_KEY) { Write-Output "API KEY $env:DEV_EDITION_API_KEY set" } else { Write-Output "API KEY missing" }
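The shell commands above print the values of the variables, including the password and API key. As an alternative, the following Python sketch reports only which variable names are missing. The variable names come from this section; the helper function is made up for this example.

```python
import os

REQUIRED = ("DEV_EDITION_EMAIL", "DEV_EDITION_PASSWORD", "DEV_EDITION_API_KEY")


def missing_variables(environ=os.environ, required=REQUIRED):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_variables()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All AI Developer Edition credentials are set.")
```

Run it from the same shell session in which the variables were exported, because environment variables set in one session are not visible in another.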

AI Developer Edition API Service usage guidelines

To ensure fair use of the API service, rate limits are enforced on API requests to the AI Developer Edition API Service.

These limits are:

Request rate: 50 per second
Burst: up to 100
Quota: 10,000 requests per user per day
Maximum payload size: 1MB
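A client can stay within these limits by throttling itself before sending requests. The following token bucket is a sketch of one common approach, not part of the Protegrity module. The defaults mirror the request rate and burst values above. Treating 1 MB as 1,000,000 bytes is an assumption, because the source does not define the unit, and the daily quota is not tracked here.

```python
import time


class TokenBucket:
    """Client-side limiter: refills at `rate` tokens per second up to `burst`."""

    def __init__(self, rate=50, burst=100, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()

    def try_acquire(self):
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


MAX_PAYLOAD_BYTES = 1_000_000  # assumption: 1 MB interpreted as 10^6 bytes


def payload_fits(body: str) -> bool:
    """Check the UTF-8 encoded size of a request body against the limit."""
    return len(body.encode("utf-8")) <= MAX_PAYLOAD_BYTES
```

Call try_acquire() before each request and sleep briefly when it returns False. Call payload_fits() on the request body to avoid sending a request that the service would reject for size.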

3.3 - Setting up the packages

Steps for obtaining and setting up the packages.

Obtaining the package

Navigate to the Protegrity AI Developer Edition repository.
Clone or download the repositories on your local system.
- protegrity-developer-edition: Contains the files to launch the required containers. It also contains the sample applications and files.
```
git clone https://github.com/Protegrity-Developer-Edition/protegrity-developer-edition.git
```
To customize the Python modules, clone and use the source from the protegrity-developer-python repository.
```
git clone https://github.com/Protegrity-Developer-Edition/protegrity-developer-python.git
```
Verify the files in the package. The list of files in the git package can be obtained from the files list.

Back up the Protegrity AI Developer Edition repository if the Python and configuration files are updated.
Navigate to the cloned repository location for protegrity-developer-edition.
Run the following command to stop the containers.
```
docker compose down
```
Depending on your configuration, use the docker-compose down command instead.
Sync to update the repositories on the local system using the git pull command.
- protegrity-developer-edition: Contains the files to launch the required containers. It also contains the sample applications and files.
- protegrity-developer-python: Contains the source files for customizing and using the Python module.
Verify the files in the package. The list of files in the git package can be obtained from the files list.

Setting up Data Discovery and Semantic Guardrail

The containers contain the Data Discovery and Semantic Guardrail components required for identifying sensitive data.

Open a command prompt.
Navigate to the cloned repository location for protegrity-developer-edition.
Run the following command to download and start the containers. The dependent containers are large in size. Based on the network connection, the containers might take time to download and deploy.
```
docker compose up -d
```
Depending on your configuration, use the docker-compose up -d command instead.
Verify that the containers started successfully.
```
docker compose logs
```

Open a command prompt.
Navigate to the cloned repository location for protegrity-developer-edition.
If the step to stop containers was missed earlier, then use the following commands to identify and remove the AI Developer Edition containers.
```
docker compose down
docker compose down --remove-orphans
```

Delete the docker network resources.

docker network rm -f <network_name_or_id>

For example,

docker network rm -f protegrity-network

Run the following command to download and start the containers. The dependent containers are large in size. Based on the network connection, the containers might take time to download and deploy.
```
docker compose up -d
```
Depending on your configuration, use the docker-compose up -d command instead.
Verify that the containers started successfully.
```
docker compose logs
```

Installing the protegrity-developer-python Module

The module has built-in functions to find, redact, mask, and protect data.

Open a command prompt.
Install the protegrity-developer-python module. It is recommended to create and activate a Python virtual environment before running this command.
```
pip install protegrity-developer-python
```
The installation completes and the success message is displayed. To compile and install the Python module from source, refer to Building the Python module.
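To confirm the installation from Python rather than from the pip output, the installed version can be read with the standard library. This sketch assumes only the distribution name used in the pip command above; the helper name is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="protegrity-developer-python"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print("protegrity-developer-python:", installed_version() or "not installed")
```

Run it with the same interpreter and virtual environment that was used for pip install. Otherwise, the package may be reported as not installed.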

Open a command prompt.
Upgrade the protegrity-developer-python module. It is recommended to activate the Python virtual environment before running the command.
```
pip install --upgrade protegrity-developer-python
```
The package is successfully upgraded.

3.4 - Verifying the files in the Protegrity AI Developer Edition package

The list of files available in the AI Developer Edition repositories.

protegrity-developer-edition repository

The repository for obtaining and running the sample application.

docker-compose.yml: This file contains the configuration for deploying the Data Discovery and Semantic Guardrail containers.
README.md: The readme file specifying the steps to set up the product.
samples: The directory with the sample application and scripts for the Python module.
- sample-app-find-and-redact.py: The sample application Python file for detecting and redacting sensitive information in the source file.
- sample-app-find-and-protect.py: The sample application Python file for detecting and protecting sensitive information in the source file using tokenization and encryption.
- sample-app-find-and-unprotect.py: The sample application Python file for unprotecting sensitive information in the source file. The source is generated by sample-app-find-and-protect.py.
- sample-app-protection.py: The sample application Python file for protecting and unprotecting data.
- sample-app-find.py: The sample application Python file for detecting and listing sensitive information in the source file.
- config.json: The configuration file for the Python application.
- sample-data: The directory with the sample file.
  - input.txt: The sample file that is processed.
  - output-redact.txt: The output file created by the find and redact application.
  - output-protect.txt: The output file created by the find and protect application.
data-discovery: The directory with the sample application and scripts for Data Discovery.
- sample-classification-commands.sh: A file with the sample curl command for identifying sensitive data.
- sample-classification-python.py: A sample Python module for identifying sensitive data.
semantic-guardrail: The directory with the sample application and scripts for Semantic Guardrail.
- sample-guardrail-python.py: A sample Python module for submitting multi-turn conversation with semantic and PII processors.

protegrity-developer-python repository

The repository with the source files for customizing and compiling the Python file.

LICENSE: The license file with the terms and conditions for using the application.
README.md: The readme file for working with the Python file.
pyproject.toml: The project and build configuration file for the module.
pytest.ini: The configuration file for the Pytest framework.
requirements.txt: The list of Python dependencies for the module.
appython: The directory for the source file.
- __init__.py: The initializing script.
- protector.py: The source file for the script.
protegrity-developer-python: The directory for the source file.
- __init__.py: The initializing script.
- securefind.py: The source file for the script.

4 - Running the sample application

A sample application to use AI Developer Edition.

In the AI Developer Edition, a user uploads a file using the sample application, which is processed by the Data Discovery container. The containers detect sensitive data. A Python module then redacts, masks, or protects and unprotects the data. The sanitized file is saved to a configured location. For more information about the sample application, refer to Sample application.

Use the steps provided here to run the application end-to-end. If required, run the APIs and functions provided for performing specific tasks. For more information about the identification APIs, refer to Data Discovery API.

Running the applications

Applications are provided out-of-the-box to test and understand the capabilities of AI Developer Edition.

Running the sample find application

Open a command prompt.
Navigate to the directory where AI Developer Edition is cloned.
Run the sample application using the following command.
```
python samples/sample-app-find.py
```
View the output of the files processed on the screen. The output displays a list of sensitive items in the source file.

“Sample application output”

View the processed output file in the output directory.

Running the sample find and redact application

Open a command prompt.
Navigate to the directory where AI Developer Edition is cloned.
Run the sample application using the following command.
```
python samples/sample-app-find-and-redact.py
```
View the output of the files processed on the screen. The output displays a list of sensitive items in the source file. It also displays the location and name of the output file with the redacted output.

“Sample application output”

View the processed output file in the output directory.
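
The find and redact flow can be pictured with a minimal sketch. This is not the sample application's implementation: the `redact` helper and the bracketed label format are assumptions made for illustration. The span indexes are taken from the sample classify response shown in Data Discovery API, and the sketch assumes that spans do not overlap.

```python
from typing import List, Tuple

# (start_index, end_index, entity_type); end_index is treated as exclusive
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each detected span with a bracketed entity label (hypothetical format)."""
    redacted = text
    # Work from the end of the text so earlier indexes stay valid after each replacement.
    for start, end, entity in sorted(spans, key=lambda s: s[0], reverse=True):
        redacted = redacted[:start] + f"[{entity}]" + redacted[end:]
    return redacted


sample = "You can reach Dave Elliot by phone 203-555-1286."
detected = [(14, 25, "PERSON"), (35, 47, "PHONE_NUMBER")]
print(redact(sample, detected))
```

Replacing from the last span to the first avoids recalculating offsets when a label is longer or shorter than the text it replaces.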

Running Semantic Guardrail

Open a command prompt.
Navigate to the directory where AI Developer Edition is cloned.
Run the following command to test Semantic Guardrail. The command submits a multi-turn conversation for analysis by two processors: one for semantic processing and a second one for PII processing.

python semantic-guardrail/sample-guardrail-python.py

“Sample application output”

Running the sample find and protect application for Python

Ensure that the required credentials are obtained and environment variables specified, using the steps from Optional - Obtaining access to the AI Developer Edition API Service.
Open a command prompt.
Navigate to the directory where AI Developer Edition is cloned.
Run the sample application using the following command.
```
python samples/sample-app-find-and-protect.py
```
View the output of the files processed on the screen. The output displays the protected data and unprotected data.

“Sample application output”

View the processed output file in the output directory. The samples/sample-data/output-protect.txt file is generated with the protected (token-like) values.
To obtain the original data, run the following command.
```
python samples/sample-app-find-and-unprotect.py
```
This reads the samples/sample-data/output-protect.txt file and produces the samples/sample-data/output-unprotect.txt file with original values.

“Sample application unprotect output”

Running the sample find and protect application for Java

Ensure that the required credentials are obtained and environment variables specified, using the steps from Optional - Obtaining access to the AI Developer Edition API Service.
Open a command prompt.
Navigate to the directory where AI Developer Edition is cloned.
Run the sample application using the following command.
```
java samples/sample-app-find-and-protect.py
```
View the output of the files processed on the screen. The output displays the protected data and unprotected data.

“Sample application output”

View the processed output file in the output directory. The samples/sample-data/output-protect.txt file is generated with the protected (token-like) values.
To obtain the original data, run the following command.
```
java samples/sample-app-find-and-unprotect.py
```
This reads the samples/sample-data/output-protect.txt file and produces the samples/sample-data/output-unprotect.txt file with original values.

“Sample application unprotect output”

Running the script for protecting data

The sample-app-protection.py showcases the various scenarios for protecting, unprotecting, and reprotecting data.

Understanding Users and Roles

Built-in users and roles are provided for impersonation testing. Use any of the preconfigured users to explore Protegrity’s role-based access controls. Each user sees a different view of the sensitive data: some users can protect data but cannot reverse the operation, and some can re-identify only selected attributes.

To use a role, pass the chosen user name in the user attribute of the payload during the protect or unprotect operation. If no user is specified, the request defaults to superuser.

The following roles and users have been configured and are available for use:

Role	User	Description
ADMIN	`admin`, `devops`, `jay.banerjee`	The role can protect all data but cannot unprotect it. On an unprotect attempt, the protected values are displayed.
FINANCE	`finance`, `robin.goodwill`	The role can unprotect all PII and PCI data. The role cannot protect any data. When the role attempts to unprotect data without authorization, nulls are displayed.
MARKETING	`marketing`, `merlin.ishida`	The role can unprotect some PII data that is required for analytical research and campaign outreach. When the role attempts to unprotect data without authorization, nulls are displayed. The role cannot protect any data.
HR	`hr`, `paloma.torres`	The role can unprotect all PII data but cannot view any PCI data. When the role attempts to unprotect data without authorization, nulls are displayed. The role cannot protect any data.
OTHER	`superuser`	The role can perform any protect and unprotect operation. The role is available for testing only. Creating superuser roles in your own environments is strongly discouraged.

Additionally, you may type in any user name to simulate unauthorized user behavior.
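
The table above can be restated as a small lookup, which helps when predicting the result of a test command. This is a sketch only: the `role_for` and `can_protect` helpers are hypothetical and are not part of the sample application, and the mapping is copied from the table rather than read from the policy.

```python
from typing import Optional

# User-to-role mapping copied from the table above.
USER_ROLES = {
    "admin": "ADMIN", "devops": "ADMIN", "jay.banerjee": "ADMIN",
    "finance": "FINANCE", "robin.goodwill": "FINANCE",
    "marketing": "MARKETING", "merlin.ishida": "MARKETING",
    "hr": "HR", "paloma.torres": "HR",
    "superuser": "OTHER",
}

# Roles that the table describes as able to protect data.
ROLES_THAT_PROTECT = {"ADMIN", "OTHER"}


def role_for(user: Optional[str] = None) -> Optional[str]:
    """Return the role for a user; an unspecified user defaults to superuser."""
    if not user:
        user = "superuser"
    # None models an unknown user name, which simulates an unauthorized user.
    return USER_ROLES.get(user)


def can_protect(user: Optional[str] = None) -> bool:
    """Report whether the table allows this user to protect data."""
    return role_for(user) in ROLES_THAT_PROTECT


print(role_for("devops"), can_protect("devops"))
print(role_for("finance"), can_protect("finance"))
```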

Understanding the Data Elements

A list of supported data elements is provided here. For a mapping of the Data Element and the Entity Type, refer to Supported Sensitive Entity Types.

For more information about the data elements policy, refer to Data Security Policy.

Name	Description
name	Protect or unprotect name of a person
name_de	Protect or unprotect name of a person in the German language
name_fr	Protect or unprotect name of a person in the French language
address	Protect or unprotect an address
address_de	Protect or unprotect an address in the German language
address_fr	Protect or unprotect an address in the French language
city	Protect or unprotect a town or city
city_de	Protect or unprotect a town or city name in the German language
city_fr	Protect or unprotect a town or city name in the French language
postcode	Protect or unprotect a postal code with digits and characters
zipcode	Protect or unprotect a postal code with digits only
phone	Protect or unprotect a phone number
email	Protect or unprotect an email
datetime	Protect or unprotect all components of a datetime string: day, month, year, and time
datetime_yc	Protect or unprotect a datetime string. Year will be in clear.
int	Protect or unprotect a 4-byte integer string
nin	Protect or unprotect a National Insurance Number (UK)
ssn	Protect or unprotect a Social Security Number (US)
ccn	Protect or unprotect a Credit Card Number
ccn_bin	Protect or unprotect a Credit Card Number. Leaves 8-digit BIN in the clear.
passport	Protect or unprotect a passport number
iban	Protect or unprotect an International Bank Account Number
iban_cc	Protect or unprotect an International Bank Account Number. Leaves letters in the clear.
string	Protect or unprotect a string
number	Protect or unprotect a number
text	Protect or unprotect text using encryption
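
The table lists two postal data elements: zipcode for digits only, and postcode for digits and characters. The following sketch shows one way to choose between them. The `postal_data_element` helper is hypothetical, and treating any value that is not purely ASCII digits as a postcode is an assumption.

```python
def postal_data_element(code: str) -> str:
    """Pick the data element name for a postal code (illustrative heuristic)."""
    # zipcode covers digits only; everything else is treated as a postcode.
    if code.isascii() and code.isdigit():
        return "zipcode"
    return "postcode"


print(postal_data_element("90210"))
print(postal_data_element("SW1A 1AA"))
```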

Testing the sample file

Ensure that the required credentials are obtained and environment variables specified, using the steps from Optional - Obtaining access to the AI Developer Edition API Service.
Open a command prompt.
Navigate to the directory where AI Developer Edition is cloned.

Protect data using the following command.

python samples/sample-app-protection.py --input_data "John Smith" --policy_user superuser --data_element name --protect

View the protected output.

Unprotect the data obtained from the earlier step using the following command.

python samples/sample-app-protection.py --input_data "<protected_data>" --policy_user superuser --data_element name --unprotect

View the unprotected output.

Encrypt data using the following command.

python samples/sample-app-protection.py --input_data "John Smith" --policy_user superuser --data_element text --enc

View the encrypted output.

Decrypt the data obtained from the earlier step using the following command.

python samples/sample-app-protection.py --input_data "<encrypted_data>" --policy_user superuser --data_element text --dec

View the decrypted output.
Use the help command for more information about using the sample file.
```
python samples/sample-app-protection.py --help
```
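
When several test commands are needed, building the argument list in code avoids quoting mistakes. The following is a minimal sketch: the `build_command` helper and the operation names are hypothetical, while the script path and the flags are the ones used in the commands in this section. The sketch only builds and prints the command and does not run it.

```python
import shlex
from typing import List

# Operation flags as they appear in the commands in this section.
OPERATION_FLAGS = {
    "protect": "--protect",
    "unprotect": "--unprotect",
    "encrypt": "--enc",
    "decrypt": "--dec",
}


def build_command(input_data: str, data_element: str, operation: str,
                  policy_user: str = "superuser",
                  script: str = "samples/sample-app-protection.py") -> List[str]:
    """Return the argument list for one run of the sample protection script."""
    if operation not in OPERATION_FLAGS:
        raise ValueError(f"unknown operation: {operation}")
    return [
        "python", script,
        "--input_data", input_data,
        "--policy_user", policy_user,
        "--data_element", data_element,
        OPERATION_FLAGS[operation],
    ]


cmd = build_command("John Smith", "name", "protect")
print(shlex.join(cmd))  # shell-quoted form of the command
```

To run the command, the list can be passed to `subprocess.run(cmd, check=True)` from the directory where AI Developer Edition is cloned.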

Additional use cases for user role behavior for data protection

This section demonstrates the expected behavior of various user roles when running the sample-app-protection.py. Each section describes the permissions and restrictions for a role, followed by example commands and their outputs.

ADMIN

Users: admin, devops, jay.banerjee

This role can protect all data but cannot unprotect. When attempting to unprotect, protected values are displayed.

python sample-app-protection.py --input_data "Protegrity$" --policy_user devops --data_element name --protect

python sample-app-protection.py --input_data "2839874358655598" --policy_user admin --data_element ccn --protect

python sample-app-protection.py --input_data "CxWHeztVNp$" --policy_user jay.banerjee --data_element name --protect --unprotect

python sample-app-protection.py --input_data "6211214171366290" --policy_user admin --data_element ccn --protect --unprotect

“Sample application output”

FINANCE

Users: finance, robin.goodwill

This role can unprotect all PII and PCI data. The role cannot protect any data. When attempting to unprotect data without authorization, the value Null is displayed.

python sample-app-protection.py --input_data "xzrT sqdVc" --policy_user finance --data_element name --unprotect

python sample-app-protection.py --input_data "4321567898765432" --policy_user finance --data_element ccn --unprotect

python sample-app-protection.py --input_data "John Smith" --policy_user finance --data_element name --protect

python sample-app-protection.py --input_data "2839874358655598" --policy_user robin.goodwill --data_element ccn --protect

python sample-app-protection.py --input_data "1998/10/11" --policy_user finance --data_element datetime  --unprotect

python sample-app-protection.py --input_data "1998/10/11" --policy_user robin.goodwill --data_element datetime  --unprotect

“Sample application output”

MARKETING

Users: marketing, merlin.ishida

This role can unprotect some PII data that is required for analytical research and campaign outreach. The role cannot protect any data. When attempting to unprotect data without authorization, the value Null is displayed.

python sample-app-protection.py --input_data "DnZQHKcpVJ, J.G." --policy_user marketing --data_element city --unprotect

python sample-app-protection.py --input_data "4321567898765432" --policy_user merlin.ishida --data_element ccn --unprotect

python sample-app-protection.py --input_data "Washington, D.C." --policy_user marketing --data_element city --protect

python sample-app-protection.py --input_data "2839874358655598" --policy_user merlin.ishida --data_element ccn --protect

“Sample application output”

HR

Users: hr, paloma.torres

This role can unprotect all PII data but cannot view any PCI data. The role cannot protect any data. When attempting to unprotect data without authorization, the value Null is displayed.

python sample-app-protection.py --input_data "2839874358655598" --policy_user paloma.torres --data_element ccn --unprotect

python sample-app-protection.py --input_data "CIF123654987" --policy_user hr --data_element passport --unprotect

python sample-app-protection.py --input_data "John Doe" --policy_user hr --data_element name --protect

python sample-app-protection.py --input_data "John Doe" --policy_user paloma.torres --data_element name --protect

python sample-app-protection.py --input_data "4321567898765432" --policy_user paloma.torres --data_element ccn --protect

“Sample application output”

OTHER

User: superuser

This role can perform any protect and unprotect operation. The role is made available for testing only. Creating superuser roles in your own environments is strongly discouraged.

python sample-app-protection.py --input_data "John Smith" --policy_user superuser --data_element name --protect --unprotect

python sample-app-protection.py --input_data "2839874358655598" --policy_user superuser --data_element ccn --protect --unprotect

“Sample application output”

4.1 - Data Discovery API

Classify API.

The Data Discovery service exposes its API on port 8580.

Data Discovery Classification Service

This API identifies, classifies, and locates sensitive data.

Endpoint

http://{Host Address}:8580/pty/data-discovery/v1.0/classify

Path

/pty/data-discovery/v1.0/classify

Method

POST

Parameters

Define the value in the score_threshold parameter to exclude results with a low score. This parameter is optional and accepts the following values:

Type: float
Values: minimum 0, maximum 1.0
Default: 0.00

For example, score_threshold = 0.75

Example Data

You can reach Dave Elliot by phone 203-555-1286.

The data must be UTF-8 encoded. The maximum input length is 10,000 characters.
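
The limits described above can be checked on the client before a request is sent. The following is a minimal sketch under stated assumptions: the `validate_classify_input` helper is hypothetical, and the 10,000 limit is counted in characters rather than bytes.

```python
MAX_CHARS = 10_000  # length limit stated above, counted in characters (assumption)


def validate_classify_input(text: str, score_threshold: float = 0.0) -> bytes:
    """Check the documented limits and return the UTF-8 encoded request body."""
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1.0")
    if len(text) > MAX_CHARS:
        raise ValueError(f"input has {len(text)} characters; the limit is {MAX_CHARS}")
    return text.encode("utf-8")


body = validate_classify_input("You can reach Dave Elliot by phone 203-555-1286.", 0.75)
print(len(body), "bytes ready to send")
```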

Sample Request

http://{Host Address}:8580/pty/data-discovery/v1.0/classify

Response Codes

Successful Response.

{
        "providers": [
          {
            "name": "Presidio Classification Provider",
            "version": "1.0.0",
            "status": 200,
            "elapsed_time": 1.014178991317749,
            "exception": null,
            "config_provider": {
              "name": "Presidio",
              "address": "http://presidio_provider_service",
              "supported_content_types": []
            }
          },
          {
            "name": "Roberta Classification Provider",
            "version": "1.0.0",
            "status": 200,
            "elapsed_time": 19.091534852981567,
            "exception": null,
            "config_provider": {
              "name": "Roberta",
              "address": "http://roberta_provider_service",
              "supported_content_types": []
            }
          }
        ],
        "classifications": {
          "PERSON": [
            {
              "score": 0.9236000061035157,
              "location": {
                "start_index": 14,
                "end_index": 25
              },
              "classifiers": [
                {
                  "provider_index": 0,
                  "name": "SpacyRecognizer",
                  "score": 0.85,
                  "details": {}
                },
                {
                  "provider_index": 1,
                  "name": "roberta",
                  "score": 0.9972000122070312,
                  "details": {}
                }
              ]
            }
          ],
          "PHONE_NUMBER": [
            {
              "score": 0.8746500015258789,
              "location": {
                "start_index": 35,
                "end_index": 47
              },
              "classifiers": [
                {
                  "provider_index": 0,
                  "name": "PhoneRecognizer",
                  "score": 0.75,
                  "details": {}
                },
                {
                  "provider_index": 1,
                  "name": "roberta",
                  "score": 0.9993000030517578,
                  "details": {}
                }
              ]
            }
          ]
        }
      }

Request must have a body, but no request body was provided.

Payload too large.

Unsupported media type.

Unexpected internal server error. Check server logs.

Internal server error. Check server logs.
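
The successful response groups findings by entity type under classifications, with a score and a location for each finding. The following sketch reads an abridged copy of the sample response and recovers the matched text. The `extract_findings` helper is hypothetical. Treating end_index as exclusive is inferred from the sample, where indexes 14 to 25 cover Dave Elliot.

```python
import json

# Abridged copy of the sample response above (providers and classifiers omitted).
response_text = """
{
  "classifications": {
    "PERSON": [
      {"score": 0.9236000061035157,
       "location": {"start_index": 14, "end_index": 25}}
    ],
    "PHONE_NUMBER": [
      {"score": 0.8746500015258789,
       "location": {"start_index": 35, "end_index": 47}}
    ]
  }
}
"""


def extract_findings(response: dict, text: str) -> list:
    """Return (entity_type, matched_text, score) for every classification."""
    findings = []
    for entity, hits in response.get("classifications", {}).items():
        for hit in hits:
            loc = hit["location"]
            matched = text[loc["start_index"]:loc["end_index"]]
            findings.append((entity, matched, hit["score"]))
    return findings


text = "You can reach Dave Elliot by phone 203-555-1286."
findings = extract_findings(json.loads(response_text), text)
for entity, matched, score in findings:
    print(f"{entity}: {matched!r} (score {score:.2f})")
```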

Sample Request

curl -X POST "http://<SERVER_IP>:8580/pty/data-discovery/v1.0/classify?score_threshold=0.85" \
          -H "Content-Type: text/plain" \
          --data "You can reach Dave Elliot by phone 203-555-1286"

import requests

# Replace <SERVER_IP> with the address of the host running Data Discovery.
url = "http://<SERVER_IP>:8580/pty/data-discovery/v1.0/classify"
params = {"score_threshold": 0.85}  # exclude results that score below 0.85
headers = {"Content-Type": "text/plain"}
data = "You can reach Dave Elliot by phone 203-555-1286"

response = requests.post(url, params=params, headers=headers, data=data, verify=False)

print("Status code:", response.status_code)
print("Response JSON:", response.json())

URL: POST `http://<SERVER_IP>:8580/pty/data-discovery/v1.0/classify`
Query Parameters:
- score_threshold (optional): float between 0.0 and 1.0, default 0.
Headers:
- Content-Type: text/plain
Body:
- You can reach Dave Elliot by phone 203-555-1286

4.2 - Application Protector Python APIs

The various APIs of the AP Python

This section describes the APIs supported by the AP Python, including their syntax and sample use cases.

Before running the APIs in this section, ensure that the required credentials are obtained and environment variables specified, using the steps from Optional - Obtaining access to the AI Developer Edition API Service.

Initialize the protector

The Protector API returns the Protector object associated with the AP Python APIs. After instantiation, this object is used to create a session. The session object provides APIs to perform the protect, unprotect, or reprotect operations.

Protector(self)

Note: Do not pass the self parameter while invoking the API.

Parameters

None

Returns

Protector: Object associated with the AP Python APIs.

Exceptions

InitializationError: This exception is thrown if the protector fails to initialize.

Example

In the following example, the AP Python is initialized.

from appython import Protector
protector = Protector()

create_session

The create_session API creates a new session. Sessions created using this API automatically time out after the session timeout value is reached. The default session timeout value is 15 minutes. However, you can also pass the session timeout value as a parameter to this API.

Note: If the session is invalid or has timed out, then the AP Python APIs that are invoked using this session object may throw an InvalidSessionError exception. Application developers can catch the InvalidSessionError exception and create a new session by invoking the create_session API again.

def create_session(self, policy_user, timeout=15)

Note: Do not pass the self parameter while invoking the API.

Parameters

policy_user: Username defined in the policy, as a string value.
timeout: Session timeout, specified in minutes. By default, the value of this parameter is set to 15. This parameter is optional.

Returns

session: Object of the Session class. A session object is required for calling the data protection operations, such as, protect, unprotect, and reprotect.

Exceptions

ProtectorError: This exception is thrown if a null or empty value is passed as the policy_user parameter.

Example

In the following example, superuser is passed as the policy_user parameter.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
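
The note about InvalidSessionError suggests a catch-and-recreate pattern. The following sketch shows that pattern with stand-in classes so that it runs without the AP Python installed. The `call_with_fresh_session` helper, `FakeSession`, and `FakeInvalidSessionError` are hypothetical. With the AP Python, the first argument would be something like `lambda: protector.create_session("superuser")`, and the exception class would be InvalidSessionError, whose import location is not shown in this section.

```python
from typing import Callable, Type, TypeVar

T = TypeVar("T")


def call_with_fresh_session(create_session: Callable[[], object],
                            operation: Callable[[object], T],
                            invalid_session_error: Type[Exception],
                            retries: int = 1) -> T:
    """Run operation(session); on an invalid session, create a new one and retry."""
    session = create_session()
    for attempt in range(retries + 1):
        try:
            return operation(session)
        except invalid_session_error:
            if attempt == retries:
                raise  # out of retries, surface the error to the caller
            session = create_session()
    raise RuntimeError("unreachable")


# Stand-ins used only to make the sketch runnable.
class FakeInvalidSessionError(Exception):
    pass


class FakeSession:
    def __init__(self, expired: bool):
        self.expired = expired

    def protect(self, data: str, de: str) -> str:
        if self.expired:
            raise FakeInvalidSessionError("session timed out")
        return f"protected({data})"


sessions = iter([FakeSession(expired=True), FakeSession(expired=False)])
result = call_with_fresh_session(
    lambda: next(sessions),
    lambda s: s.protect("Protegrity1", "string"),
    FakeInvalidSessionError,
)
print(result)
```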

get_version

The get_version API returns the version of the AP Python in use. Ensure that the version number of the AP Python matches the version of the AP Python build package.

Note: You do not need to create a session for invoking the get_version API.

def get_version(self)

Note: Do not pass the self parameter while invoking the API.

Parameters

None

Returns

String: Product version of the installed AP Python.

Exceptions

None

Example

In the following example, the current version of the installed AP Python is retrieved.

from appython import Protector
protector = Protector()
print(protector.get_version())

Result

1.0.0

protect

The protect API protects the data using a tokenization, data type preserving encryption, No Encryption, or encryption data element. It supports both single and bulk protection, with no maximum bulk size limit. However, it is recommended that you pass no more than 1 MB of input data in each protection call.

For String and Byte data types, the maximum length for tokenization is 4096 bytes, while no maximum length is defined for encryption.

def protect(self, data, de, **kwargs)

Note: Do not pass the self parameter while invoking the API.

Parameters

data: Data to be protected. You can provide the data of any type that is supported by the AP Python. For example, you can specify data of type string, or integer. However, you cannot provide the data of multiple data types at the same time in a bulk call.
de: String containing the data element name defined in policy.
**kwargs: Specify one or more of the following keyword arguments:
- external_iv: Specify the external initialization vector for tokenization. This argument is optional.
- encrypt_to: Specify this argument to encrypt the data, and set its value to bytes. This argument is mandatory for encryption and must not be used for tokenization.
- charset: This is an optional argument. It indicates the encoding and byte order of the input buffer. You can specify a value for this argument from the charset constants, such as UTF8, UTF16LE, or UTF16BE. The default value for the charset argument is UTF8. The charset argument is only applicable for input data of byte type. The charset argument is mandatory for the data elements created with the Unicode Gen2 tokenization method for byte APIs. The encoding set for the charset argument must match the encoding of the input data passed.

Note: Keyword arguments are case sensitive.

Returns

For single data: Returns the protected data
For bulk data: Returns a tuple of the following data:
- List or tuple of the protected data
- Tuple of error codes

Exceptions

InvalidSessionError: This exception is thrown if the session is invalid or has timed out.
ProtectError: This exception is thrown if the API is unable to protect the data.

If the protect API is used with bulk data, then it does not throw any exception. Instead, it only returns an error code.
For more information about the return codes, refer to Application Protector Return Codes.
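
The size guidance for the protect API can be applied before a bulk call by splitting the input into batches. The following is a minimal sketch: the `batches_under_limit` helper is hypothetical, 1 MB is taken as 1,000,000 bytes, and sizes are measured on the UTF-8 encoding of each string. These are assumptions, not documented behavior.

```python
from typing import Iterable, Iterator, List

MAX_TOKEN_BYTES = 4096      # maximum length for tokenization stated above
MAX_CALL_BYTES = 1_000_000  # recommended per-call limit; 1 MB taken as 10**6 (assumption)


def batches_under_limit(values: Iterable[str],
                        max_call_bytes: int = MAX_CALL_BYTES) -> Iterator[List[str]]:
    """Yield lists of strings whose combined UTF-8 size stays within max_call_bytes."""
    batch: List[str] = []
    batch_bytes = 0
    for value in values:
        size = len(value.encode("utf-8"))
        if size > MAX_TOKEN_BYTES:
            raise ValueError(f"value of {size} bytes exceeds {MAX_TOKEN_BYTES} bytes")
        # Start a new batch when adding this value would cross the limit.
        if batch and batch_bytes + size > max_call_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(value)
        batch_bytes += size
    if batch:
        yield batch


data = ["protegrity1234", "Protegrity1", "Protegrity56"]
for batch in batches_under_limit(data):
    print(len(batch), "values in this batch")
```

Each yielded batch can then be passed to a single bulk protect call.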

Example - Tokenizing String Data

The examples for using the protect API for tokenizing the string data are described in this section.

Example 1: Input string data
In the following example, the Protegrity1 string is used as the data, which is tokenized using the string Alpha Numeric data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("Protegrity1", "string")
print("Protected Data: %s" %output)

Result

Protected Data: 4l0z9SQrhtk

Example 2: Input string data using session as Context Manager
In the following example, the Protegrity1 string is used as the data, which is tokenized using the string Alpha Numeric data element.

from appython import Protector
protector = Protector()
with protector.create_session("superuser") as session:
    output = session.protect("Protegrity1", "string")
    print("Protected Data: %s" %output)

Result

Protected Data: 4l0z9SQrhtk

Example 3: Input date passed as a string
In the following example, the 1998/05/29 string is used as the data, which is tokenized using the datetime Date data element.
If a date string is provided as input, then the data element with the same tokenization type as the input date format must be used to protect the data. For example, if you have provided the input date string in YYYY/MM/DD format, then you must use only the Date (YYYY/MM/DD) data element to protect the data.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("1998/05/29", "datetime")
print("Protected data: "+str(output))

Result

Protected data: 0634/01/28
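
Because the input format must match the data element, it can help to check the date string before calling the protect API. The following sketch checks for the YYYY/MM/DD format used in this example. The `is_date_yyyy_mm_dd` helper is hypothetical and is not part of the AP Python.

```python
import re
from datetime import datetime


def is_date_yyyy_mm_dd(value: str) -> bool:
    """Return True only for a valid calendar date written as YYYY/MM/DD."""
    # The regular expression enforces zero-padded fields and the slash separator.
    if not re.fullmatch(r"\d{4}/\d{2}/\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y/%m/%d")  # rejects dates such as 1998/02/30
    except ValueError:
        return False
    return True


for candidate in ("1998/05/29", "1998-05-29", "1998/02/30"):
    print(candidate, is_date_yyyy_mm_dd(candidate))
```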

Example 4: Input date and time passed as a string
In the following example, the 1998/05/29 10:54:47 string is used as the data, which is tokenized using the datetime Datetime data element.
If a date and time string is provided as input, then the data element with the same tokenization type as the input format must be used for data protection. For example, if the input date and time string in YYYY/MM/DD HH:MM:SS MMM format is provided, then only the Datetime (YYYY-MM-DD HH:MM:SS MMM) data element must be used to protect the data.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("1998/05/29 10:54:47", "datetime")
print("Protected data: "+str(output))

Result

Protected data: 0634/01/28 10:54:47

Example 5: Unicode Input passed as a String

In the following example, the ‘protegrity1234ÀÁÂÃÄÅÆÇÈÉ’ unicode data is used as the input data, which is tokenized using the string data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect('protegrity1234ÀÁÂÃÄÅÆÇÈÉ', "string")
print("Protected Data: %s" %output)

Result

Protected Data: VSYaLoLxo8GMyqÀÁÂÃÄÅÆÇÈÉ

Example - Tokenizing String Data with External Initialization Vector (IV)

The example for using the protect API for tokenizing string data using an external initialization vector (IV) is described in this section.

If you want to pass the external IV as a keyword argument to the protect API, then you must pass the external IV as bytes.

Example
In this example, the Protegrity1 string is used as the data tokenized using the string data element, with the help of the external IV 1234 passed as bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("Protegrity1", "string",
                         external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: %s" %output)

Result

Protected Data: oEquECC2JYb

Example - Encrypting String Data

The example for using the protect API for encrypting the string data is described in this section.

If you want to encrypt the data, then you must use bytes in the encrypt_to keyword.

To avoid data corruption, do not convert the encrypted bytes data into the string format. It is recommended to convert the encrypted bytes data to a Hexadecimal, Base 64, or any other appropriate format.

Example
In the following example, the Protegrity1 string is used as the data, which is encrypted using the text data element (generic placeholder for an encryption-capable element). Therefore, the encrypt_to parameter is passed as a keyword argument and its value is set to bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("Protegrity1", "text",
                         encrypt_to=bytes)
print("Encrypted Data: %s" %output)

Result

Encrypted Data: b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V'
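
The recommendation to store encrypted bytes as Hexadecimal or Base 64 can be followed with the Python standard library. The following sketch uses the sample encrypted value from the result above and shows that both encodings convert back to the original bytes. The variable names are illustrative.

```python
import base64

# Sample encrypted value copied from the result above.
encrypted = b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V'

# Text-safe representations of the encrypted bytes.
as_hex = encrypted.hex()
as_b64 = base64.b64encode(encrypted).decode("ascii")
print("Hex:", as_hex)
print("Base 64:", as_b64)

# Both encodings are lossless, so the original bytes can be recovered.
from_hex = bytes.fromhex(as_hex)
from_b64 = base64.b64decode(as_b64)
print("Round trip OK:", from_hex == encrypted and from_b64 == encrypted)
```

The recovered bytes, not the text form, are what would be passed back for decryption.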

Example - Tokenizing Bulk String Data

The example for using the protect API for tokenizing bulk string data is described in this section. The bulk string data can be passed as a list or a tuple.

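The individual elements of the list or tuple must be of the same data type.

Example 1: Input bulk string data
In the following example, the protegrity1234, Protegrity1, and Protegrity56 strings are stored in a list and used as bulk data, which is tokenized using the string data element.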

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["protegrity1234", "Protegrity1", "Protegrity56"]
p_out = session.protect(data, "string")
print("Protected Data: ")
print(p_out)

Result

Protected Data: 
(['VSYaLoLxo8GMyq', '4l0z9SQrhtk', '9xP5wBuXJuce'], (6, 6, 6))

6 is the success return code for the protect operation of each element in the list.

Example 2: Input bulk string data
In Example 1, the protected output was a tuple of the tokenized data and the error list. This example shows how the code can be tweaked to ensure that the protected output and the error list are retrieved separately, and not as part of a tuple.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["protegrity1234", "Protegrity1", "Protegrity56"]
p_out, error_list = session.protect(data, "string")
print("Protected Data: ")
print(p_out)
print("Error List: ")
print(error_list)

Result

Protected Data: 
['VSYaLoLxo8GMyq', '4l0z9SQrhtk', '9xP5wBuXJuce']
Error List:
(6, 6, 6)

6 is the success return code for the protect operation of each element in the list.
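
Because bulk calls return error codes instead of throwing exceptions, each code should be checked after the call. The following sketch pairs every input with its return code and reports the ones that did not succeed. The `failed_items` helper is hypothetical, the first result is copied from the example above, and the code -1 in the second result is made up to show a failure.

```python
from typing import List, Sequence, Tuple

SUCCESS_PROTECT = 6  # success return code for the protect operation, as noted above


def failed_items(inputs: Sequence[str],
                 result: Tuple[Sequence[str], Sequence[int]]) -> List[Tuple[int, str, int]]:
    """Return (position, input_value, return_code) for every unsuccessful element."""
    _outputs, codes = result
    return [
        (index, value, code)
        for index, (value, code) in enumerate(zip(inputs, codes))
        if code != SUCCESS_PROTECT
    ]


data = ["protegrity1234", "Protegrity1", "Protegrity56"]
bulk_result = (["VSYaLoLxo8GMyq", "4l0z9SQrhtk", "9xP5wBuXJuce"], (6, 6, 6))
print(failed_items(data, bulk_result))

# Hypothetical result in which the second element failed with a made-up code.
partial_result = (["VSYaLoLxo8GMyq", None, "9xP5wBuXJuce"], (6, -1, 6))
print(failed_items(data, partial_result))
```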

Example 3: Input date passed as bulk strings
In the following example, the 2019/02/14 and 2018/03/11 strings are stored in a list and used as bulk data, which is tokenized using the datetime Date data element.

If a date string is provided as input, then the data element with the same tokenization type as the input date format must be used to protect the data. For example, if you have provided the input date string in YYYY/MM/DD format, then you must use only the Date (YYYY/MM/DD) data element to protect the data.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["2019/02/14", "2018/03/11"]
output = session.protect(data, "datetime")
print("Protected data: "+str(output))

Result

Protected data: (['1072/07/29', '0907/12/30'], (6, 6))

6 is the success return code for the protect operation of each element in the list.

Example 4: Input date and time passed as bulk strings
In the following example, the 2019/02/14 10:54:47 and 2019/11/03 11:01:32 strings are stored in a list and used as bulk data, which is tokenized using the datetime Datetime data element.

If a date and time string is provided as input, then the data element with the same tokenization type as the input format must be used for data protection. For example, if you have provided the input date and time string in YYYY/MM/DD HH:MM:SS MMM format, then you must use only the Datetime (YYYY-MM-DD HH:MM:SS MMM) data element to protect the data.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["2019/02/14 10:54:47", "2019/11/03 11:01:32"]
output = session.protect(data, "datetime")
print("Protected data: "+str(output))

Result

Protected data: (['1072/07/29 10:54:47', '2249/12/17 11:01:32'], (6, 6))

6 is the success return code for the protect operation of each element in the list.

Example - Encrypting Bulk String Data

The example for using the protect API for encrypting bulk string data is described in this section. The bulk string data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

Example
In the following example, protegrity1234, Protegrity1, and Protegrity56 strings are stored in a list and used as bulk data, which is encrypted using the text data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["protegrity1234", "Protegrity1", "Protegrity56"]
p_out = session.protect(data, "text", encrypt_to=bytes)
print("Encrypted Data: ")
print(p_out)

Result

Encrypted Data: 
([b"I\xc1\xf0S\x0f\xaf\t\x06\xb5;\xb5'%\xab\x9b\x18", b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V', b'\xfd\x99\xa7\xd1V(\x02K\xc9\xbdZ\x97\xd6\xea\xcc\x13'], (6, 6, 6))

6 is the success return code for the protect operation of each element in the list.

Example - Tokenizing Bulk String Data with External IV

The example for using the protect API for tokenizing bulk string data using external IV is described in this section. The bulk string data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

If you want to pass the external IV as a keyword argument to the protect API, then you must pass external IV as bytes.

Example
In this example, protegrity1234, Protegrity1, and Protegrity56 strings are stored in a list and used as bulk data. This bulk data is tokenized using the string data element, with the help of external IV 123 that is passed as bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["protegrity1234", "Protegrity1", "Protegrity56"]
p_out = session.protect(data, "string", 
 external_iv=bytes("123", encoding="utf-8"))
print("Protected Data: ")
print(p_out)

Result

Protected Data: 
(['qMrwdI3iiT9D14', 'JpytdIbc16c', 'fTY1RhNGRJAa'], (6, 6, 6))

6 is the success return code for the protect operation of each element in the list.
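
The external IV must reach the API as bytes, so an application that holds the IV as a string or a number needs to convert it first. The following sketch shows one possible way to normalize the value by using only the standard library. The to_iv_bytes function is a hypothetical helper and is not part of the appython module. Converting an integer through its decimal string form is an assumption made for this sketch.

```python
# Hypothetical helper for illustration; it is not part of the appython module.
def to_iv_bytes(value, encoding="utf-8"):
    """Return value as bytes so that it can be passed as external_iv."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode(encoding)
    # Assumption: an integer IV is represented by its decimal digits.
    if isinstance(value, int) and not isinstance(value, bool):
        return str(value).encode(encoding)
    raise TypeError("external IV must be bytes, str, or int")


# The same conversion that the examples perform with bytes("123", encoding="utf-8").
iv = to_iv_bytes("123")
print(iv)
```

The resulting value could then be passed as the external_iv keyword argument in the same way as in the examples in this section.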

Example - Tokenizing Integer Data

The example for using the protect API for tokenizing integer data is described in this section.

Example
In the following example, 21 is used as the integer data, which is tokenized using the int data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect(21, "int")
print("Protected Data: %s" %output)

Result

Protected Data: -94623223

Example - Tokenizing Integer Data with External IV

The example for using the protect API for tokenizing integer data using the external IV is described in this section.

If you want to pass the external IV as a keyword argument to the protect API, then you must pass the external IV as bytes to the API.

Example
In this example, 21 is used as the integer data, which is tokenized using the int data element, with the help of external IV 1234 passed as bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect(21, "int", external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: %s" %output)

Result

Protected Data: 1983567415

Example - Encrypting Integer Data

The example for using the protect API for encrypting integer data is described in this section.

If you want to encrypt the data, then you must use bytes in the encrypt_to keyword.

To avoid data corruption, do not convert the encrypted bytes data into string format. It is recommended to convert the encrypted bytes data to a Hexadecimal, Base 64, or any other appropriate format.

Example
In the following example, 21 is used as the integer data, which is encrypted using the text data element. Therefore, the encrypt_to parameter is passed as a keyword argument, and its value is set to bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect(21, "text", encrypt_to=bytes)
print("Encrypted Data: %s" %output)

Result

Encrypted Data: b'\xf73\xb9\x7f\x94\xdf;\xbd\x02=\x877\x91]\x1b#'
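
The recommendation to store encrypted bytes as Hexadecimal or Base 64 text can be sketched with the Python standard library alone. The following snippet reuses the encrypted value from the result above. It does not call the appython module, and the variable names are chosen for illustration only.

```python
import base64

# Encrypted value copied from the result above.
encrypted = b'\xf73\xb9\x7f\x94\xdf;\xbd\x02=\x877\x91]\x1b#'

# Text-safe representations that can be stored or transmitted as strings.
hex_text = encrypted.hex()
b64_text = base64.b64encode(encrypted).decode("ascii")

# Both representations convert back to the exact original bytes.
restored_from_hex = bytes.fromhex(hex_text)
restored_from_b64 = base64.b64decode(b64_text)

# Decoding the raw bytes as text is not a safe alternative.
try:
    encrypted.decode("utf-8")
    decodable_as_text = True
except UnicodeDecodeError:
    decodable_as_text = False

print("Hex:", hex_text)
print("Base 64:", b64_text)
print("Decodable as UTF-8 text:", decodable_as_text)
```

The restored bytes, not the text form, are what would be passed back to the unprotect API for decryption.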

Example - Tokenizing Bulk Integer Data

The example for using the protect API for tokenizing bulk integer data is described in this section. The bulk integer data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

Example
In the following example, 21, 42, and 55 integers are stored in a list and used as bulk data, which is tokenized using the int data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [21, 42, 55]
p_out = session.protect(data, "int")
print("Protected Data: ")
print(p_out)

Result

Protected Data: 
([-94623223, -572010955, 2021989009], (6, 6, 6))

6 is the success return code for the protect operation of each element in the list.
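
The requirement that all elements of the bulk data share one data type can be checked before the data is passed to the API. The following is a minimal sketch that uses only the standard library. The check_same_type function is a hypothetical helper and is not part of the appython module.

```python
# Hypothetical helper for illustration; it is not part of the appython module.
def check_same_type(data):
    """Return data unchanged if it is a list or tuple whose elements share one type."""
    if not isinstance(data, (list, tuple)):
        raise TypeError("bulk data must be a list or a tuple")
    types = {type(item) for item in data}
    if len(types) > 1:
        names = ", ".join(sorted(t.__name__ for t in types))
        raise TypeError("bulk data contains mixed element types: " + names)
    return data


data = check_same_type([21, 42, 55])
print(data)

try:
    check_same_type([21, "42", 55])
except TypeError as error:
    print("Rejected:", error)
```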

Example - Tokenizing Bulk Integer Data with External IV

The example for using the protect API for tokenizing bulk integer data using external IV is described in this section. The bulk integer data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

If you want to pass the external IV as a keyword argument to the protect API, then you must pass the external IV as bytes to the API.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [21, 42, 55]
p_out = session.protect(data, "int", external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: ")
print(p_out)

Result

Protected Data: 
([1983567415, -1471024670, 1465229692], (6, 6, 6))

6 is the success return code for the protect operation of each element in the list.

Example - Encrypting Bulk Integer Data

The example for using the protect API for encrypting bulk integer data is described in this section. The bulk integer data can be passed as a list or a tuple.

If you want to encrypt the data, then you must use bytes in the encrypt_to keyword.

To avoid data corruption, do not convert the encrypted bytes data into string format. It is recommended to convert the encrypted bytes data to a Hexadecimal, Base 64, or any other appropriate format.

Example
In the following example, 21, 42, and 55 integers are stored in a list and used as bulk data, which is encrypted using the text data element. Therefore, the encrypt_to parameter is passed as a keyword argument and its value is set to bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [21, 42, 55]
p_out = session.protect(data, "text", encrypt_to=bytes)
print("Encrypted Data: ")
print(p_out)

Result

Encrypted Data: 
([b'\xf73\xb9\x7f\x94\xdf;\xbd\x02=\x877\x91]\x1b#', b'\x13\x92\xcd+\xb5\xb5\x8a\x98-$3\xa4\x00bNx', b'\xe5\xa1C\xf4HI\xe8\xe1F\x90=\xd9\xb4*pG'], (6, 6, 6))

6 is the success return code for the protect operation of each element in the list.

Example - Tokenizing Bytes Data

The example for using the protect API for tokenizing bytes data is described in this section.

Example
In the following example, the “Protegrity1” string is first converted to bytes using the Python bytes() method. The bytes data is then tokenized using the string data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-8")
p_out = session.protect(data, "string")
print("Protected Data: %s" %p_out)

Result

Protected Data: b'4l0z9SQrhtk'

Example - Tokenizing Bytes Data with External IV

The example for using the protect API for tokenizing bytes data using external IV is described in this section.

Example
In the following example, the “Protegrity1” string is first converted to bytes using the Python bytes() method. The bytes data is then tokenized using the string data element, with the help of external IV 1234 that is passed as bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-8")
output = session.protect(data, "string",
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: %s" %output)

Result

Protected Data: b'oEquECC2JYb'

Example - Encrypting Bytes Data

The example for using the protect API for encrypting bytes data is described in this section.

To avoid data corruption, do not convert the encrypted bytes data into string format. It is recommended to convert the encrypted bytes data to a Hexadecimal, Base 64, or any other appropriate format.

Example
In the following example, the “Protegrity1” string is first converted to bytes using the Python bytes() method. The bytes data is then encrypted using the text data element. Therefore, the encrypt_to parameter is passed as a keyword argument and its value is set to bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-8")
p_out = session.protect(data, "text", encrypt_to = bytes)
print("Encrypted Data: %s" %p_out)

Result

Encrypted Data: b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V'


Example - Tokenizing Bytes Data with a Charset

The example for using the protect API for tokenizing bytes data that is not UTF-8 encoded is described in this section.

Example
In the following example, the “Protegrity1” string is first converted to bytes in the UTF-16LE encoding using the Python bytes() method. The bytes data is then tokenized using the string data element, with the charset argument set to Charset.UTF16LE so that it matches the encoding of the input data.

from appython import Protector
from appython import Charset
protector = Protector()
session = protector.create_session("superuser")
data = bytes("Protegrity1", encoding="utf-16le")
p_out = session.protect(data, "string", encrypt_to=bytes, charset=Charset.UTF16LE)
print("Protected Data: %s" %p_out)

Result

Protected Data: b'4\x00l\x000\x00z\x009\x00S\x00Q\x00r\x00h\x00t\x00k\x00'
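
The effect of the encoding on the byte layout can be inspected with the Python standard library alone. The following sketch does not call the appython module. It encodes the same input in UTF-8 and UTF-16LE, and then decodes the protected UTF-16LE output copied from the result above to show the token text that it carries.

```python
text = "Protegrity1"

# The same string has a different byte layout in each encoding.
utf8_bytes = text.encode("utf-8")
utf16le_bytes = text.encode("utf-16le")
print(len(utf8_bytes), utf8_bytes)
print(len(utf16le_bytes), utf16le_bytes)

# Protected output copied from the result above.
protected_utf16le = b'4\x00l\x000\x00z\x009\x00S\x00Q\x00r\x00h\x00t\x00k\x00'

# Decoding with the matching encoding recovers readable token text.
token_text = protected_utf16le.decode("utf-16le")
print(token_text)
```

Decoding the output with the same encoding that was used for the input yields the token characters without the interleaved zero bytes.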

Example - Tokenizing Bulk Bytes Data

The example for using the protect API for tokenizing bulk bytes data is described in this section. The bulk bytes data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [bytes("protegrity1234", encoding="UTF-8"), bytes("Protegrity1",
 encoding="UTF-8"), bytes("Protegrity56", encoding="UTF-8")]
p_out = session.protect(data, "string")
print("Protected Data: ")
print(p_out)

Result

Protected Data: 
([b'VSYaLoLxo8GMyq', b'4l0z9SQrhtk', b'9xP5wBuXJuce'], (6, 6, 6))

6 is the success return code for the protect operation of each element in the list.

Example - Tokenizing Bulk Bytes Data with External IV

The example for using the protect API for tokenizing bulk bytes data using external IV is described in this section. The bulk bytes data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [bytes("protegrity1234", encoding="UTF-8"), bytes("Protegrity1",
 encoding="UTF-8"), bytes("Protegrity56", encoding="UTF-8")]
p_out = session.protect(data, "string",
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: ")
print(p_out)

Result

Protected Data: 
([b'aCzyqwijkSDqiG', b'oEquECC2JYb', b't0Ly7KYx7Wyo'], (6, 6, 6))

6 is the success return code for the protect operation of each element in the list.

Example - Encrypting Bulk Bytes Data

The example for using the protect API for encrypting bulk bytes data is described in this section. The bulk bytes data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

To avoid data corruption, do not convert the encrypted bytes data into string format. It is recommended to convert the encrypted bytes data to a Hexadecimal, Base 64, or any other appropriate format.

Example

In the following example, protegrity1234, Protegrity1, and Protegrity56 strings are first converted to bytes using the Python bytes() method. The converted bytes are then stored in a list and used as bulk data, which is encrypted using the text data element. Therefore, the encrypt_to parameter is passed as a keyword argument and its value is set to bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [bytes("protegrity1234", encoding="UTF-8"), bytes("Protegrity1",
 encoding="UTF-8"), bytes("Protegrity56", encoding="UTF-8")]
p_out = session.protect(data, "text", encrypt_to = bytes)
print("Encrypted Data: ")
print(p_out)

Result

Encrypted Data: 
([b"I\xc1\xf0S\x0f\xaf\t\x06\xb5;\xb5'%\xab\x9b\x18", b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V', b'\xfd\x99\xa7\xd1V(\x02K\xc9\xbdZ\x97\xd6\xea\xcc\x13'], (6, 6, 6))

6 is the success return code for the protect operation of each element in the list.

Example - Tokenizing Date Objects

The examples for using the protect API for tokenizing the date objects are described in this section.

If a date object is provided as input, then the data element with the same tokenization type as the input date format must be used to protect the data. For example, if you have provided the input date object in YYYY/MM/DD format, then you must use only the Date (YYYY/MM/DD) data element to protect the data.

Example: Input date object in YYYY/MM/DD format
In the following example, the 1998/05/29 date string is used as the data, which is first converted to a date object using the strptime() and date() methods of the Python datetime class.
The date object is then tokenized using the datetime data element.

from appython import Protector
from datetime import datetime
protector = Protector()
session = protector.create_session("superuser")
data = datetime.strptime("1998/05/29", "%Y/%m/%d").date()
print("\nInput date as a Date object : "+str(data))
p_out = session.protect(data, "datetime")
print("Protected date: "+str(p_out))

Result

Input date as a Date object : 1998-05-29
Protected date: 0634-01-28
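
The result above prints the date objects in ISO format, with hyphens, although the input string used slashes. The following sketch shows one possible way to convert between the YYYY/MM/DD string form and date objects by using only the standard library. The parse_slash_date and format_slash_date functions are hypothetical helpers and are not part of the appython module. The year is formatted explicitly because the padding that strftime() applies to years below 1000 can vary by platform.

```python
from datetime import date, datetime


# Hypothetical helpers for illustration; they are not part of the appython module.
def parse_slash_date(text):
    """Convert a YYYY/MM/DD string to a date object."""
    return datetime.strptime(text, "%Y/%m/%d").date()


def format_slash_date(value):
    """Format a date object as YYYY/MM/DD with a zero-padded four-digit year."""
    return f"{value.year:04d}/{value.month:02d}/{value.day:02d}"


original = parse_slash_date("1998/05/29")

# Protected date object equal to the value shown in the result above.
protected = date(634, 1, 28)

print(str(protected))
print(format_slash_date(protected))
print(format_slash_date(original))
```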

Example - Tokenizing Bulk Date Objects

The example for using the protect API for tokenizing bulk date objects is described in this section. The bulk date objects can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

If a date object is provided as input, then the data element with the same tokenization type as the input date format must be used to protect the data. For example, if you have provided the input date object in YYYY/MM/DD format, then you must use only the Date (YYYY/MM/DD) data element to protect the data.

from appython import Protector
from datetime import datetime
protector = Protector()
session = protector.create_session("superuser")
data1 = datetime.strptime("2019/02/12", "%Y/%m/%d").date()
data2 = datetime.strptime("2018/01/11", "%Y/%m/%d").date()
data = [data1, data2]
print("Input data: ", str(data))
p_out = session.protect(data, "datetime")
print("Protected data: "+str(p_out))

Result

Input data:  [datetime.date(2019, 2, 12), datetime.date(2018, 1, 11)]
Protected data: ([datetime.date(1154, 10, 29), datetime.date(1543, 1, 5)], (6, 6))

6 is the success return code for the protect operation of each element in the list.

unprotect

This function returns the data in its original form.

def unprotect(self, data, de, **kwargs)

Note: Do not pass the self parameter while invoking the API.

Parameters

data: Data to be unprotected.
de: String containing the data element name defined in policy.
**kwargs: Specify one or more of the following keyword arguments:
- external_iv: Specify the external initialization vector for tokenization. This argument is optional.
- decrypt_to: Specify this argument for decrypting the data and set its value to the data type of the original data. For example, if you are unprotecting string data, then you must specify the output data type as str. This argument is mandatory for decryption and must not be used for tokenization. The possible values for the decrypt_to argument are:
  - str
  - int
  - bytes
- charset: This is an optional argument. It indicates the byte order of the input buffer. You can specify a value for this argument from the charset constants, such as UTF8, UTF16LE, or UTF16BE. The default value for the charset argument is UTF8. The charset argument is only applicable for input data of the bytes type. The charset parameter is mandatory for the data elements created with the Unicode Gen2 tokenization method for byte APIs. The encoding set for the charset parameter must match the encoding of the input data passed.

Keyword arguments are case sensitive.
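
The rules for the keyword arguments can be collected in one place before the API is called. The following is a minimal sketch under the assumptions stated in the parameter descriptions above. The build_unprotect_kwargs function is a hypothetical helper and is not part of the appython module. It only validates the values and does not call the API.

```python
# Hypothetical helper for illustration; it is not part of the appython module.
ALLOWED_DECRYPT_TYPES = (str, int, bytes)  # possible values listed for decrypt_to


def build_unprotect_kwargs(decrypt_to=None, external_iv=None, charset=None):
    """Build a keyword argument dictionary that follows the rules described above."""
    kwargs = {}
    if decrypt_to is not None:
        if decrypt_to not in ALLOWED_DECRYPT_TYPES:
            raise ValueError("decrypt_to must be str, int, or bytes")
        kwargs["decrypt_to"] = decrypt_to
    if external_iv is not None:
        if not isinstance(external_iv, bytes):
            raise TypeError("external_iv must be passed as bytes")
        kwargs["external_iv"] = external_iv
    if charset is not None:
        kwargs["charset"] = charset
    return kwargs


# Keyword arguments for decrypting data that was originally a string.
decrypt_kwargs = build_unprotect_kwargs(decrypt_to=str)

# Keyword arguments for detokenizing data that was tokenized with an external IV.
detokenize_kwargs = build_unprotect_kwargs(external_iv=b"1234")

print(decrypt_kwargs)
print(detokenize_kwargs)
```

The resulting dictionary could be expanded in a call, for example session.unprotect(output, "text", **decrypt_kwargs), where session and output are created as in the examples in this section.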

Returns

For single data: Returns the unprotected data
For bulk data: Returns a tuple of the following data:
- List or tuple of the unprotected data
- Tuple of error codes

Exceptions

InvalidSessionError: This exception is thrown if the session is invalid or has timed out.
ProtectError: This exception is thrown if the API is unable to unprotect the data.

If the unprotect API is used with bulk data, then it does not throw any exception. Instead, it only returns an error code.
For more information about the return codes, refer to Application Protector API Return Codes.
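
Because a bulk call reports failures only through return codes, an application that prefers exceptions has to inspect the codes itself. The following sketch shows one possible approach that uses only the standard library. The BulkOperationError class, the require_bulk_success function, and the UNPROTECT_SUCCESS constant are hypothetical names and are not part of the appython module. The value 8 is the success return code for the unprotect operation, as noted in the bulk examples in this section. The non-success code 0 is a placeholder.

```python
# Hypothetical names for illustration; they are not part of the appython module.
UNPROTECT_SUCCESS = 8  # success return code for unprotect, as noted in this section


class BulkOperationError(Exception):
    """Raised when at least one element of a bulk result has a non-success code."""

    def __init__(self, failures):
        super().__init__(f"{len(failures)} element(s) failed: {failures}")
        self.failures = failures


def require_bulk_success(result, success_code=UNPROTECT_SUCCESS):
    """Return the values of a bulk (values, codes) result or raise on any failure."""
    values, codes = result
    failures = [(index, code) for index, code in enumerate(codes) if code != success_code]
    if failures:
        raise BulkOperationError(failures)
    return values


# Result tuple in the shape that a bulk unprotect call returns.
sample_result = (["protegrity1234", "Protegrity1", "Protegrity56"], (8, 8, 8))
values = require_bulk_success(sample_result)
print(values)

# Placeholder failure: the code 0 is made up for this sketch.
try:
    require_bulk_success((["protegrity1234", None], (8, 0)))
except BulkOperationError as error:
    print("Failed elements:", error.failures)
```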

Example - Detokenizing String Data

The examples for using the unprotect API for retrieving the original string data from the token data are described in this section.

Example 1: Input string data
In the following example, the Protegrity1 string that was tokenized using the string data element, is now detokenized using the same data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("Protegrity1", "string")
print("Protected Data: %s" %output)
org = session.unprotect(output, "string")
print("Unprotected Data: %s" %org)

Result

Protected Data: 4l0z9SQrhtk
Unprotected Data: Protegrity1

Example 2: Input date passed as a string
In the following example, the 1998/05/29 string that was tokenized using the datetime Date data element, is now detokenized using the same data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("1998/05/29", "datetime")
print("Protected data: "+str(output))
org = session.unprotect(output, "datetime")
print("Unprotected data: "+str(org))

Result

Protected data: 0634/01/28
Unprotected data: 1998/05/29

Example 3: Input date and time passed as a string
In the following example, the 1998/05/29 10:54:47 string that was tokenized using the datetime Datetime data element is now detokenized using the same data element.

If a date and time string is provided as input, then the data element with the same tokenization type as the input format must be used for data protection. For example, if the input date and time string in YYYY/MM/DD HH:MM:SS MMM format is provided, then only the Datetime (YYYY-MM-DD HH:MM:SS MMM) data element must be used to protect the data.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("1998/05/29 10:54:47", "datetime")
print("Protected data: "+str(output))
org = session.unprotect(output, "datetime")
print("Unprotected data: "+str(org))

Result

Protected data: 0634/01/28 10:54:47
Unprotected data: 1998/05/29 10:54:47

Example 4: Detokenizing Unicode Data passed as String

In the following example, the ‘protegrity1234ÀÁÂÃÄÅÆÇÈÉ’ Unicode data that was tokenized using the string data element is now detokenized using the same data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect('protegrity1234ÀÁÂÃÄÅÆÇÈÉ', "string")
print("Protected Data: %s" %output)
org = session.unprotect(output, "string")
print("Unprotected Data: %s" %org)

Result

Protected Data: VSYaLoLxo8GMyqÀÁÂÃÄÅÆÇÈÉ
Unprotected Data: protegrity1234ÀÁÂÃÄÅÆÇÈÉ

Example - Detokenizing String Data with External IV

The example for using the unprotect API for retrieving the original string data from token data, using external IV is described in this section.

If you want to pass the external IV as a keyword argument to the unprotect API, then you must pass the external IV as bytes to the API.

Example
In the following example, the Protegrity1 string that was tokenized using the string data element and the external IV 1234 is now detokenized using the same data element and external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("Protegrity1", "string", 
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: %s" %output)
org = session.unprotect(output, "string", 
 external_iv=bytes("1234", encoding="utf-8"))
print("Unprotected Data: %s" %org)

Result

Protected Data: oEquECC2JYb
Unprotected Data: Protegrity1

Example - Decrypting String Data

The example for using the unprotect API for decrypting string data is described in this section.

If you want to decrypt the data, then you must set the decrypt_to keyword to the data type of the original data.

Example
In the following example, the Protegrity1 string that was encrypted using the text data element is now decrypted using the same data element. Therefore, the decrypt_to parameter is passed as a keyword argument and its value is set to str.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("Protegrity1", "text", 
 encrypt_to=bytes)
print("Encrypted Data: %s" %output)
org = session.unprotect(output, "text", decrypt_to=str)
print("Decrypted Data: %s" %org)

Result

Encrypted Data: b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V'
Decrypted Data: Protegrity1

Example - Detokenizing Bulk String Data

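The examples for using the unprotect API for retrieving the original bulk string data from the token data are described in this section.

Example 1: Input bulk string data
In the following example, protegrity1234, Protegrity1, and Protegrity56 strings are stored in a list and used as bulk data, which is tokenized using the string data element. The bulk string data is then detokenized using the same data element.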

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["protegrity1234", "Protegrity1", "Protegrity56"]
p_out = session.protect(data, "string")
print("Protected Data: ")
print(p_out)
out = session.unprotect(p_out[0], "string")
print("Unprotected Data: ")
print(out)

Result

Protected Data: 
(['VSYaLoLxo8GMyq', '4l0z9SQrhtk', '9xP5wBuXJuce'], (6, 6, 6))
Unprotected Data: 
(['protegrity1234', 'Protegrity1', 'Protegrity56'], (8, 8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

Example 2: Input bulk string data
In Example 1, the unprotected output was a tuple of the detokenized data and the error list. This example shows how the code can be tweaked to ensure that the unprotected output and the error list are retrieved separately, and not as part of a tuple.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = "protegrity1234"
data = [data]*5
p_out, error_list = session.protect(data, "string")
print("Protected Data: ")
print(p_out)
print("Error List: ")
print(error_list)
org, error_list = session.unprotect(p_out, "string")
print("Unprotected Data: ")
print(org)
print("Error List: ")
print(error_list)

Result

Protected Data: 
['VSYaLoLxo8GMyq', 'VSYaLoLxo8GMyq', 'VSYaLoLxo8GMyq', 'VSYaLoLxo8GMyq', 'VSYaLoLxo8GMyq']
Error List:
(6, 6, 6, 6, 6)
Unprotected Data: 
['protegrity1234', 'protegrity1234', 'protegrity1234', 'protegrity1234', 'protegrity1234']
Error List:
(8, 8, 8, 8, 8)

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

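Example 3: Input dates passed as bulk strings
In the following example, the 2019/02/14 and 2018/03/11 date strings are stored in a list and used as bulk data, which is tokenized using the datetime data element. The bulk data is then detokenized using the same data element.

from appython import Protector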
protector = Protector()
session = protector.create_session("superuser")
data = ["2019/02/14", "2018/03/11"]
output = session.protect(data, "datetime")
print("Protected data: "+str(output))
org = session.unprotect(output[0], "datetime")
print("Unprotected data: "+str(org))

Result

Protected data: (['1072/07/29', '0907/12/30'], (6, 6))
Unprotected data: (['2019/02/14', '2018/03/11'], (8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

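Example 4: Input date and time passed as bulk strings
In the following example, the 2019/02/14 10:54:47 and 2019/11/03 11:01:32 date and time strings are stored in a list and used as bulk data, which is tokenized using the datetime data element. The bulk data is then detokenized using the same data element.

from appython import Protector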
protector = Protector()
session = protector.create_session("superuser")
data = ["2019/02/14 10:54:47", "2019/11/03 11:01:32"]
output = session.protect(data, "datetime")
print("Protected data: "+str(output))
org = session.unprotect(output[0], "datetime")
print("Unprotected data: "+str(org))

Result

Protected data: (['1072/07/29 10:54:47', '2249/12/17 11:01:32'], (6, 6))
Unprotected data: (['2019/02/14 10:54:47', '2019/11/03 11:01:32'], (8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

Example - Detokenizing Bulk String Data with External IV

The example for using the unprotect API for retrieving the original bulk string data from token data using the external IV is described in this section.

If you want to pass the external IV as a keyword argument to the unprotect API, then you must pass the external IV as bytes to the API.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["protegrity1234", "Protegrity1", "Protegrity56"]
p_out = session.protect(data, "string",
 external_iv=bytes("123", encoding="UTF-8"))
print("Protected Data: ")
print(p_out)
out = session.unprotect(p_out[0], "string",
 external_iv=bytes("123", encoding="UTF-8"))
print("Unprotected Data: ")
print(out)

Result

Protected Data: 
(['qMrwdI3iiT9D14', 'JpytdIbc16c', 'fTY1RhNGRJAa'], (6, 6, 6))
Unprotected Data: 
(['protegrity1234', 'Protegrity1', 'Protegrity56'], (8, 8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

Example - Decrypting Bulk String Data

The example for using the unprotect API for decrypting bulk string data is described in this section.

If you want to decrypt the data, then you must set the decrypt_to keyword to the data type of the original data.

Example
In the following example, protegrity1234, Protegrity1, and Protegrity56 strings are stored in a list and used as bulk data, which is encrypted using the text data element. The bulk string data is then decrypted using the same data element. Therefore, the decrypt_to parameter is passed as a keyword argument and its value is set to str.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["protegrity1234", "Protegrity1", "Protegrity56"]
p_out = session.protect(data, "text", encrypt_to=bytes)
print("Encrypted Data: ")
print(p_out)
out = session.unprotect(p_out[0], "text", decrypt_to=str)
print("Decrypted Data: ")
print(out)

Result

Encrypted Data: 
([b"I\xc1\xf0S\x0f\xaf\t\x06\xb5;\xb5'%\xab\x9b\x18", b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V', b'\xfd\x99\xa7\xd1V(\x02K\xc9\xbdZ\x97\xd6\xea\xcc\x13'], (6, 6, 6))
Decrypted Data: 
(['protegrity1234', 'Protegrity1', 'Protegrity56'], (8, 8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

Example - Detokenizing Integer Data

The example for using the unprotect API for retrieving the original integer data from token data is described in this section.

Example
In the following example, the integer data 21 that was tokenized using the int data element, is now detokenized using the same data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect(21, "int")
print("Protected Data: %s" %output)
org = session.unprotect(output, "int")
print("Unprotected Data: %s" %org)

Result

Protected Data: -94623223
Unprotected Data: 21

Example - Detokenizing Integer Data with External IV

The example for using the unprotect API for retrieving the original integer data from token data, using external IV is described in this section.

If you want to pass the external IV as a keyword argument to the unprotect API, then you must pass the external IV as bytes to the API.

Example
In the following example, the integer data 21 that was tokenized using the int data element and the external IV 1234 is now detokenized using the same data element and external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect(21, "int", 
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: %s" %output)
org = session.unprotect(output, "int", 
 external_iv=bytes("1234", encoding="utf-8"))
print("Unprotected Data: %s" %org)

Result

Protected Data: 1983567415
Unprotected Data: 21

Example - Decrypting Integer Data

The example for using the unprotect API for decrypting integer data is described in this section.

If you want to decrypt the data, then you must set the decrypt_to keyword to the data type of the original data.

Example
In the following example, the integer data 21 that was encrypted using the text data element is now decrypted using the same data element. Therefore, the decrypt_to parameter is passed as a keyword argument and its value is set to int.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect(21, "text", encrypt_to=bytes)
print("Encrypted Data: %s" %output)
org = session.unprotect(output, "text", decrypt_to=int)
print("Decrypted Data: %s" %org)

Result

Encrypted Data: b'\xf73\xb9\x7f\x94\xdf;\xbd\x02=\x877\x91]\x1b#'
Decrypted Data: 21

Example - Detokenizing Bulk Integer Data

The example for using the unprotect API for retrieving the original bulk integer data from token data is described in this section.

The AP Python APIs support integer values only between -2147483648 and 2147483648, both inclusive.

Example
In the following example, 21, 42, and 55 integers are stored in a list and used as bulk data, which is tokenized using the int data element. The bulk integer data is then detokenized using the same data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [21, 42, 55]
p_out = session.protect(data, "int")
print("Protected Data: ")
print(p_out)
out = session.unprotect(p_out[0], "int")
print("Unprotected Data: ")
print(out)

Result

Protected Data: 
([-94623223, -572010955, 2021989009], (6, 6, 6))
Unprotected Data: 
([21, 42, 55], (8, 8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

Example - Detokenizing Bulk Integer Data with External IV

The example for using the unprotect API for retrieving the original bulk integer data from token data using external IV is described in this section.

If you want to pass the external IV as a keyword argument to the unprotect API, then you must pass the external IV as bytes to the API.
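The requirement to pass the external IV as bytes can be illustrated with plain Python. The sketch below assumes the IV starts out as a str or bytes value; to_iv_bytes is a hypothetical helper and is not part of appython.

```python
def to_iv_bytes(iv):
    """Convert an external IV given as str or bytes into bytes (UTF-8).

    to_iv_bytes is a hypothetical helper; the API itself only needs bytes.
    """
    if isinstance(iv, (bytes, bytearray)):
        return bytes(iv)
    if isinstance(iv, str):
        return iv.encode("utf-8")
    raise TypeError("external IV must be str or bytes, got %s" % type(iv).__name__)


# Both spellings produce the same bytes object.
assert to_iv_bytes("1234") == bytes("1234", encoding="utf-8") == b"1234"
```

The resulting value can then be passed as the external_iv keyword argument, as the example that follows does with bytes("1234", encoding="utf-8").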

Example
In this example, 21, 42, and 55 integers are stored in a list and used as bulk data. This bulk data is tokenized using the int data element, with the help of external IV 1234 that is passed as bytes. The bulk integer data is then detokenized using the same data element and external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [21, 42, 55]
p_out = session.protect(data, "int", external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: ")
print(p_out)
out = session.unprotect(p_out[0], "int", external_iv=bytes("1234", encoding="utf-8"))
print("Unprotected Data: ")
print(out)

Result

Protected Data: 
([1983567415, -1471024670, 1465229692], (6, 6, 6))
Unprotected Data: 
([21, 42, 55], (8, 8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

Example - Decrypting Bulk Integer Data

The example for using the unprotect API for decrypting bulk integer data is described in this section.

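When you encrypt the data, you must use bytes in the encrypt_to keyword. When you decrypt it, set the decrypt_to keyword to the data type of the original data, which is int for integer data.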

Example
In the following example, 21, 42, and 55 integers are stored in a list and used as bulk data, which is encrypted using the text data element. The bulk integer data is then decrypted using the same data element. Therefore, the decrypt_to parameter is passed as a keyword argument and its value is set to int.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [21, 42, 55]
p_out = session.protect(data, "text", encrypt_to=bytes)
print("Encrypted Data: ")
print(p_out)
out = session.unprotect(p_out[0], "text", decrypt_to=int)
print("Decrypted Data: ")
print(out)

Result

Encrypted Data: 
([b'\xf73\xb9\x7f\x94\xdf;\xbd\x02=\x877\x91]\x1b#', b'\x13\x92\xcd+\xb5\xb5\x8a\x98-$3\xa4\x00bNx', b'\xe5\xa1C\xf4HI\xe8\xe1F\x90=\xd9\xb4*pG'], (6, 6, 6))
Decrypted Data: 
([21, 42, 55], (8, 8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

Example - Detokenizing Bytes Data

The example for using the unprotect API for retrieving the original bytes data from the token data is described in this section.

Example
In the following example, the bytes data ‘Protegrity1’ that was tokenized using the string data element, is now detokenized using the same data element.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-8")
p_out = session.protect(data, "string")
print("Protected Data: %s" %p_out)
org = session.unprotect(p_out, "string")
print("Unprotected Data: %s" %org)

Result

Protected Data: b'4l0z9SQrhtk'
Unprotected Data: b'Protegrity1'

In the following example, the bytes data ‘Protegrity1’ is encoded as UTF-16LE and tokenized using the string data element, with the charset argument set to Charset.UTF16LE. It is then detokenized using the same data element and charset.

from appython import Protector
from appython import Charset
protector = Protector()
session = protector.create_session("superuser")
data = bytes("Protegrity1", encoding="utf-16le")
p_out = session.protect(data, "string", encrypt_to=bytes, charset=Charset.UTF16LE)
print("Protected Data: %s" %p_out)
org = session.unprotect(p_out, "string", decrypt_to=bytes, charset=Charset.UTF16LE)
print("Unprotected Data: %s" %org)

Result

Protected Data: b'4\x00l\x000\x00z\x009\x00S\x00Q\x00r\x00h\x00t\x00k\x00'
Unprotected Data: b'P\x00r\x00o\x00t\x00e\x00g\x00r\x00i\x00t\x00y\x001\x00'
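The interleaved \x00 bytes in the UTF-16LE result come from the encoding itself, not from the tokenization. The following sketch uses only standard Python string methods to show the difference between the two encodings and to decode the protected value shown above.

```python
text = "Protegrity1"

utf8_bytes = text.encode("utf-8")        # one byte per ASCII character
utf16le_bytes = text.encode("utf-16le")  # two bytes per character, low byte first

print(len(utf8_bytes), len(utf16le_bytes))  # 11 22
print(utf16le_bytes)

# The protected value from the Result above decodes to readable token text.
token_bytes = b'4\x00l\x000\x00z\x009\x00S\x00Q\x00r\x00h\x00t\x00k\x00'
print(token_bytes.decode("utf-16le"))  # 4l0z9SQrhtk
```

The decoded token matches the token produced in the UTF-8 example, which is consistent with the charset argument describing only how the input buffer is encoded.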

Example - Detokenizing Bytes Data with External IV

The example for using the unprotect API for retrieving the original bytes data from the token data using external IV is described in this section.

Example
In this example, the bytes data ‘Protegrity1’ was tokenized using the string data element and the external IV 1234. It is now detokenized using the same data element and external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-8")
p_out = session.protect(data, "string",
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: %s" %p_out)
org = session.unprotect(p_out, "string",
 external_iv=bytes("1234", encoding="utf-8"))
print("Unprotected Data: %s" %org)

Result

Protected Data: b'oEquECC2JYb'
Unprotected Data: b'Protegrity1'

Example - Decrypting Bytes Data

The example for using the unprotect API for decrypting bytes data is described in this section.

Example
In the following example, the bytes data b’Protegrity1’ that was encrypted using the text data element, is now decrypted using the same data element. Therefore, the decrypt_to parameter is passed as a keyword argument and its value is set to bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-8")
p_out = session.protect(data, "text", encrypt_to=bytes)
print("Encrypted Data: %s" %p_out)
org = session.unprotect(p_out, "text", decrypt_to=bytes)
print("Decrypted Data: %s" %org)

Result

Encrypted Data: b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V'
Decrypted Data: b'Protegrity1'

Example - Detokenizing Bulk Bytes Data

The example for using the unprotect API for retrieving the original bulk bytes data from the token data is described in this section.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [bytes("protegrity1234","utf-8"), bytes("Protegrity1","utf-8"), bytes("Protegrity56","utf-8")]
p_out = session.protect(data, "string")
print("Protected Data: ")
print(p_out)
org = session.unprotect(p_out[0], "string")
print("Unprotected Data: ")
print(org)

Result

Protected Data: 
([b'VSYaLoLxo8GMyq', b'4l0z9SQrhtk', b'9xP5wBuXJuce'], (6, 6, 6))
Unprotected Data: 
([b'protegrity1234', b'Protegrity1', b'Protegrity56'], (8, 8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.
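When the source data is a list of strings, each element must be converted to bytes before it is passed to the bytes APIs, and the returned values can be decoded afterwards. The sketch below uses only standard Python; encode_all and decode_all are hypothetical helper names.

```python
def encode_all(strings, encoding="utf-8"):
    """Encode each string so the list can be passed to the bytes APIs."""
    return [s.encode(encoding) for s in strings]


def decode_all(byte_values, encoding="utf-8"):
    """Decode each bytes value returned by a bulk call back into str."""
    return [b.decode(encoding) for b in byte_values]


data = encode_all(["protegrity1234", "Protegrity1", "Protegrity56"])
print(data)

# First element of the unprotect Result tuple shown above.
unprotected_values = [b'protegrity1234', b'Protegrity1', b'Protegrity56']
print(decode_all(unprotected_values))
```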

Example - Detokenizing Bulk Bytes Data with External IV

The example for using the unprotect API for retrieving the original bulk bytes data from the token data using external IV is described in this section.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [bytes("protegrity1234","utf-8"), bytes("Protegrity1","utf-8"), bytes("Protegrity56","utf-8")]
p_out = session.protect(data, "string",
 external_iv=bytes("1234","utf-8"))
print("Protected Data: ")
print(p_out) 
org = session.unprotect(p_out[0], "string",
 external_iv=bytes("1234","utf-8"))
print("Unprotected Data: ")
print(org)

Result

Protected Data: 
([b'aCzyqwijkSDqiG', b'oEquECC2JYb', b't0Ly7KYx7Wyo'], (6, 6, 6))
Unprotected Data: 
([b'protegrity1234', b'Protegrity1', b'Protegrity56'], (8, 8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

Example - Decrypting Bulk Bytes Data

The example for using the unprotect API for decrypting bulk bytes data is described in this section.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [bytes("protegrity1234", encoding="UTF-8"), bytes("Protegrity1", encoding="UTF-8"), bytes("Protegrity56", encoding="UTF-8")]
p_out = session.protect(data, "text", encrypt_to=bytes)
print("Encrypted Data: ")
print(p_out)
org = session.unprotect(p_out[0], "text", decrypt_to=bytes)
print("Decrypted Data: ")
print(org)

Result

Encrypted Data: 
([b"I\xc1\xf0S\x0f\xaf\t\x06\xb5;\xb5'%\xab\x9b\x18", b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V', b'\xfd\x99\xa7\xd1V(\x02K\xc9\xbdZ\x97\xd6\xea\xcc\x13'], (6, 6, 6))
Decrypted Data: 
([b'protegrity1234', b'Protegrity1', b'Protegrity56'], (8, 8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

Example - Detokenizing Date Objects

The example for using the unprotect API for retrieving the original date objects from token data is described in this section.

Example 1: Input date object created from a YYYY/MM/DD string

In this example, the 2019/12/02 date string is used as the data, which is first converted to a date object using the Python date method of the datetime module.
The date object is then tokenized using the datetime data element, and then detokenized using the same data element.

from appython import Protector
from datetime import datetime
protector = Protector()
session = protector.create_session("superuser")
data = datetime.strptime("2019/12/02", "%Y/%m/%d").date()
print("\nInput date as a Date object : "+str(data))
p_out = session.protect(data, "datetime")
print("Protected date: "+str(p_out))
unprotected_output = session.unprotect(p_out, "datetime")
print("Unprotected date: "+str(unprotected_output))

Result

Input date as a Date object : 2019-12-02
Protected date: 2936-03-31
Unprotected date: 2019-12-02
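The conversion from a date string to a date object in the example above uses only the Python standard library. The following sketch isolates that step and shows that the printed form of a date object is always YYYY-MM-DD, whatever format the input string used.

```python
from datetime import date, datetime

# strptime parses the string with an explicit format; .date() drops the time part.
parsed = datetime.strptime("2019/12/02", "%Y/%m/%d").date()

print(type(parsed).__name__)        # date
print(str(parsed))                  # 2019-12-02
print(parsed == date(2019, 12, 2))  # True
```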

Example 2: Input date object in YYYY-MM-DD format

In this example, the 2019/02/12 date string is used as the data, which is first converted to a date object using the Python date method of the datetime module.
The date object is then tokenized using the datetime data element, and then detokenized using the same data element.

from appython import Protector
from datetime import datetime
protector = Protector()
session = protector.create_session("superuser")
data = datetime.strptime("2019/02/12", "%Y/%m/%d").date()
print("\nInput date as a Date object : "+str(data))
p_out = session.protect(data, "datetime")
print("Protected date: "+str(p_out))
unprotected_output = session.unprotect(p_out, "datetime")
print("Unprotected date: "+str(unprotected_output))

Result

Input date as a Date object : 2019-02-12
Protected date: 1154-10-29
Unprotected date: 2019-02-12

Example - Detokenizing Bulk Date Objects

The example for using the unprotect API for retrieving the original bulk date objects from the token data is described in this section.

Example: Input as a Date Object
In this example, the 2019/02/12 and 2018/01/11 date strings are used as the data, which are first converted to date objects using the Python date method of the datetime module. The two date objects are then used to create a list, which is used as the input data.
The input list is then tokenized using the datetime data element, and then detokenized using the same data element.

from appython import Protector
from datetime import datetime
protector = Protector()
session = protector.create_session("superuser")
data1 = datetime.strptime("2019/02/12", "%Y/%m/%d").date()
data2 = datetime.strptime("2018/01/11", "%Y/%m/%d").date()
data = [data1, data2]
print("Input data: "+str(data))
p_out = session.protect(data, "datetime")
print("Protected data: "+str(p_out))
unprotected_output = session.unprotect(p_out[0], "datetime")
print("Unprotected date: "+str(unprotected_output))

Result

Input data: [datetime.date(2019, 2, 12), datetime.date(2018, 1, 11)]
Protected data: ([datetime.date(1154, 10, 29), datetime.date(1543, 1, 5)], (6, 6))
Unprotected date: ([datetime.date(2019, 2, 12), datetime.date(2018, 1, 11)], (8, 8))

6 is the success return code for the protect operation of each element in the list.
8 is the success return code for the unprotect operation of each element in the list.

reprotect

The reprotect API reprotects data using a tokenization, data type preserving encryption, No Encryption, or encryption data element. The protected data is first unprotected and then protected again with a new data element. The API supports bulk protection without a maximum data limit. However, it is recommended that you pass no more than 1 MB of input data in each protection call.

For String and Byte data types, the maximum length for tokenization is 4096 bytes, while no maximum length is defined for encryption.
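One way to respect both recommendations is to split a large list into batches before making bulk calls. The following is a minimal sketch under stated assumptions: 1 MB is read as 1,048,576 bytes, the size of a string is measured as its UTF-8 length, and batch_for_bulk_call is a hypothetical helper that is not part of appython.

```python
MAX_CALL_BYTES = 1024 * 1024   # assumption: "1 MB" read as 1,048,576 bytes
MAX_TOKEN_INPUT_BYTES = 4096   # tokenization limit for String and Byte data


def batch_for_bulk_call(items, max_call_bytes=MAX_CALL_BYTES):
    """Yield lists of str items whose combined UTF-8 size stays within the limit."""
    batch, batch_size = [], 0
    for item in items:
        size = len(item.encode("utf-8"))
        if size > MAX_TOKEN_INPUT_BYTES:
            raise ValueError("item of %d bytes exceeds the 4096-byte limit" % size)
        if batch and batch_size + size > max_call_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(item)
        batch_size += size
    if batch:
        yield batch


# Tiny limit used here only to make the batching visible.
print(list(batch_for_bulk_call(["aaaa", "bbbb", "cc"], max_call_bytes=8)))
```

Each yielded batch can then be passed to a separate bulk call. The 4096-byte check applies to tokenization only, so it would need to be relaxed for encryption data elements.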

If you are retokenizing the data using the reprotect API, then the old data element and the new data element must have the same tokenization type. For example, if you have used the Alpha-Numeric data element to protect the data, then you must use only the Alpha-Numeric data element to reprotect the data.

def reprotect(self, data, old_de, new_de, **kwargs)

Note: Do not pass the self parameter while invoking the API.

Parameters

data: Protected data to be reprotected. The data is first unprotected with the old data element and then protected with the new data element.
old_de: String containing the data element name defined in the policy for the input data. This data element is used to unprotect the protected data as part of the reprotect operation.
new_de: String containing the data element name defined in the policy to create the output data. This data element is used to protect the data as part of the reprotect operation.
**kwargs: Specify one or more of the following keyword arguments:
- old_external_iv: Specify the old external IV in bytes for Tokenization. This old external IV is used to unprotect the protected data as part of the reprotect operation. This argument is optional.
- new_external_iv: Specify the new external IV in bytes for Tokenization. This new external IV is used to protect the data as part of the reprotect operation. This argument is optional.
- encrypt_to: Specify this argument for re-encrypting the bytes data and set its value to bytes. This argument is Mandatory. This argument must not be used for Tokenization.
- charset: This is an optional argument. It indicates the byte order of the input buffer. You can specify a value for this argument from the charset constants, such as, UTF8, UTF16LE, or UTF16BE. The default value for the charset argument is UTF8. The charset argument is only applicable for the input data of byte type. The charset parameter is mandatory for the data elements created with Unicode Gen2 tokenization method for byte APIs. The encoding set for the charset parameter must match the encoding of the input data passed.

Keyword arguments are case sensitive.

Returns

For single data: Returns the reprotected data
For bulk data: Returns a tuple of the following data:
- List or tuple of the reprotected data
- Tuple of error codes

Exceptions

InvalidSessionError: This exception is thrown if the session is invalid or has timed out.
ProtectError: This exception is thrown if the API is unable to protect the data.

If the reprotect API is used with bulk data, then it does not throw any exception. Instead, it only returns an error code.
For more information about the return codes, refer to Application Protector API Return Codes.
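Because bulk calls report failures through return codes rather than exceptions, callers that prefer exception handling can convert one into the other. The sketch below is an illustration only: BulkOperationError and require_success are hypothetical names, and 50 is the reprotect success code shown in the bulk results later in this section.

```python
REPROTECT_OK = 50  # reprotect success code shown in this section's bulk results


class BulkOperationError(Exception):
    """Hypothetical exception type; not part of appython."""


def require_success(bulk_result, success_code=REPROTECT_OK):
    """Return the values of a bulk result, raising if any return code differs."""
    values, codes = bulk_result
    failures = {i: code for i, code in enumerate(codes) if code != success_code}
    if failures:
        raise BulkOperationError("elements failed (index: code): %r" % failures)
    return values


# Sample tuple in the documented (values, return_codes) shape.
sample = (["sOcSzhEwXTrclw", "hFReRmrqzzB", "imoJL6U4mWPk"], (50, 50, 50))
print(require_success(sample))
```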

Example - Retokenizing String Data

The examples for using the reprotect API for retokenizing string data are described in this section.

If you are retokenizing the data using the reprotect API, then the old data element and the new data element must have the same tokenization type. For example, if you have used the Alpha-Numeric data element to protect the data, then you must use only the Alpha-Numeric data element to reprotect the data.

Example 1: Input string data
In the following example, the Protegrity1 string is used as the input data, which is first tokenized using the string data element.
The tokenized input data, the old data element string, and the new data element address are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("Protegrity1", "string")
print("Protected Data: %s" %output)
r_out = session.reprotect(output, "string", "address")
print("Reprotected Data: %s" %r_out)

Result

Protected Data: 4l0z9SQrhtk
Reprotected Data: hFReRmrqzzB

Example 2: Input date passed as a string
In the following example, the 2019/02/14 string is used as the input data, which is first tokenized using the datetime data element.
If a date string is provided as input, then the data element with the same tokenization type as the input date format must be used to protect the data. For example, if you have provided the input date string in YYYY/MM/DD format, then you must use only the Date (YYYY/MM/DD) data element to protect the data.
The tokenized input data, the old data element datetime, and the new data element datetime_yc are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("2019/02/14", "datetime")
print("Protected data: "+str(output))
r_out = session.reprotect(output, "datetime", "datetime_yc")
print("Reprotected data: "+str(r_out))

Result

Protected data: 1072/07/29
Reprotected data: 2019/07/13

Example 3: Input date and time passed as a string
In the following example, the 2019/02/14 10:54:47 string is used as the input data, which is first tokenized using the datetime data element.
If a date and time string is provided as input, then the data element with the same tokenization type as the input format must be used for data protection. For example, if the input date and time string in YYYY/MM/DD HH:MM:SS MMM format is provided, then only the Datetime (YYYY-MM-DD HH:MM:SS MMM) data element must be used to protect the data.
The tokenized input data, the old data element datetime, and the new data element datetime_yc are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect("2019/02/14 10:54:47", "datetime")
print("Protected data: "+str(output))
r_out = session.reprotect(output, "datetime", "datetime_yc")
print("Reprotected data: "+str(r_out))

Result

Protected data: 1072/07/29 10:54:47
Reprotected data: 2019/07/13 10:54:47

Example 4: Retokenizing Unicode Data as String

In the following example, the ‘protegrity1234ÀÁÂÃÄÅÆÇÈÉ’ unicode data is used as the input data, which is first tokenized using the string data element.
The tokenized input data, the old data element string, and the new data element address are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect('protegrity1234ÀÁÂÃÄÅÆÇÈÉ', "string")
print("Protected Data: %s" %output)
r_out = session.reprotect(output, "string", "address")
print("Reprotected Data: %s" %r_out)

Result

Protected Data: VSYaLoLxo8GMyqÀÁÂÃÄÅÆÇÈÉ
Reprotected Data: sOcSzhEwXTrclwÀÁÂÃÄÅÆÇÈÉ

Example - Retokenizing String Data with External IV

The example for using the reprotect API for retokenizing string data using external IV is described in this section.

If you want to pass the external IV as a keyword argument to the reprotect API, then you must pass the external IV as bytes to the API.

Example
In the following example, the Protegrity1 string is used as the input data, which is first tokenized using the string data element, with the help of external IV 1234 that is passed as bytes.
The tokenized input data, the string data element, the old external IV 1234 in bytes, and a new external IV 123456 in bytes are then passed as inputs to the reprotect API. As part of a single reprotect operation, the reprotect API first detokenizes the protected input data using the given data element and old external IV. It then retokenizes the data using the same data element, but with the new external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
p_out = session.protect("Protegrity1", "string", 
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: %s" %p_out)
r_out = session.reprotect(p_out, "string", 
 "string", old_external_iv=bytes("1234", encoding="utf-8"), 
 new_external_iv=bytes("123456", encoding="utf-8"))
print("Reprotected Data: %s" %r_out)

Result

Protected Data: oEquECC2JYb
Reprotected Data: m6AROToSQ71
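Changing only the external IV while keeping the same data element is a common enough pattern that it can be wrapped in a small function. The sketch below relies only on the reprotect signature documented above; rotate_external_iv is a hypothetical wrapper, and the session argument is assumed to be created as in the examples in this section.

```python
def rotate_external_iv(session, token, data_element, old_iv, new_iv):
    """Retokenize token under the same data element with a new external IV.

    session is a session object created as in the examples above.
    rotate_external_iv is a hypothetical wrapper, not part of appython.
    """
    for name, iv in (("old_iv", old_iv), ("new_iv", new_iv)):
        if not isinstance(iv, bytes):
            raise TypeError("%s must be bytes" % name)
    return session.reprotect(
        token,
        data_element,   # old data element
        data_element,   # new data element (unchanged)
        old_external_iv=old_iv,
        new_external_iv=new_iv,
    )

# Usage (requires a configured AP Python environment):
# r_out = rotate_external_iv(session, p_out, "string", b"1234", b"123456")
```

The type check fails early if an IV is passed as a string instead of bytes, which mirrors the requirement stated for the external IV keyword arguments.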

Example - Retokenizing Bulk String Data

The examples for using the reprotect API for retokenizing bulk string data are described in this section. The bulk string data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

Example 1: Input bulk string data
In the following example, protegrity1234, Protegrity1, and Protegrity56 strings are stored in a list and used as bulk data, which is tokenized using the string data element.
The tokenized input data, the old data element string, and the new data element address are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["protegrity1234", "Protegrity1", "Protegrity56"]
p_out = session.protect(data, "string")
print("Protected Data: ")
print(p_out)
r_out = session.reprotect(p_out[0], "string", "address")
print("Reprotected Data: ")
print(r_out)

Result

Protected Data: 
(['VSYaLoLxo8GMyq', '4l0z9SQrhtk', '9xP5wBuXJuce'], (6, 6, 6))
Reprotected Data: 
(['sOcSzhEwXTrclw', 'hFReRmrqzzB', 'imoJL6U4mWPk'], (50, 50, 50))

6 is the success return code for the protect operation of each element in the list.
50 is the success return code for the reprotect operation of each element in the list.

Example 2: Input date passed as bulk strings
In the following example, the 2019/02/14 and 2018/03/11 strings are stored in a list and used as bulk data, which is tokenized using the datetime data element.

The tokenized input data, the old data element datetime, and the new data element datetime_yc are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["2019/02/14", "2018/03/11"]
output = session.protect(data, "datetime")
print("Protected data: "+str(output))
r_out = session.reprotect(output[0], "datetime", "datetime_yc")
print("Reprotected data: "+str(r_out))

Result

Protected data: (['1072/07/29', '0907/12/30'], (6, 6))
Reprotected data: (['2019/07/13', '2018/12/14'], (50, 50))

6 is the success return code for the protect operation of each element in the list.
50 is the success return code for the reprotect operation of each element in the list.

Example 3: Input date and time passed as bulk strings
In the following example, the 2019/02/14 10:54:47 and 2019/11/03 11:01:32 strings are stored in a list and used as bulk data, which is tokenized using the datetime data element.
If a date and time string is provided as input, then the data element with the same tokenization type as the input format must be used for data protection. For example, if you have provided the input date and time string in YYYY-MM-DD HH:MM:SS MMM format, then you must use only the Datetime (YYYY-MM-DD HH:MM:SS MMM) data element to protect the data.
The tokenized input data, the old data element datetime, and the new data element datetime_yc are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["2019/02/14 10:54:47", "2019/11/03 11:01:32"]
output = session.protect(data, "datetime")
print("Protected data: "+str(output))
r_out = session.reprotect(output[0], "datetime", "datetime_yc")
print("Reprotected data: "+str(r_out))

Result

Protected data: (['1072/07/29 10:54:47', '2249/12/17 11:01:32'], (6, 6))
Reprotected data: (['2019/07/13 10:54:47', '2019/05/29 11:01:32'], (50, 50))

6 is the success return code for the protect operation of each element in the list.
50 is the success return code for the reprotect operation of each element in the list.

Example - Retokenizing Bulk String Data with External IV

The example for using the reprotect API for retokenizing bulk string data using external IV is described in this section. The bulk string data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

If you want to pass the external IV as a keyword argument to the reprotect API, then you must pass the external IV as bytes to the API.

Example
In the following example, protegrity1234, Protegrity1, and Protegrity56 strings are stored in a list and used as bulk data, which is tokenized using the string data element, with the help of external IV 1234 that is passed as bytes.
The tokenized input data, the string data element, the old external IV 1234 in bytes, and a new external IV 123456 in bytes are then passed as inputs to the reprotect API. As part of a single reprotect operation, the reprotect API first detokenizes the protected input data using the given data element and old external IV, and then retokenizes it using the same data element, but with the new external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = ["protegrity1234", "Protegrity1", "Protegrity56"]
p_out = session.protect(data, "string",
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: ")
print(p_out)
r_out = session.reprotect(p_out[0], "string", "string",
 old_external_iv=bytes("1234", encoding="utf-8"),
 new_external_iv=bytes("123456", encoding="utf-8"))
print("Reprotected Data: ")
print(r_out)

Result

Protected Data: 
(['aCzyqwijkSDqiG', 'oEquECC2JYb', 't0Ly7KYx7Wyo'], (6, 6, 6))
Reprotected Data: 
(['EqDxRW2QhMqZJV', 'm6AROToSQ71', 'DTWuFfYK2ZpL'], (50, 50, 50))

6 is the success return code for the protect operation of each element in the list.
50 is the success return code for the reprotect operation of each element in the list.

Example - Retokenizing Integer Data

The example for using the reprotect API for retokenizing integer data is described in this section.

If you are retokenizing the data using the reprotect API, then the old data element and the new data element must have the same tokenization type. For example, if you have used the Integer data element to protect the data, then you must use only the Integer data element to reprotect the data.

Example
In the following example, 21 is used as the input integer data, which is first tokenized using the int data element.
The tokenized input data, the old data element int, and a new data element int are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
output = session.protect(21, "int")
print("Protected Data: %s" %output)
r_out = session.reprotect(output, "int", "int")
print("Reprotected Data: %s" %r_out)

Result

Protected Data: -94623223
Reprotected Data: -94623223

Example - Retokenizing Integer Data with External IV

The example for using the reprotect API for retokenizing integer data using external IV is described in this section.

If you are retokenizing the data using the reprotect API, then the old data element and the new data element must have the same tokenization type. For example, if you have used the Integer data element to protect the data, then you must use only the Integer data element to reprotect the data.

If you want to pass the external IV as a keyword argument to the reprotect API, then you must pass the external IV as bytes to the API.

The AP Python APIs support integer values only between -2147483648 and 2147483647, both inclusive.

Example
In the following example, 21 is used as the input integer data, which is first tokenized using the int data element, with the help of external IV 1234 that is passed as bytes.
The tokenized input data, the int data element, the old external IV 1234 in bytes, and a new external IV 123456 in bytes are then passed as inputs to the reprotect API. As part of a single reprotect operation, the reprotect API first detokenizes the protected input data using the given data element and old external IV. It then retokenizes the data using the same data element, but with the new external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
p_out = session.protect(21, "int", 
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: %s" %p_out)
r_out = session.reprotect(p_out, "int", "int",
 old_external_iv=bytes("1234", encoding="utf-8"), new_external_iv=bytes("123456", encoding="utf-8"))
print("Reprotected Data: %s" %r_out)

Result

Protected Data: 1983567415
Reprotected Data: 16592685

Example - Retokenizing Bulk Integer Data

The example for using the reprotect API for retokenizing bulk integer data is described in this section. The bulk integer data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

Example
In the following example, 21, 42, and 55 integers are stored in a list and used as bulk data, which is tokenized using the int data element.
The tokenized input data, the old data element int, and a new data element int are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [21, 42, 55]
p_out = session.protect(data, "int")
print("Protected Data: ")
print(p_out)
r_out = session.reprotect(p_out[0], "int", "int")
print("Reprotected Data: ")
print(r_out)

Result

Protected Data: 
([-94623223, -572010955, 2021989009], (6, 6, 6))
Reprotected Data: 
([-94623223, -572010955, 2021989009], (50, 50, 50))

6 is the success return code for the protect operation of each element in the list.
50 is the success return code for the reprotect operation of each element in the list.

Example - Retokenizing Bulk Integer Data with External IV

The example for using the reprotect API for retokenizing bulk integer data using external IV is described in this section. The bulk integer data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

If you want to pass the external IV as a keyword argument to the reprotect API, then you must pass the external IV as bytes to the API.

Example
In the following example, 21, 42, and 55 integers are stored in a list and used as bulk data, which is tokenized using the int data element, with the help of external IV 1234 that is passed as bytes.
The tokenized input data, the int data element, the old external IV 1234 in bytes, and a new external IV 123456 in bytes are then passed as inputs to the reprotect API. As part of a single reprotect operation, the reprotect API first detokenizes the protected input data using the given data element and old external IV. It then retokenizes the data using the same data element, but with the new external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [21, 42, 55]
p_out = session.protect(data, "int", external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: ")
print(p_out)
r_out = session.reprotect(p_out[0], "int", "int",
 old_external_iv=bytes("1234", encoding="utf-8"), new_external_iv=bytes("123456", encoding="utf-8"))
print("Reprotected Data: ")
print(r_out)

Result

Protected Data: 
([1983567415, -1471024670, 1465229692], (6, 6, 6))
Reprotected Data: 
([16592685, -2026434677, 262981938], (50, 50, 50))

6 is the success return code for the protect operation of each element in the list.
50 is the success return code for the reprotect operation of each element in the list.

Example - Retokenizing Bytes Data

The example for using the reprotect API for retokenizing bytes data is described in this section.

Example
In the following example, Protegrity1 string is first converted to bytes using the Python bytes() method. The bytes data is then tokenized using the string data element.
The tokenized input data, the old data element string, and a new data element address are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-8")
p_out = session.protect(data, "string")
print("Protected Data: %s" %p_out)
r_out = session.reprotect(p_out, "string", "address")
print("Reprotected Data: %s" %r_out)

Result

Protected Data: b'4l0z9SQrhtk'
Reprotected Data: b'hFReRmrqzzB'

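In the following example, the Protegrity1 string is first converted to bytes in the UTF-16BE encoding using the Python bytes() method. The bytes data is then tokenized using the string data element. The charset parameter is passed as a keyword argument with the value Charset.UTF16BE to match the encoding of the input bytes, and the encrypt_to parameter is passed with its value set to bytes.
The tokenized input data, the old data element string, and a new data element string are then passed as inputs to the reprotect API, along with the same encrypt_to and charset keyword arguments. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.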

from appython import Protector
from appython import Charset
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-16be")
p_out = session.protect(data, "string", encrypt_to=bytes, charset=Charset.UTF16BE)
print("Protected Data: %s" %p_out)
r_out = session.reprotect(p_out, "string", "string", encrypt_to=bytes, charset=Charset.UTF16BE)
print("Reprotected Data: %s" %r_out)

Result

Protected Data: b'\x004\x00l\x000\x00z\x009\x00S\x00Q\x00r\x00h\x00t\x00k'
Reprotected Data: b'\x004\x00l\x000\x00z\x009\x00S\x00Q\x00r\x00h\x00t\x00k'
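
To read the UTF-16BE result, the output bytes can be decoded with the same encoding. The following sketch uses only built-in Python on the bytes value copied from the Result section above. It shows that the decoded token matches the token returned in the earlier UTF-8 example.

```python
# Protected output copied from the Result section above (UTF-16BE encoded).
protected_utf16 = b'\x004\x00l\x000\x00z\x009\x00S\x00Q\x00r\x00h\x00t\x00k'

# Decode with the encoding that was used for the input bytes.
token_text = protected_utf16.decode("utf-16be")
print(token_text)

# Re-encoding as UTF-8 gives the same bytes as the UTF-8 example.
token_utf8 = token_text.encode("utf-8")
print(token_utf8)
```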

Example - Retokenizing Bytes Data with External IV

The example for using the reprotect API for retokenizing bytes data using external IV is described in this section.

Example
In the following example, Protegrity1 string is first converted to bytes using the Python bytes() method. The bytes data is then tokenized using the string data element, with the help of external IV 1234 that is passed as bytes.
The tokenized input data, the string data element, the old external IV 1234 in bytes, and a new external IV 123456 in bytes are then passed as inputs to the reprotect API. As part of a single reprotect operation, the reprotect API first detokenizes the protected input data using the given data element and old external IV, and then retokenizes it using the same data element, but with the new external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-8")
p_out = session.protect(data, "string",
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: %s" %p_out)
r_out = session.reprotect(p_out, "string",
 "string", old_external_iv=bytes("1234", encoding="utf-8"),
 new_external_iv=bytes("123456", encoding="utf-8"))
print("Reprotected Data: %s" %r_out)

Result

Protected Data: b'oEquECC2JYb'
Reprotected Data: b'm6AROToSQ71'

Example - Re-Encrypting Bytes Data

The example for using the reprotect API for re-encrypting bytes data is described in this section.

If you are using the reprotect API, then the old data element and the new data element must be of the same protection method. For example, if you have used the text data element to protect the data, then you must use only the text data element to reprotect the data.

Example
In the following example, the Protegrity1 string is first converted to bytes using the Python bytes() method. The bytes data is then encrypted using the text data element. Because the data is encrypted, the encrypt_to parameter is passed as a keyword argument, and its value is set to bytes.
The encrypted input data, the old data element text, and a new data element text are then passed as inputs to the reprotect API. The reprotect API first decrypts the protected input data using the old data element and then re-encrypts it using the new data element, as part of a single reprotect operation. The encrypt_to parameter is again passed as a keyword argument, and its value is set to bytes.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data=bytes("Protegrity1", encoding="utf-8")
p_out = session.protect(data, "text", encrypt_to = bytes)
print("Encrypted Data: %s" %p_out)
r_out = session.reprotect(p_out, "text", "text", encrypt_to = bytes)
print("Re-encrypted Data: %s" %r_out)

Result

Encrypted Data: b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V'
Re-encrypted Data: b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V'

Example - Retokenizing Bulk Bytes Data

The example for using the reprotect API for retokenizing bulk bytes data is described in this section. The bulk bytes data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

Example
In the following example, protegrity1234, Protegrity1, and Protegrity56 strings are first converted to bytes using the Python bytes() method. The converted bytes are then stored in a list and used as bulk data, which is tokenized using the string data element.
The tokenized input data, the old data element string, and a new data element address are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [bytes("protegrity1234","utf-8"), bytes("Protegrity1","utf-8"), bytes("Protegrity56","utf-8")]
p_out = session.protect(data, "string")
print("Protected Data: ")
print(p_out)
r_out = session.reprotect(p_out[0], "string", "address")
print("Reprotected Data: ")
print(r_out)

Result

Protected Data: 
([b'VSYaLoLxo8GMyq', b'4l0z9SQrhtk', b'9xP5wBuXJuce'], (6, 6, 6))
Reprotected Data: 
([b'sOcSzhEwXTrclw', b'hFReRmrqzzB', b'imoJL6U4mWPk'], (50, 50, 50))

6 is the success return code for the protect operation of each element in the list.
50 is the success return code for the reprotect operation of each element in the list.

Example - Retokenizing Bulk Bytes Data with External IV

The example for using the reprotect API for retokenizing bulk bytes data using external IV is described in this section. The bulk bytes data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

Example
In the following example, protegrity1234, Protegrity1, and Protegrity56 strings are first converted to bytes using the Python bytes() method. The converted bytes are then stored in a list and used as bulk data, which is tokenized using the string data element, with the help of external IV 1234 that is passed as bytes.
The tokenized input data, the string data element, the old external IV 1234 in bytes, and a new external IV 123456 in bytes are then passed as inputs to the reprotect API. As part of a single reprotect operation, the reprotect API first detokenizes the protected input data using the given data element and old external IV. It then retokenizes the data using the same data element, but with the new external IV.

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [bytes("protegrity1234", encoding="utf-8"), bytes("Protegrity1",
 encoding="utf-8"), bytes("Protegrity56", encoding="utf-8")]
p_out = session.protect(data, "string",
 external_iv=bytes("1234", encoding="utf-8"))
print("Protected Data: ")
print(p_out) 
r_out = session.reprotect(p_out[0], "string",
 "string", old_external_iv=bytes("1234", encoding="utf-8"),
 new_external_iv=bytes("123456", encoding="utf-8"))
print("Reprotected Data: ")
print(r_out)

Result

Protected Data: 
([b'aCzyqwijkSDqiG', b'oEquECC2JYb', b't0Ly7KYx7Wyo'], (6, 6, 6))
Reprotected Data: 
([b'EqDxRW2QhMqZJV', b'm6AROToSQ71', b'DTWuFfYK2ZpL'], (50, 50, 50))

6 is the success return code for the protect operation of each element in the list.
50 is the success return code for the reprotect operation of each element in the list.

Example - Re-Encrypting Bulk Bytes Data

The example for using the reprotect API for re-encrypting bulk bytes data is described in this section. The bulk bytes data can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

To avoid data corruption, do not convert the encrypted bytes data into string format. It is recommended that you convert the encrypted bytes data to hexadecimal, Base64, or any other appropriate format.

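Example
In the following example, protegrity1234, Protegrity1, and Protegrity56 strings are first converted to bytes using the Python bytes() method. The converted bytes are then stored in a list and used as bulk data, which is encrypted using the text data element. Because the data is encrypted, the encrypt_to parameter is passed as a keyword argument, and its value is set to bytes.
The encrypted input data, the old data element text, and a new data element text are then passed as inputs to the reprotect API. The reprotect API first decrypts the protected input data using the old data element and then re-encrypts it using the new data element, as part of a single reprotect operation. The encrypt_to parameter is again passed as a keyword argument, and its value is set to bytes.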

from appython import Protector
protector = Protector()
session = protector.create_session("superuser")
data = [bytes("protegrity1234", encoding="UTF-8"), bytes("Protegrity1", encoding="UTF-8"),
 bytes("Protegrity56", encoding="UTF-8")]
p_out = session.protect(data, "text", encrypt_to = bytes)
print("Encrypted Data: ")
print(p_out)
r_out = session.reprotect(p_out[0], "text", "text", encrypt_to = bytes)
print("Re-encrypted Data: ")
print(r_out)

Result

Encrypted Data: 
([b"I\xc1\xf0S\x0f\xaf\t\x06\xb5;\xb5'%\xab\x9b\x18", b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V', b'\xfd\x99\xa7\xd1V(\x02K\xc9\xbdZ\x97\xd6\xea\xcc\x13'], (6, 6, 6))
Re-encrypted Data: 
([b"I\xc1\xf0S\x0f\xaf\t\x06\xb5;\xb5'%\xab\x9b\x18", b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V', b'\xfd\x99\xa7\xd1V(\x02K\xc9\xbdZ\x97\xd6\xea\xcc\x13'], (50, 50, 50))
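
The recommendation to store encrypted bytes as hexadecimal or Base64 rather than as a string can be sketched with the Python standard library. The following example uses one encrypted value copied from the Result section above. It is an illustration of the encoding step only and does not call the Protegrity APIs.

```python
import base64

# One encrypted value copied from the Result section above.
encrypted = b'\x84\x84\xaf\x10fwh\xd7w\x06)`"p\xe0V'

# Decoding arbitrary encrypted bytes as text is not reliable.
try:
    encrypted.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print("Direct UTF-8 decode failed:", decode_failed)

# Hexadecimal and Base64 are lossless text representations.
hex_text = encrypted.hex()
b64_text = base64.b64encode(encrypted).decode("ascii")

# Both representations convert back to the original bytes.
from_hex = bytes.fromhex(hex_text)
from_b64 = base64.b64decode(b64_text)
print(from_hex == encrypted, from_b64 == encrypted)
```

Convert the text back to bytes before passing the value to the unprotect or reprotect API.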

Example - Retokenizing Date Objects

The example for using the reprotect API for retokenizing date objects is described in this section.

If you are retokenizing the data using the reprotect API, then the old data element and the new data element must have the same tokenization type. For example, if you have used the Date (YYYY/MM/DD) data element to protect the data, then you must use only the Date (YYYY/MM/DD) data element to reprotect the data.

Example: Input as a date object
In the following example, the 2019/02/12 date string is used as the data, which is first converted to a date object using the Python date method of the datetime module. The date object is then tokenized using the datetime data element.
The tokenized input data, the old data element datetime, and a new data element datetime_yc are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
from datetime import datetime
protector = Protector()
session = protector.create_session("superuser")
data = datetime.strptime("2019/02/12", "%Y/%m/%d").date()
print("Input date as a Date object : "+str(data))
p_out = session.protect(data, "datetime")
print("Protected date: "+str(p_out))
r_out = session.reprotect(p_out, "datetime", "datetime_yc")
print("Reprotected date: "+str(r_out))

Result

Input date as a Date object : 2019-02-12
Protected date: 1154-10-29
Reprotected date: 2019-02-03

Example - Retokenizing Bulk Date Objects

The example for using the reprotect API for retokenizing bulk date objects is described in this section. The bulk date objects can be passed as a list or a tuple.

The individual elements of the list or tuple must be of the same data type.

Example: Input as a Date Object
In the following example, the 2019/02/12 and 2018/01/11 date strings are used as the data, which are first converted to date objects using the Python date method of the datetime module. The two date objects are then used to create a list, which is used as the input data.
The input list is then tokenized using the datetime data element.
The tokenized input data, the old data element datetime, and a new data element datetime_yc are then passed as inputs to the reprotect API. The reprotect API detokenizes the protected input data using the old data element and then retokenizes it using the new data element, as part of a single reprotect operation.

from appython import Protector
from datetime import datetime
protector = Protector()
session = protector.create_session("superuser")
data1 = datetime.strptime("2019/02/12", "%Y/%m/%d").date()
data2 = datetime.strptime("2018/01/11", "%Y/%m/%d").date()
data = [data1, data2]
print("Input data: ", str(data))
p_out = session.protect(data, "datetime")
print("Protected data: "+str(p_out))
r_out = session.reprotect(p_out[0], "datetime", "datetime_yc")
print("Reprotected date: "+str(r_out))

Result

Input data:  [datetime.date(2019, 2, 12), datetime.date(2018, 1, 11)]
Protected data: ([datetime.date(1154, 10, 29), datetime.date(1543, 1, 5)], (6, 6))
Reprotected date: ([datetime.date(2019, 2, 3), datetime.date(2018, 11, 14)], (50, 50))

6 is the success return code for the protect operation of each element in the list.
50 is the success return code for the reprotect operation of each element in the list.

Log return codes for Protectors

The log return codes and their descriptions help you understand why a code was returned and are useful during troubleshooting.

Return Code	Description
0	Error code for no logging
1	The username could not be found in the policy
2	The data element could not be found in the policy
3	The user does not have the appropriate permissions to perform the requested operation
5	Integrity check failed
6	Data protect operation was successful
7	Data protect operation failed
8	Data unprotect operation was successful
9	Data unprotect operation failed
10	The user has appropriate permissions to perform the requested operation but no data has been protected/unprotected
11	Data unprotect operation was successful with use of an inactive keyid
12	Input is null or not within allowed limits
13	Internal error occurring in a function call after the provider has been opened
14	Failed to load data encryption key
20	Failed to allocate memory
21	Input or output buffer is too small
22	Data is too short to be protected/unprotected
23	Data is too long to be protected/unprotected
26	Unsupported algorithm or unsupported action for the specific data element
27	Application has been authorized
28	Application has not been authorized
31	Policy not available
44	The content of the input data is not valid
49	Unsupported input encoding for the specific data element
50	Data reprotect operation was successful
51	Failed to send logs, connection refused
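
When handling bulk results, it can help to translate the return codes into their descriptions. The following is a minimal sketch that copies a subset of the rows from the table above into a dictionary. The `describe_codes` helper and the `SUCCESS_CODES` set are hypothetical names for this illustration and are not part of the appython module.

```python
# Subset of the return codes from the table above.
RETURN_CODES = {
    1: "The username could not be found in the policy",
    2: "The data element could not be found in the policy",
    3: "The user does not have the appropriate permissions to perform the requested operation",
    6: "Data protect operation was successful",
    7: "Data protect operation failed",
    8: "Data unprotect operation was successful",
    9: "Data unprotect operation failed",
    11: "Data unprotect operation was successful with use of an inactive keyid",
    44: "The content of the input data is not valid",
    50: "Data reprotect operation was successful",
}

# Codes whose description in the table states that the operation was successful.
SUCCESS_CODES = {6, 8, 11, 50}

def describe_codes(codes):
    """Return (code, description, succeeded) for each return code."""
    return [
        (code, RETURN_CODES.get(code, "Unknown return code"), code in SUCCESS_CODES)
        for code in codes
    ]

for code, description, succeeded in describe_codes((6, 50, 44)):
    print(code, description, "OK" if succeeded else "CHECK")
```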

5 - Customizing the sample application

The settings for running the sample application.

The steps mentioned in this section are optional. The sample application can detect and redact the data with the default configurations. Change these configurations only when the way the files are processed must change, for example, when the name of the input or output file changes.

Sample application customization

Specifying the source file

The source file contains the data that must be processed. This file can have a paragraph of text or a table with values. Protegrity AI Developer Edition can process various files. However, for security reasons, certain characters are rejected and not processed. To enable or disable these security settings, refer to the section Input Sanitization. This release only supports files containing plain text.

To specify the source file:

Navigate to the location where Protegrity AI Developer Edition is cloned.
Open the sample-app-find-and-redact.py file from the samples directory.

Locate the following statement.

input_file = base_dir / "sample-data" / "input.txt"

Update the path and name for the source file.
Save and close the file.
Run the Python file.

Specifying the output file

The output file location specifies where the processed output file must be stored.

To specify the output file:

Navigate to the location where Protegrity AI Developer Edition is cloned.
Open the sample-app-find-and-redact.py file from the samples directory.

Locate the following statement.

output_file = base_dir / "sample-data" / "output-redact.txt"

Update the path and name for the output file.
Save and close the file.
Run the Python file.
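
The input_file and output_file statements build paths with the `/` operator, which suggests that base_dir is a pathlib.Path object. The following is a minimal sketch under that assumption. The value assigned to base_dir and the custom file names are hypothetical and are used only for illustration.

```python
from pathlib import Path

# Assumption: base_dir is a pathlib.Path. The sample may define it differently.
base_dir = Path("samples")

# Default statements from the sample application.
input_file = base_dir / "sample-data" / "input.txt"
output_file = base_dir / "sample-data" / "output-redact.txt"

# Hypothetical custom source and output files.
custom_input = base_dir / "my-data" / "customers.txt"
custom_output = custom_input.with_name("customers-redacted.txt")

print(custom_input.as_posix())
print(custom_output.as_posix())
```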

Specifying the configuration settings

Use the config.json configuration file to specify the data that must be redacted or masked. The character that must be used for masking can also be specified.

Before you begin:

Identify the sensitive fields that are present in the source file.

Open a command prompt.
Navigate to the directory where the sample application is extracted.
Run the following command.
```
python samples/sample-app-find.py
```
View the list of sensitive items. For a complete list of items that can be identified, refer to the List of items.

Updating the configuration file.

Navigate to the location where Protegrity AI Developer Edition is cloned.
Open the config.json file.
Specify the masking character to use in the following code.
```
"masking_char": "#"
```

Specify the text to use for the redacted data in the named_entity_map parameter. The following code shows the value used for the sample source file.

"named_entity_map": {
    "USERNAME": "USERNAME",
    "STATE": "STATE",
    "PHONE_NUMBER": "PHONE",
    "SOCIAL_SECURITY_NUMBER": "SSN",
    "AGE": "AGE",
    "CITY": "CITY",
    "PERSON": "PERSON"
}

Specify the operation to perform on the source file. The available options are mask and redact.
```
    "method": "mask"
```
Save and close the file.
Run the Python file.
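
To illustrate how the masking_char, named_entity_map, and method settings interact, the following sketch applies both methods to a short string. It is not the implementation used by the Python module. The `apply_method` function, the character offsets, and the bracketed label format used for redaction are assumptions made for this illustration.

```python
# Hypothetical illustration: not the module's implementation.
def apply_method(text, findings, method, masking_char="#", named_entity_map=None):
    """Replace each (start, end, entity) span in text using the given method."""
    named_entity_map = named_entity_map or {}
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, entity in sorted(findings, reverse=True):
        if method == "mask":
            # Assumption: one masking character per original character.
            replacement = masking_char * (end - start)
        elif method == "redact":
            # Assumption: the mapped name is shown in square brackets.
            replacement = "[" + named_entity_map.get(entity, entity) + "]"
        else:
            raise ValueError("method must be 'mask' or 'redact'")
        text = text[:start] + replacement + text[end:]
    return text

text = "Call John at 555-0100."
findings = [(5, 9, "PERSON"), (13, 21, "PHONE_NUMBER")]
entity_map = {"PERSON": "PERSON", "PHONE_NUMBER": "PHONE"}

masked = apply_method(text, findings, "mask", masking_char="#")
redacted = apply_method(text, findings, "redact", named_entity_map=entity_map)
print(masked)
print(redacted)
```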

Specifying the classification score threshold settings

The classification score threshold sets the minimum confidence level needed for the system to treat detected data as valid. It helps filter out uncertain matches so only high-confidence results are flagged. Specify the threshold as a decimal value, for example, 0.6 for 60%. Lowering the value makes the system more sensitive, while raising it reduces false positives.

To set the value:

Navigate to the location where Protegrity AI Developer Edition is cloned.
Open the sample-app-find-and-redact.py file from the samples directory.
Locate the following statement.
```
"classification_score_threshold", 0.6
```
Set the required value.
Save and close the file.
Run the Python file.
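
The effect of the threshold can be sketched with a simple filter. The detection dictionaries and the `filter_by_threshold` function below are hypothetical and do not reflect the actual response format of the classification service. The sketch also assumes that a score equal to the threshold is accepted.

```python
# Hypothetical illustration of a minimum-confidence filter.
def filter_by_threshold(detections, threshold=0.6):
    """Keep detections whose score is at least the threshold."""
    return [d for d in detections if d["score"] >= threshold]

# Hypothetical detections with confidence scores.
detections = [
    {"entity": "PERSON", "score": 0.92},
    {"entity": "CITY", "score": 0.55},
    {"entity": "AGE", "score": 0.60},
]

default_result = filter_by_threshold(detections, 0.6)
sensitive_result = filter_by_threshold(detections, 0.5)
strict_result = filter_by_threshold(detections, 0.95)

print([d["entity"] for d in default_result])
print([d["entity"] for d in sensitive_result])
print([d["entity"] for d in strict_result])
```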

Specifying the logging parameters

The log messages are sent to the terminal. To capture logging data, transfer and save the output of the commands to a log file.

To set the logging level:

Navigate to the location where Protegrity AI Developer Edition is cloned.
Open the config.json file.

Locate or add the following statement.

"enable_logging": True,
"log_level": "info",

Ensure that logging is set to True and set the required log level that must be displayed.
Save and close the file.
Run the Python file.
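
Because the log messages are sent to the terminal, one way to keep them is to run the command from a small wrapper that writes its output to a file. The following is a minimal sketch using the Python standard library. The `run_and_log` function and the log file name are hypothetical, and the demonstration runs a one-line Python command instead of the sample application.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_and_log(command, log_path):
    """Run a command and write its terminal output to log_path."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        # Send both standard output and standard error to the log file.
        completed = subprocess.run(
            command, stdout=log_file, stderr=subprocess.STDOUT
        )
    return completed.returncode

# Demonstration with a stand-in command. To log the sample application,
# the command could be [sys.executable, "samples/sample-app-find-and-redact.py"].
log_path = Path(tempfile.gettempdir()) / "sample-app.log"
exit_code = run_and_log([sys.executable, "-c", "print('log line')"], log_path)
log_text = log_path.read_text(encoding="utf-8")
print(exit_code)
print(log_text)
```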

Python module configuration

The following parameters are configurable for AI Developer Edition.

Parameter	Description	Values	Example
endpoint_url	The Data Discovery endpoint for classifying sensitive data.	Specify a URL.	http://localhost:${CLASSIFICATION_PORT:-8580}/pty/data-discovery/v1.0/classify
named_entity_map	A dictionary or map of entities and their corresponding replacement names.	List of items	"named_entity_map": {"PERSON": "PERSON", "PHONE_NUMBER": "PHONE"}
masking_char	The character to be used for masking.	Specify a special character.	#
classification_score_threshold	The minimum confidence level needed for the system to treat detected data as valid.	Specify a number between 0 and 1.0	0.6
method	The method for processing sensitive data.	redact or mask	mask
enable_logging	Specify whether to enable logging.	True or False	True
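
The values in the table above can be checked before they are used. The following is a minimal sketch of such a check. The `validate_config` function is hypothetical and is not part of the Python module, and the sketch assumes that every parameter in the table is present.

```python
# Hypothetical validation helper based on the Values column of the table above.
def validate_config(config):
    """Return a list of problems found in the configuration dictionary."""
    errors = []
    if not str(config.get("endpoint_url", "")).startswith(("http://", "https://")):
        errors.append("endpoint_url must be a URL")
    if not isinstance(config.get("named_entity_map"), dict):
        errors.append("named_entity_map must be a dictionary")
    masking_char = config.get("masking_char")
    if not isinstance(masking_char, str) or len(masking_char) != 1:
        errors.append("masking_char must be a single character")
    threshold = config.get("classification_score_threshold")
    is_number = isinstance(threshold, (int, float)) and not isinstance(threshold, bool)
    if not is_number or not 0 <= threshold <= 1.0:
        errors.append("classification_score_threshold must be between 0 and 1.0")
    if config.get("method") not in ("redact", "mask"):
        errors.append("method must be redact or mask")
    if not isinstance(config.get("enable_logging"), bool):
        errors.append("enable_logging must be True or False")
    return errors

config = {
    "endpoint_url": "http://localhost:8580/pty/data-discovery/v1.0/classify",
    "named_entity_map": {"PERSON": "PERSON", "PHONE_NUMBER": "PHONE"},
    "masking_char": "#",
    "classification_score_threshold": 0.6,
    "method": "mask",
    "enable_logging": True,
}
print(validate_config(config))
```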

6 - Building the Python module

Compiling and building the Python module.

The protegrity-developer-python repository is part of the Protegrity AI Developer Edition suite. This repository provides the Python module for integrating Protegrity’s Data Discovery and Protection APIs into GenAI and traditional applications. Customize, compile, and use the module as per your requirement.

💡Note: Build and use this module only if you intend to change the source and default behavior. Ensure that the Protegrity AI Developer Edition is running before installing this module. For setup instructions, refer to the installation steps.

Prerequisites

Git
Python >= 3.12.11
pip
Python Virtual Environment
Uninstall the protegrity_developer_python module from the Python virtual environment if it is already installed.
```
pip uninstall protegrity_developer_python
```

Build the protegrity-developer-python module

Clone the repository.

git clone https://github.com/Protegrity-Developer-Edition/protegrity-developer-python.git

Navigate to the protegrity-developer-python directory in the cloned location.
Optional: Update the files in the Python source directory as required.
Activate the Python virtual environment.
Install the dependencies.
```
pip install -r requirements.txt
```
Build and install the Python module by running the following command from the root directory of the repository.
```
pip install .
```
The installation completes and the success message is displayed.

7 - Appendix

This section provides supplemental details for working with the product.

7.1 - Input Sanitization

Rejecting unsanitized data.

The Classification service in Data Discovery offers a security feature that rejects unsanitized data. Data that is malformed or non-normalized, or that contains homoglyphs, hieroglyphs, mixed Unicode variants, or control characters, is considered unsanitized. Such data is rejected for classification.

The following are a few examples of data that will be rejected:

Ⅷ
𝓉𝑒𝓍𝓉
Ｐｅｐ

Before invoking the Classification endpoint, ensure that the input text is normalized. Replace invalid characters with their corresponding normalized plaintext characters. If the input text contains any invalid character, a status code of 422 and the message Untrusted input are returned.
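
One way to normalize input text is Unicode compatibility normalization from the Python standard library. The following is a minimal sketch, not the service's own sanitization logic. The `normalize_text` function is hypothetical, and keeping newline and tab characters is an assumption. NFKC normalization converts the three rejected examples above to plain characters, but it does not replace homoglyphs from other scripts, so the service may still reject some normalized input.

```python
import unicodedata

# Hypothetical helper: not the sanitization logic used by the service.
def normalize_text(text):
    """Apply NFKC normalization and drop control characters."""
    normalized = unicodedata.normalize("NFKC", text)
    # Remove control characters (category Cc), keeping newline and tab.
    return "".join(
        ch for ch in normalized
        if unicodedata.category(ch) != "Cc" or ch in "\n\t"
    )

# The rejected examples listed above, plus a string with a control character.
roman = normalize_text("Ⅷ")
script = normalize_text("𝓉𝑒𝓍𝓉")
fullwidth = normalize_text("Ｐｅｐ")
control = normalize_text("a\x00b")
print(roman, script, fullwidth, control)
```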

For security purposes, the application rejects unsanitized data by default. It is recommended that this feature remains enabled. However, to override this feature, perform the following steps.

Navigate to the docker_compose directory.
Edit the docker-compose.yaml file.
Under the environment section of classification_service, append the security parameter as follows.

- SECURITY_SETTINGS={"ENABLE_ALL_SECURITY_CONTROLS":false}

Save the changes.
Run the docker compose down command to undeploy the application.
Run the docker compose up command to redeploy the application.

7.2 - Working with the Data Discovery containers

Using the Data Discovery containers.

Use Data Discovery by setting up and deploying the containers.

7.2.1 - Understanding the Docker Compose File

Details of the configurable parameters in the docker-compose.yml file.

The following variables can be configured in the docker-compose.yml file.

Variable	Description	Mandatory
networks:name	Specify the name of the Docker network.	No
services:environment	Specify the location for the logs in the `logging_config` parameter.	No
classification_service:ports	Specify the listening port for the classification service. By default, the port is set to 8580.	No
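
If the listening port is changed, client code must use the same port. The following sketch builds the classification endpoint from the CLASSIFICATION_PORT environment variable with 8580 as the default, mirroring the endpoint_url example in the Python module configuration table. The `classification_endpoint` function is hypothetical, and the sketch assumes that the service is reachable on localhost.

```python
import os

# Hypothetical helper that mirrors the ${CLASSIFICATION_PORT:-8580} default.
def classification_endpoint(host="localhost"):
    """Build the classify URL, using port 8580 when the variable is unset or empty."""
    port = os.environ.get("CLASSIFICATION_PORT") or "8580"
    return f"http://{host}:{port}/pty/data-discovery/v1.0/classify"

print(classification_endpoint())
```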

7.2.2 - Deploying the Application

Deploying the Data Discovery container.

Ensure that the prerequisites are completed before deploying the application.

Run the following steps to deploy the Data Discovery application on Docker.

Open a command prompt.
Navigate to the AI Developer Edition package directory.
Run the command to start the containers. For example, the following command starts the Classification service container.

docker compose up -d

7.3 - Supported Sensitive Entity Types

PII entities supported by Protegrity AI Developer Edition.

Entity Name	Data Element	Description
ABA_ROUTING_NUMBER	number	Routing number used to identify financial institutions in the United States.
ACCOUNT_NAME	string	Name associated with a financial account.
ACCOUNT_NUMBER	number	Bank account number used to identify financial accounts.
AGE	number	Age information used to identify individuals.
AMOUNT	int	Specific amount of money, which can be linked to financial transactions.
AU_ABN	number	Australian Business Number used to identify businesses in Australia.
AU_ACN	number	Australian Company Number used to identify businesses in Australia.
AU_MEDICARE	number	Medicare number used to identify individuals for healthcare services in Australia.
AU_TFN	number	Tax File Number used to identify taxpayers in Australia.
BIC	number	Bank Identifier Code used to identify financial institutions.
BITCOIN_ADDRESS	address	Bitcoin wallet address used for digital transactions.
BUILDING	address	Building information used to identify specific locations.
CITY	city	City information used to identify geographic locations.
COMPANY_NAME	string	Name of a company used to identify businesses.
COUNTRY	string	Country information used to identify geographic locations.
COUNTY	string	County information used to identify geographic locations.
CREDIT_CARD	ccn	Credit card number used for financial transactions.
CREDIT_CARD_CVV	number	Card Verification Value used to secure credit card transactions.
CRYPTO	address	Cryptocurrency wallet address used for digital transactions.
CURRENCY	string	Currency information used in financial transactions.
CURRENCY_CODE	string	Code representing currency used in financial transactions.
CURRENCY_NAME	string	Name of currency used in financial transactions.
CURRENCY_SYMBOL	string	Symbol representing currency, sometimes linked to financial transactions.
DATE	datetime	Specific date that can be linked to personal activities.
DATE_OF_BIRTH	datetime	Date of birth used to identify individuals.
DATE_TIME	datetime	Specific date and time that can be linked to personal activities.
DRIVER_LICENSE	number	Driver’s license number used to identify individuals.
EMAIL_ADDRESS	email	Email address used for communication and identification.
ES_NIE	nin	Foreigner Identification Number used to identify non-residents in Spain.
ES_NIF	nin	Tax Identification Number used to identify taxpayers in Spain.
ETHEREUM_ADDRESS	address	Ethereum wallet address used for digital transactions.
FI_PERSONAL_IDENTITY_CODE	nin	Personal identity code used to identify individuals in Finland.
GENDER	string	Gender information used to identify individuals.
GEO_CCORDINATE	address	Geographic coordinates used to identify specific locations.
IBAN_CODE	iban	International Bank Account Number used to identify bank accounts globally.
ID_CARD	number	Identity card number used to identify individuals.
IN_AADHAAR	nin	Unique identification number used to identify residents in India.
IN_PAN	number	Permanent Account Number used to identify taxpayers in India.
IN_PASSPORT	passport	Passport number used to identify individuals in India.
IN_VEHICLE_REGISTRATION	number	Vehicle registration number used to identify vehicles in India.
IN_VOTER	number	Voter ID number used to identify registered voters in India.
IP_ADDRESS	address	Internet Protocol address used to identify devices on a network.
IPV4	address	IPv4 address used to identify devices on a network.
IPV6	address	IPv6 address used to identify devices on a network.
IT_DRIVER_LICENSE	number	Driver’s license number used to identify individuals in Italy.
IT_FISCAL_CODE	nin	Fiscal code used to identify taxpayers in Italy.
IT_IDENTITY_CARD	number	Identity card number used to identify individuals in Italy.
IT_PASSPORT	passport	Passport number used to identify individuals in Italy.
LITECOIN_ADDRESS	address	Litecoin wallet address used for digital transactions.
LOCATION	address	Specific location or address that can be linked to an individual.
MAC	address	Media Access Control address used to identify devices on a network.
MEDICAL_LICENSE	number	License number used to identify medical professionals.
NRP	number	A person’s nationality, religious or political group.
ORGANIZATION	string	Name or identifier used to identify an organization.
PASSPORT	passport	Passport number used to identify individuals.
PASSWORD	string	Password used to secure access to personal accounts.
PERSON	string	Name or identifier used to identify an individual.
PHONE_NUMBER	phone	Number used to contact or identify an individual.
PIN	number	Personal Identification Number used to secure access to accounts.
PL_PESEL	nin	Personal Identification Number used to identify individuals in Poland.
SECONDARYADDRESS	address	Additional address information used to identify locations.
SG_NRIC_FIN	nin	National Registration Identity Card number used to identify residents in Singapore.
SG_UEN	number	Unique Entity Number used to identify businesses in Singapore.
SOCIAL_SECURITY_NUMBER	ssn	Social Security Number used to identify individuals.
STATE	string	State information used to identify geographic locations.
STREET	address	Street address used to identify specific locations.
TIME	datetime	Specific time that can be linked to personal activities.
TITLE	string	Title or honorific used to identify individuals.
UK_NHS	number	National Health Service number used to identify individuals for healthcare services in the United Kingdom.
URL	address	Web address that can sometimes contain personal information.
US_BANK_NUMBER	number	Bank account number used to identify financial accounts in the United States.
US_DRIVER_LICENSE	number	Driver’s license number used to identify individuals in the United States.
US_ITIN	number	Individual Taxpayer Identification Number used to identify taxpayers in the United States.
US_PASSPORT	passport	Passport number used to identify individuals in the United States.
US_SSN	ssn	Social Security Number used to identify individuals in the United States.
USERNAME	string	Username used to identify individuals in online systems.
ZIP_CODE	zipcode	Postal code used to identify specific geographic areas.
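
The Data Element column can be used to choose the data element for a detected entity. The following is a minimal sketch that copies a few rows from the table above into a dictionary. The `data_element_for` function is hypothetical, and falling back to a default for unlisted entities is an assumption made for this illustration.

```python
# A few rows copied from the table above.
ENTITY_TO_DATA_ELEMENT = {
    "CITY": "city",
    "CREDIT_CARD": "ccn",
    "DATE_OF_BIRTH": "datetime",
    "EMAIL_ADDRESS": "email",
    "PERSON": "string",
    "PHONE_NUMBER": "phone",
    "SOCIAL_SECURITY_NUMBER": "ssn",
}

# Hypothetical lookup helper.
def data_element_for(entity_name, default=None):
    """Return the data element listed for an entity, or the default."""
    return ENTITY_TO_DATA_ELEMENT.get(entity_name, default)

for entity in ("PERSON", "CREDIT_CARD", "UNLISTED_ENTITY"):
    print(entity, data_element_for(entity, default="no mapping"))
```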

7.4 - Data Security Policy

Data Security Policy configuration.

This section describes the Policy configuration used by the AI Developer Edition API Service.

The superuser has all permissions, that is, protect, unprotect, and reprotect operations. Users assigned the admin role will receive protected data when performing an unprotect operation, except in the case of the text data elements, which will return null. All other user roles will receive null as the output for any unprotect operation.

Policy Definition

Generic Data Elements

Data Element	Method	Use Case	UTF Set	LP	PP	eIV	Admin P	Admin U	Finance P	Finance U	Marketing P	Marketing U	HR P	HR U
datetime	Tokenization	A date or datetime string. Formats accepted: YYYY/MM/DD HH:MM:SS and YYYY/MM/DD. Delimiters accepted: /, - (required).	N/A	N/A	N/A	No	✓	Ｘ	Ｘ	Ｘ	Ｘ	✓	Ｘ	Ｘ
datetime_yc	Tokenization	A date or datetime string. Formats accepted: YYYY/MM/DD HH:MM:SS and YYYY/MM/DD. Delimiters accepted: /, - (required). Leaves the year in the clear.	N/A	N/A	N/A	No	✓	Ｘ	Ｘ	Ｘ	Ｘ	✓	Ｘ	Ｘ
int	Tokenization	An integer string (4 bytes).	Numeric	No	No	Yes	✓	Ｘ	Ｘ	Ｘ	Ｘ	✓	Ｘ	Ｘ
number	Tokenization	A numeric string. May produce leading zeroes.	Numeric	Yes	No	Yes	✓	Ｘ	Ｘ	Ｘ	Ｘ	✓	Ｘ	Ｘ
string	Tokenization	An alphanumeric string.	Latin + Numeric	Yes	No	Yes	✓	Ｘ	Ｘ	Ｘ	Ｘ	✓	Ｘ	Ｘ
text	Encryption	A long string (e.g., a comment field) using any character set. Use hex or base64 encoding to utilize.	All	No	No	Yes	✓	Ｘ	Ｘ	Ｘ	Ｘ	✓	Ｘ	Ｘ
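
The datetime and datetime_yc rows accept YYYY/MM/DD HH:MM:SS and YYYY/MM/DD with / or - as the delimiter. The following is a minimal sketch of a client-side pre-check for that shape before a value is sent for protection. The function name `is_accepted_datetime` is hypothetical, and the sketch assumes that the same delimiter is used for both date separators. The validation performed by the service itself may differ, for example in the range of dates it supports.

```python
import re
from datetime import datetime

# Four-digit year, two-digit month and day, one consistent delimiter (/ or -),
# optionally followed by a space and HH:MM:SS.
_DATETIME_SHAPE = re.compile(
    r"[0-9]{4}([/-])[0-9]{2}\1[0-9]{2}( [0-9]{2}:[0-9]{2}:[0-9]{2})?"
)


def is_accepted_datetime(value: str) -> bool:
    """Return True if value looks like one of the formats listed in the table."""
    match = _DATETIME_SHAPE.fullmatch(value)
    if match is None:
        return False
    # Normalize the delimiter so that a single pair of format strings suffices.
    normalized = value.replace("-", "/")
    fmt = "%Y/%m/%d %H:%M:%S" if match.group(2) else "%Y/%m/%d"
    try:
        # strptime rejects impossible values such as month 13 or February 30.
        datetime.strptime(normalized, fmt)
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for sample in ("2024/01/31", "2024-01-31 23:59:59", "20240131", "2024/02/30"):
        print(sample, is_accepted_datetime(sample))
```

The regular expression enforces the shape, while `datetime.strptime` checks that the value is a real calendar date and time.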

PCI DSS Data Elements

Data Element	Method	Use Case	UTF Set	LP	PP	eIV	Role
							Admin		Finance		Marketing		HR
							P	U	P	U	P	U	P	U
ccn	Tokenization	Credit card numbers.	Numeric	No	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	Ｘ	Ｘ	✓
ccn_bin	Tokenization	Credit card numbers. Leaves 8-digit BIN in the clear.	Numeric	No	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	Ｘ	Ｘ	✓
iban	Tokenization	IBAN numbers. Preserves the length, case, and position of the input characters but may create invalid IBAN codes.	Latin + Numeric	Yes	Yes	No	✓	Ｘ	Ｘ	✓	Ｘ	Ｘ	Ｘ	✓
iban_cc	Tokenization	IBAN numbers. Leaves letters in the clear.	Latin + Numeric	No	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	Ｘ	Ｘ	✓

PII Data Elements

Data Element	Method	Use Case	UTF Set	LP	PP	eIV	Role
							Admin		Finance		Marketing		HR
							P	U	P	U	P	U	P	U
address	Tokenization	Street names	Latin + Numeric	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	Ｘ	Ｘ	✓
city	Tokenization	Town or city name	Latin	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	✓	Ｘ	✓
email	Tokenization	Email address. Leaves the domain in the clear.	Latin + Numeric	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	✓	Ｘ	✓
nin	Tokenization	National Insurance Number. Preserves the length, case, and position of the input characters but may create invalid NIN codes.	Latin + Numeric	Yes	Yes	No	✓	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ
name	Tokenization	Person's name	Latin	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	✓	Ｘ	✓
passport	Tokenization	Passport codes. Preserves the length, case, and position of the input characters but may create invalid passport numbers.	Latin + Numeric	Yes	Yes	No	✓	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ
phone	Tokenization	Phone number. May produce leading zeroes.	Latin + Numeric	Yes	No	Yes	✓	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ
postcode	Tokenization	Postal codes with digits and characters. Preserves the length, case, and position of the input characters but may create invalid post codes.	Latin + Numeric	Yes	Yes	No	✓	Ｘ	Ｘ	✓	Ｘ	✓	Ｘ	✓
ssn	Tokenization	Social Security Number (US)	Latin + Numeric	Yes	No	Yes	✓	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ	Ｘ
zipcode	Tokenization	Zip codes with digits only. May produce leading zeroes.	Numeric	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	✓	Ｘ	✓

PII Data Elements (German and French)

Data Element	Method	Use Case	UTF Set	LP	PP	eIV	Role
							Admin		Finance		Marketing		HR
							P	U	P	U	P	U	P	U
address_de	Tokenization	Street names (German)	Latin + German + Numeric	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	Ｘ	Ｘ	✓
address_fr	Tokenization	Street names (French)	Latin + French + Numeric	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	Ｘ	Ｘ	✓
city_de	Tokenization	Town or city name (German)	Latin + German	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	✓	Ｘ	✓
city_fr	Tokenization	Town or city name (French)	Latin + French	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	✓	Ｘ	✓
name_de	Tokenization	Person's name (German)	Latin + German	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	✓	Ｘ	✓
name_fr	Tokenization	Person's name (French)	Latin + French	Yes	No	Yes	✓	Ｘ	Ｘ	✓	Ｘ	✓	Ｘ	✓

LEGEND

eIV: External Initialization Vector (IV)
LP: Length Preservation
PP: Position Preservation
P: User group can protect data
U: User group can unprotect data
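
The P and U columns of the policy tables can be transcribed into a lookup for quick checks during prototyping. The following is a minimal sketch, not an API of the product. The names `PROTECT_ROLES`, `UNPROTECT_ROLES`, and `is_permitted` are hypothetical, the role names are spelled as in the table headers, and the superuser is omitted because it does not appear in the tables.

```python
# In every table above, only the Admin column has the protect (P) permission.
PROTECT_ROLES = frozenset({"Admin"})

# Roles with the unprotect (U) permission, grouped by identical table rows.
_UNPROTECT_GROUPS = [
    ({"Marketing"}, "datetime datetime_yc int number string text"),
    ({"Finance", "HR"}, "ccn ccn_bin iban iban_cc address address_de address_fr"),
    (
        {"Finance", "Marketing", "HR"},
        "city email name postcode zipcode city_de city_fr name_de name_fr",
    ),
    (set(), "nin passport phone ssn"),
]

UNPROTECT_ROLES = {
    element: frozenset(roles)
    for roles, names in _UNPROTECT_GROUPS
    for element in names.split()
}


def is_permitted(role: str, operation: str, data_element: str) -> bool:
    """Look up whether the tables grant a role an operation on a data element."""
    if data_element not in UNPROTECT_ROLES:
        raise KeyError(f"Unknown data element: {data_element!r}")
    if operation == "protect":
        return role in PROTECT_ROLES
    if operation == "unprotect":
        return role in UNPROTECT_ROLES[data_element]
    raise ValueError(f"Operation not covered by the tables: {operation!r}")


if __name__ == "__main__":
    print(is_permitted("Finance", "unprotect", "ccn"))
    print(is_permitted("Marketing", "unprotect", "ccn"))
```

The lookup reflects only the tables in this section. The policy enforced by the AI Developer Edition API Service remains the source of truth.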

7.5 - Removing AI Developer Edition

Steps for removing the product.

Open a command prompt.
Navigate to the cloned repository location.
Run the following command to remove the containers and images.
```
docker compose down --rmi all
```
Run the following command to remove the Python module.
```
pip uninstall protegrity-developer-python
```
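
To confirm that the Python module was removed, the installed distributions can be queried from Python. The following is a minimal sketch that uses the standard `importlib.metadata` module. The helper name `is_distribution_installed` is hypothetical, and the distribution name is taken from the `pip uninstall` command above.

```python
from importlib import metadata


def is_distribution_installed(name: str) -> bool:
    """Return True if a distribution with this name is installed."""
    try:
        metadata.version(name)
    except metadata.PackageNotFoundError:
        return False
    return True


if __name__ == "__main__":
    name = "protegrity-developer-python"
    state = "still installed" if is_distribution_installed(name) else "not installed"
    print(f"{name}: {state}")
```

Run the check with the same Python interpreter or virtual environment that was used for the `pip uninstall` command, because each environment keeps its own set of installed distributions.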

7.6 - Known Issues

Issues and workaround information.

Issue: SSL errors in the Data Discovery container

Description: The tldextract library tries to download the following Public Suffix List files:

When these lists cannot be downloaded, the default files included in the package are used, and no issue is observed in the classification.