Setting up AI Developer Edition

The steps to set up the product.

Complete the prerequisites, optionally register for access to the AI Developer Edition API Service, and then set up, verify, and run the required files for using Protegrity AI Developer Edition.

1 - Prerequisites

The prerequisites for setting up AI Developer Edition.

General requirements

The system requirements for AI Developer Edition are provided here.

Hardware requirements

For the local Docker deployment mode, a machine with the following specifications is sufficient to experiment with the main features:

  • RAM: 16 GB
  • CPU: 8 cores
  • GPU: 4 GB VRAM, for Synthetic Data only
  • Hard Disk: 50 GB available

Software requirements

  • Python 3.12.11 or later is installed. For more information about installing Python, refer to the Python website. Ensure that the python command points to a supported Python 3 version, for example, Python 3.12.11. Verify using the python --version command.
  • pip for installing packages.
  • Python Virtual Environment.
  • Docker CLI is installed to manage Docker containers. On macOS, Docker Desktop or Colima is installed.
  • Docker Compose is installed for local containerized deployments. This application supports Docker Compose V2.30 and later. Ensure that your installation meets this version.
  • Git is installed for cloning the repository.
  • Java 11 or later.
  • Maven 3.6+ for AP Java.
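
The software requirements above can be checked with a short script before continuing. The following is a minimal sketch, not part of the product. It assumes that the tools are available on the PATH under the executable names docker, git, java, mvn, and pip, which is typical but can differ on your system. It does not check the Docker Compose version, because Compose V2 runs as a docker subcommand.

```python
import shutil
import sys

# Minimum Python version stated in the software requirements.
MINIMUM_PYTHON = (3, 12, 11)

# Assumed executable names; adjust them if your system uses different names.
REQUIRED_TOOLS = ["docker", "git", "java", "mvn", "pip"]


def meets_minimum(version, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor, micro) version is at least the minimum."""
    return tuple(version[:3]) >= tuple(minimum)


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if meets_minimum(sys.version_info):
        print("Python version is supported")
    else:
        print("Python version is older than 3.12.11")
    for tool in missing_tools():
        print(f"Not found on PATH: {tool}")
```

The which parameter is only there so that the lookup can be replaced in a test. In normal use, call missing_tools() without arguments.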

Additional settings for macOS

macOS requires additional steps for Docker and for systems with Apple Silicon chips. Complete the following steps before using AI Developer Edition.

  1. Complete one of the following options to apply the settings.

    • For Colima:
      1. Open a command prompt.
      2. Run the following command.
        colima start --vm-type vz --vz-rosetta --memory 8
        
    • For Docker Desktop:
      1. Open Docker Desktop.
      2. Go to Settings > General.
      3. Enable the following check boxes:
        • Use Virtualization framework
        • Use Rosetta for x86_64/amd64 emulation on Apple Silicon
      4. Click Apply & restart.
  2. Complete one of the following options to resolve certificate-related errors.

    • For Colima:
      1. Open a command prompt.

      2. Navigate to and open the following file.

        ~/.colima/default/colima.yaml
        
      3. Update the following configuration in colima.yaml to add the path for obtaining the required images.

        Before update:

        docker: {}
        

        After update:

        docker:
            insecure-registries:
                - ghcr.io
        
      4. Save and close the file.

      5. Stop Colima.

        colima stop

      6. Close and reopen the command prompt.

      7. Start Colima.

        colima start --vm-type vz --vz-rosetta --memory 8
        
    • For Docker Desktop:
      1. Open Docker Desktop.

      2. Click the gear or settings icon.

      3. Click Docker Engine from the sidebar. The editor opens the current Docker daemon configuration daemon.json.

      4. Add the insecure-registries key to the root JSON object. Ensure that a comma is added after the last value in the existing configuration.

        After update:

        {
            .
            .
            <existing configuration>,
            "insecure-registries": [
                "ghcr.io",
                "githubusercontent.com"
            ]
        }
        
      5. Click Apply & Restart to save the changes and restart Docker Desktop.

      6. Verify: After Docker restarts, run docker info in your terminal and confirm that the required registry is listed under Insecure Registries.

  3. Optional: Complete the following steps if the error "The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested" is displayed.

    1. Open a command prompt.

    2. Navigate to and open the following file.

      ~/.docker/config.json

    3. Add the following parameter.

      "default-platform": "linux/amd64"

    4. Save and close the file.

    5. If the repository is already cloned, run docker compose up -d from the protegrity-developer-edition directory. Otherwise, continue with the setup.
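
When adding the insecure-registries key by hand, it is easy to miss the comma after the existing configuration and produce invalid JSON. The following is a minimal sketch of the same merge performed on a parsed configuration. It only illustrates the resulting structure and does not replace the Docker Desktop editor. The function name add_insecure_registries and the sample "experimental" key are hypothetical and are used only for illustration.

```python
import json


def add_insecure_registries(config, registries):
    """Return a copy of config with the registries merged into insecure-registries."""
    merged = dict(config)
    existing = list(merged.get("insecure-registries", []))
    for registry in registries:
        # Avoid duplicate entries if the registry is already listed.
        if registry not in existing:
            existing.append(registry)
    merged["insecure-registries"] = existing
    return merged


if __name__ == "__main__":
    # Hypothetical stand-in for the existing daemon configuration.
    current = {"experimental": False}
    updated = add_insecure_registries(current, ["ghcr.io", "githubusercontent.com"])
    # json.dumps always emits valid JSON, including the separating commas.
    print(json.dumps(updated, indent=4))
```

Paste the printed JSON into the Docker Engine editor only after confirming that the existing keys match your own configuration.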

2 - Optional - Obtaining access to the AI Developer Edition API Service

Creating a user account and completing the registration.

Registration is required only for running the APIs to protect, unprotect, and reprotect data. The find and redact functionality, which uses Data Discovery, and the Semantic Guardrail and Synthetic Data features can be used without registration. Skip this section if the find and protect functionality, which uses tokenization and encryption, is not required.

Registering for access

Sign up for access to the AI Developer Edition API Service. This is required for obtaining access to use the APIs.

  1. Open a web browser.
  2. Navigate to https://www.protegrity.com/developers/dev-edition-api.
  3. Specify the following details:
    • First Name
    • Last Name
    • Work Email
    • Job Title
    • Company Name
    • Country
  4. Click the Terms & Conditions link and read the terms and conditions.
  5. Select the check box to accept the terms and conditions.
  6. Click Get Started.

The request is reviewed. After the request is approved, a password and an API key for accessing the AI Developer Edition API Service are sent to the specified work email address. If the account already exists, the details are re-sent to the email address. The email typically arrives within a minute or two. If it does not appear in the inbox, check the spam or junk folder before retrying.

Use the online Protegrity notebook with the credentials to test tokenization.

Specifying the authentication information

Add the login information provided by Protegrity to the environment to access the AI Developer Edition API Service.

Note: It is recommended to add the details to the environment variables to avoid specifying the information every time the environment is initialized.

  1. Open a command prompt.
  2. Initialize a Python virtual environment.
  3. Add the email address of the user.

    Linux and macOS:

    export DEV_EDITION_EMAIL='<Email_used_for_registration>'

    Windows PowerShell:

    $env:DEV_EDITION_EMAIL = '<Email_used_for_registration>'

  4. Specify the password provided in the registration email.

    Linux and macOS:

    export DEV_EDITION_PASSWORD='<Password_provided_in_email>'

    Windows PowerShell:

    $env:DEV_EDITION_PASSWORD = '<Password_provided_in_email>'

  5. Specify the API key for accessing the AI Developer Edition API Service.

    Linux and macOS:

    export DEV_EDITION_API_KEY='<API_key_provided_in_email>'

    Windows PowerShell:

    $env:DEV_EDITION_API_KEY = '<API_key_provided_in_email>'

  6. Verify that the variables are set.

    Linux and macOS:

    test -n "$DEV_EDITION_EMAIL" && echo "EMAIL $DEV_EDITION_EMAIL set" || echo "EMAIL missing"
    test -n "$DEV_EDITION_PASSWORD" && echo "PASSWORD $DEV_EDITION_PASSWORD set" || echo "PASSWORD missing"
    test -n "$DEV_EDITION_API_KEY" && echo "API KEY $DEV_EDITION_API_KEY set" || echo "API KEY missing"

    Windows PowerShell:

    if ($env:DEV_EDITION_EMAIL) { Write-Output "EMAIL $env:DEV_EDITION_EMAIL set" } else { Write-Output "EMAIL missing" }
    if ($env:DEV_EDITION_PASSWORD) { Write-Output "PASSWORD $env:DEV_EDITION_PASSWORD set" } else { Write-Output "PASSWORD missing" }
    if ($env:DEV_EDITION_API_KEY) { Write-Output "API KEY $env:DEV_EDITION_API_KEY set" } else { Write-Output "API KEY missing" }
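
The same verification can be done from Python, which is useful inside a notebook or script that relies on these variables. The following is a minimal sketch that reports which of the three variables are missing without printing their values. The function name missing_credentials is hypothetical.

```python
import os

# Variable names used in the steps above.
REQUIRED_VARIABLES = (
    "DEV_EDITION_EMAIL",
    "DEV_EDITION_PASSWORD",
    "DEV_EDITION_API_KEY",
)


def missing_credentials(env=None):
    """Return the names of the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing variables: " + ", ".join(missing))
    else:
        print("All AI Developer Edition credentials are set")
```

Reporting only the variable names keeps the password and API key out of the terminal history and logs.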

AI Developer Edition API Service usage guidelines

To ensure fair use of the API service, a rate limit is enforced on API requests to the AI Developer Edition API Service.

These limits are:

  • Request rate: 50 per second
  • Burst: up to 100
  • Quota: 10,000 requests per user per day
  • Maximum payload size: 1 MB
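
A client that sends many requests can throttle itself to stay within these limits. The following is a minimal sketch of a client-side token bucket that uses the request rate and burst values listed above. It assumes that the service applies its limits in a token-bucket style, which this document does not confirm, and the class name TokenBucket is hypothetical. The daily quota and payload size are not handled here.

```python
import time
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Client-side limiter: refills at `rate` tokens per second up to `capacity`."""

    rate: float = 50.0       # request rate per second, from the limits above
    capacity: float = 100.0  # burst size, from the limits above
    tokens: float = 100.0    # start with a full bucket
    last: float = 0.0        # timestamp of the previous call, in seconds

    def allow(self, now):
        """Return True if a request may be sent at time `now`."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(last=time.monotonic())
    if bucket.allow(time.monotonic()):
        print("Request can be sent")
    else:
        print("Wait before sending the next request")
```

Passing the timestamp as an argument keeps the limiter deterministic and easy to test. In a real client, call allow(time.monotonic()) before each request and wait briefly when it returns False.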

3 - Setting up the packages

Steps for obtaining and setting up the packages.

Obtaining the package

For a new installation:

  1. Navigate to the Protegrity AI Developer Edition repository.

  2. Clone or download the repository to your local system.

    git clone https://github.com/Protegrity-Developer-Edition/protegrity-developer-edition.git
    

    To customize the Python modules, clone and use the source from the protegrity-developer-python repository.

    git clone https://github.com/Protegrity-Developer-Edition/protegrity-developer-python.git
    

    To customize the Java libraries, clone and use the source from the protegrity-developer-java repository.

    git clone https://github.com/Protegrity-Developer-Edition/protegrity-developer-java.git
    
  3. Verify the files in the package. The list of files in the git package can be obtained from the files list.

To update an existing installation:

  1. Back up the Protegrity AI Developer Edition repository if you have modified the Python or configuration files.

    Note: The supported entities are updated. For more information about the entities, refer to Supported Entities.

  2. Navigate to the cloned repository location for protegrity-developer-edition.

  3. Run the following command to stop the containers.

    docker compose down
    

    Based on your configuration, you might need to use the docker-compose down command instead.

  4. Update the repositories on the local system using the git pull command.

  5. Verify the files in the package. The list of files in the git package can be obtained from the files list.

Setting up Data Discovery, Semantic Guardrail, and Synthetic Data

The containers include the Data Discovery and Semantic Guardrails components required for identifying sensitive data. They also include the Synthetic Data component for data generation.

For a new installation:

  1. Open a command prompt.

  2. Navigate to the cloned repository location for protegrity-developer-edition.

  3. Run the following command to download and start the containers. The dependent container images are large. Depending on the network connection, the containers might take some time to download and deploy.

    To start all the features.

    docker compose --profile synthetic up -d
    

    To start only the Data Discovery and Semantic Guardrails features.

    docker compose up -d
    

    Based on your configuration, you might need to use the docker-compose up -d command instead. Before switching between starting all the containers and starting only the Data Discovery and Semantic Guardrails containers, bring down the running containers using docker compose --profile synthetic down or docker compose down.

  4. Verify that the containers started successfully.

    docker compose logs
    
  5. From the cloned repository location for protegrity-developer-edition, install the requirements for working with the provided Jupyter notebooks.

    pip install -r samples/python/requirements.txt
    
To update an existing installation:

  1. Open a command prompt.

  2. Navigate to the cloned repository location for protegrity-developer-edition.

  3. If the step to stop the containers was missed earlier, run the following command to remove the AI Developer Edition containers.

    docker compose down --remove-orphans
    
  4. Delete the Docker network resources.

    docker network rm -f <network_name_or_id>
    

    For example,

    docker network rm -f protegrity-network
    
  5. Run the following command to download and start the containers. The dependent container images are large. Depending on the network connection, the containers might take some time to download and deploy.

    To start all the features.

    docker compose --profile synthetic up -d
    

    To start only the Data Discovery and Semantic Guardrails features.

    docker compose up -d
    

    Based on your configuration, you might need to use the docker-compose up -d command instead. Before switching between starting all the containers and starting only the Data Discovery and Semantic Guardrails containers, bring down the running containers using docker compose --profile synthetic down or docker compose down.

  6. Verify that the containers started successfully.

    docker compose logs
    
  7. From the cloned repository location for protegrity-developer-edition, install the requirements for working with the provided Jupyter notebooks.

    pip install -r samples/python/requirements.txt
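
When switching between all the features and only the Data Discovery and Semantic Guardrails features, the down command must use the same profile that was used to start the containers. The following is a minimal sketch that builds the matching command pairs from the commands shown in the steps above. The function names compose_command and switch_commands are hypothetical, and the sketch only prints the commands rather than running them.

```python
def compose_command(action, synthetic=False):
    """Build the docker compose argument list for 'up' or 'down'."""
    if action not in ("up", "down"):
        raise ValueError("action must be 'up' or 'down'")
    command = ["docker", "compose"]
    if synthetic:
        # The synthetic profile starts or stops all the features.
        command += ["--profile", "synthetic"]
    command.append(action)
    if action == "up":
        command.append("-d")
    return command


def switch_commands(currently_synthetic, want_synthetic):
    """Return the down command for the current mode, then the up command for the new mode."""
    return [
        compose_command("down", currently_synthetic),
        compose_command("up", want_synthetic),
    ]


if __name__ == "__main__":
    # Example: move from all the features to Data Discovery and Semantic Guardrails only.
    for command in switch_commands(currently_synthetic=True, want_synthetic=False):
        print(" ".join(command))
```

To run the commands instead of printing them, each list can be passed to subprocess.run from the cloned protegrity-developer-edition directory.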
    

Installing the protegrity-developer-python module

The module has built-in functions to find, redact, mask, and protect data.

For a new installation:

  1. Open a command prompt.

  2. Install the protegrity-developer-python module. It is recommended to create and activate a Python virtual environment before running this command.

    pip install protegrity-developer-python
    

    The installation completes and the success message is displayed. To compile and install the Python module from source, refer to Building the Python module.

To upgrade an existing installation:

  1. Open a command prompt.

  2. Upgrade the protegrity-developer-python module. It is recommended to create and activate a Python virtual environment before running this command.

    pip install --upgrade protegrity-developer-python
    

    The package is successfully upgraded.
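
To confirm which version is installed after an install or upgrade, the package metadata can be queried from Python. The following is a minimal sketch that uses the standard importlib.metadata module with the distribution name from the pip commands above. The function name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="protegrity-developer-python"):
    """Return the installed version string, or None if the distribution is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("protegrity-developer-python is not installed in this environment")
    else:
        print(f"protegrity-developer-python {found} is installed")
```

Run the script inside the same virtual environment that was used for the pip command, because the result reflects only the active environment.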

Installing the protegrity-developer-java library

When you run the Java samples for the first time, Maven automatically pulls the protegrity-developer-java library from Maven Central as a dependency. This ensures that all required classes and resources are available without manual download.

4 - Verifying the files in the Protegrity AI Developer Edition package

The list of files available in the AI Developer Edition repositories.

protegrity-developer-edition repository

This repository contains the files for obtaining and running the sample application.

  • CHANGELOG.md: The version updates and changes are tracked here.
  • CONTRIBUTIONS.md: The guidelines for contributing to the project.
  • LICENSE: The license file with the terms and conditions for using the application.
  • README.md: The readme file specifying the steps to set up the product.
  • docker-compose.yml: This file contains the configuration for deploying the containers.
  • data-discovery: The directory with the sample application and scripts for Data Discovery.
  • semantic-guardrail: The directory with the sample scripts for Semantic Guardrail.
  • samples: The directory with the sample application and scripts for testing the features.
    • java: The directory with the sample scripts for testing the features using Java.
    • python: The directory with the sample scripts for testing the features using Python.
      • sample-app-semantic-guardrails: The directory with the sample notebook for testing the Semantic Guardrails feature.
      • sample-app-synthetic-data: The directory with the sample notebook and datastore for testing the Synthetic Data feature.
    • sample-data: The directory with the input file for the scripts.
    • config.json: The configuration file for working with the scripts.
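
The verification of the cloned repository can be partly automated by checking that the top-level entries listed above exist. The following is a minimal sketch that checks only the top-level names from this list. The function name missing_entries is hypothetical, and the repository contents can change between releases.

```python
from pathlib import Path

# Top-level entries of the protegrity-developer-edition repository, from the list above.
EXPECTED_ENTRIES = [
    "CHANGELOG.md",
    "CONTRIBUTIONS.md",
    "LICENSE",
    "README.md",
    "docker-compose.yml",
    "data-discovery",
    "semantic-guardrail",
    "samples",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that do not exist under root."""
    root = Path(root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Run from the directory that contains the cloned repository.
    missing = missing_entries("protegrity-developer-edition")
    if missing:
        print("Missing entries: " + ", ".join(missing))
    else:
        print("All expected top-level entries are present")
```

The same function can be reused for the other repositories by passing a different root and list of expected names.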

protegrity-developer-python repository

This repository contains the source files for customizing and building the Python module.

  • LICENSE: The license file with the terms and conditions for using the application.
  • README.md: The readme file for working with the Python module.
  • pyproject.toml: The build and packaging configuration file for the Python module.
  • pytest.ini: The configuration file for the Pytest framework.
  • requirements.txt: The list of Python package dependencies.
  • setup.cfg: The additional settings for packaging and tools.
  • src: The core implementation of the Python module.
  • tests: The unit and integration tests to ensure that the Python module works as expected.
  • .gitignore: The files and directories to ignore in version control.
  • .pylintrc: The linting rules for code quality are defined here.
  • CHANGELOG.md: The version updates and changes are tracked here.
  • CONTRIBUTIONS.md: The guidelines for contributing to the project.

protegrity-developer-java repository

This repository contains the source files for customizing and building the Java library.

  • LICENSE: The license file with the terms and conditions for using the application.
  • README.md: The readme file for working with the Java library.
  • pom.xml: The Maven Project Object Model file for building the Java project.
  • .gitignore: The files and directories to ignore in version control.
  • run-integration-tests.sh: The shell script to execute integration tests easily.
  • mvnw and mvnw.cmd: The Maven Wrapper scripts for Linux, Mac, and Windows.
  • protegrity-developer-edition: The additional modules or extensions for the Developer Edition.
  • integration-tests: The integration tests for validating the Java library functionality.
  • application-protector-java: The Java library implementation for the Application Protector service. It includes source code and configuration files.
  • .mvn/wrapper: The Maven Wrapper configuration files.