Introduction to Protegrity AI Developer Edition

An overview of the product.


Protegrity AI Developer Edition is a lightweight, containerized sandbox. It helps developers, data scientists, and architects quickly explore, prototype, and integrate data protection and discovery workflows without setting up complex infrastructure or managing its operational overhead.

It is a self-contained, Docker-based environment designed for hands-on experimentation without enterprise infrastructure. With a modular architecture, built-in sample data, and a developer-first experience, AI Developer Edition is ideal for evaluating Protegrity’s capabilities in a fast, flexible, and frictionless way.

What is Protegrity AI Developer Edition?

Protegrity AI Developer Edition is designed to help a developer move quickly from idea to implementation, using familiar tools, sample apps, and open APIs.

It provides a streamlined environment to:

Discover and redact sensitive data using APIs and sample apps.
Discover and protect or unprotect sensitive data using APIs and sample apps.
Experience tokenization using the Protegrity Data Protection Jupyter notebook.
Perform message-level and conversation-level risk scoring.
Scan GenAI flows for Personally Identifiable Information (PII).
Test real-world use cases with sample datasets and guided walkthroughs.
Generate synthetic data.

AI Developer Edition runs entirely on Docker, making it easy to spin up, tear down, and iterate quickly. It helps users build a proof of concept, validate integration points, and get familiar with Protegrity’s core concepts, and it provides the tools to set up the product quickly and independently.

Note: This product is not meant for production use, but it is the perfect launchpad for innovation.

Key Features and Benefits for AI Developers

AI Developer Edition is purpose-built for fast, frictionless exploration of Protegrity’s core capabilities.

The following features make it ideal for prototyping and integration:

Platform Capabilities

AI Developer Edition provides a comprehensive set of platform capabilities that simplify how developers integrate data protection into their workflows. From containerized deployment to cross-language SDK support, each component is designed for rapid setup, minimal configuration, and seamless iteration.

Modular, Containerized Architecture: AI Developer Edition runs on Docker, making it easy to test, isolate, and iterate.
Lightweight: No orchestration overhead. Just deploy the container and use the sample application.
Python Module: An open-source Python module providing APIs to protect, unprotect, and reprotect sensitive data in Python-based applications. It is available through PyPI for easy installation.
Java Library: An open-source Java library providing APIs to protect, unprotect, and reprotect sensitive data in Java-based applications. It is distributed through Maven Central for easy integration.
AI Developer Edition API Service: A service hosted by Protegrity that allows developers to interact with Protegrity’s protection and discovery services through intuitive endpoints. It supports protection and unprotection of sensitive data, enabling rapid prototyping and testing of data protection scenarios without full-scale infrastructure. The service requires registration; credentials are free of charge.
Sample Apps and Data: Jumpstart evaluation with ready-to-run sample apps that demonstrate real-world use cases. These use cases include finding sensitive data in unstructured text, finding and redacting it, finding and protecting or unprotecting it, multi-turn conversations, and agent coordination patterns. Adjust sample behavior through shared/config.json.
Cross-platform: Works on Linux, macOS, and Windows.
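The Python module and Java library expose protect, unprotect, and reprotect operations. The following toy sketch illustrates only the round-trip semantics of those three operations using a hypothetical in-memory vault. ToyTokenVault, its method names, and the data element names ("email", "email_v2") are illustrative assumptions. They are not the Python module's API, and they are not how Protegrity tokenization works internally.

```python
import secrets


class ToyTokenVault:
    """Hypothetical in-memory vault illustrating protect/unprotect/reprotect.

    Teaching sketch only: not Protegrity's API and not Protegrity's
    tokenization scheme.
    """

    def __init__(self) -> None:
        # Mappings are keyed by a hypothetical data element name, so the same
        # clear value gets a different token under a different element.
        self._forward: dict = {}  # (element, value) -> token
        self._reverse: dict = {}  # (element, token) -> value

    def protect(self, value: str, element: str) -> str:
        """Return a token for value, reusing the existing token if present."""
        key = (element, value)
        if key not in self._forward:
            token = secrets.token_hex(8)
            # Regenerate in the (very unlikely) case of a token collision.
            while (element, token) in self._reverse:
                token = secrets.token_hex(8)
            self._forward[key] = token
            self._reverse[(element, token)] = value
        return self._forward[key]

    def unprotect(self, token: str, element: str) -> str:
        """Return the clear value for a token; raises KeyError if unknown."""
        return self._reverse[(element, token)]

    def reprotect(self, token: str, old_element: str, new_element: str) -> str:
        """Re-tokenize a value under a different element in a single call."""
        return self.protect(self.unprotect(token, old_element), new_element)


if __name__ == "__main__":
    vault = ToyTokenVault()
    token = vault.protect("alice@example.com", "email")
    print(token)  # 16 random hex characters
    print(vault.unprotect(token, "email"))  # alice@example.com
```

Reusing the same token for the same value mirrors a property that tokenized data often needs: joins and lookups still work on the protected column.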

Core Data Protection Services

Protegrity AI Developer Edition offers features that help build AI services. These features range from identifying and protecting sensitive information to generating safe synthetic alternatives.

Data Discovery: Identifies and classifies sensitive data using built-in and custom classifiers with confidence scoring. Discovers and redacts sensitive data in datasets for use in training GenAI models or sharing with third parties.
Semantic Guardrails: A security guardrail engine for AI systems. Evaluates risks in GenAI systems, such as chatbots, workflows, and agents, through advanced semantic analytics and intent classification to detect potentially malicious messages. Provides message-level and conversation-level risk scoring and PII scanning to prevent context poisoning and enforce governance in multi-agent systems.
Synthetic Data: Analyzes a dataset and generates data that mimics the properties of real data, such as data types, ranges, correlations, and distributions, without containing any actual personal information. Enables safe model training and end-to-end agent workflow testing.
Anonymization: Replaces sensitive data with anonymized values to protect privacy while maintaining the utility of the data for analysis and model training.
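The following minimal sketch shows what "replacing sensitive data while keeping its analytic utility" can mean. It uses one generic anonymization technique, generalization with suppression. The direct identifier is dropped, the quasi-identifiers are coarsened, and the analytic attribute is kept. The record fields and band widths are hypothetical, and this is not the interface or the method of the Anonymization feature.

```python
def generalize_age(age: int, width: int = 10) -> str:
    """Map an exact age to a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the leading digits of a postal code and mask the rest."""
    return zip_code[:keep] + "*" * max(len(zip_code) - keep, 0)


def anonymize_record(record: dict) -> dict:
    """Anonymize one hypothetical patient record.

    - name (direct identifier) is suppressed entirely
    - age and zip (quasi-identifiers) are generalized
    - diagnosis (analytic attribute) is kept for downstream analysis
    """
    return {
        "age": generalize_age(record["age"]),
        "zip": generalize_zip(record["zip"]),
        "diagnosis": record["diagnosis"],
    }


if __name__ == "__main__":
    row = {"name": "Alice Smith", "age": 37, "zip": "94105", "diagnosis": "asthma"}
    print(anonymize_record(row))
    # {'age': '30-39', 'zip': '941**', 'diagnosis': 'asthma'}
```

Generalization alone does not guarantee anonymity. Checks such as k-anonymity over the combined quasi-identifiers are typically applied on top of it.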

Secure Data and AI Pipelines

AI Developer Edition enables end-to-end privacy across the AI lifecycle from data ingestion and model training to inference and output delivery. This ensures that sensitive information is protected at every stage of the pipeline.

Privacy in conversational AI: Sensitive chatbot inputs are protected before they reach generative AI models.
Prompt sanitization for LLMs: Automated PII masking reduces risk during large language model prompt engineering and inference.
Experimentation with Jupyter notebooks: Data scientists can prototype directly in Jupyter notebooks for agile experimentation.
Output redaction and leakage prevention: Detect and protect sensitive data in model outputs before returning them to end users.
Privacy-enhanced AI training: Sensitive fields in training datasets are de-identified to support compliant and secure AI development.
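The sketch below is a rough illustration of where prompt sanitization sits in a pipeline. It masks e-mail addresses and phone numbers before a prompt would be sent to an LLM. It deliberately uses two simple regular expressions as a stand-in for PII discovery. The patterns and placeholder labels are assumptions, and Data Discovery itself uses classifiers with confidence scoring rather than fixed patterns.

```python
import re

# Illustrative stand-in patterns only (not Protegrity's detection logic).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def sanitize_prompt(prompt: str) -> str:
    """Replace each pattern match with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt


if __name__ == "__main__":
    print(sanitize_prompt("Email bob@example.com or call 555-123-4567."))
    # Email [EMAIL] or call [PHONE].
```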

Note: This product is continuously improving. The features and capabilities mentioned here are either already available or will be available shortly.

Protegrity AI Developer Edition Personas

AI Developer Edition targets developers who build AI-powered systems in regulated industries, such as financial services, healthcare, and the public sector, and who need to protect sensitive data across AI workflows. The primary persona is the Agentic AI Developer (Agent Builder).

Primary Persona: Agentic AI Developer (Agent Builder)

Agent builders create systems that go beyond chat/RAG: these systems plan, call tools, take actions, and coordinate with other agents. As agentic AI expands the use of unstructured data and introduces new pipelines, data protection becomes significantly more complex.

Attribute	Details
Role	Builds autonomous agent systems that plan, invoke tools, and coordinate across multi-agent architectures.
Pain Points	Sensitive data exposure in prompts/RAG/telemetry and across agentic workflows. Agents act with broader privileges than end users. Data crosses trust boundaries in multi-agent interactions.
Goals	Ship production-safe agents faster by embedding real-time PII protection directly into prompts, memory, and tool interactions without building custom privacy infrastructure.
Key Activities	Agent development, prompt/payload handling, retrieval pipelines, response rendering, telemetry/logging, tool calling via MCP, multi-agent orchestration via A2A.
Fit with AI Dev Edition	Strong Fit - mask/tokenize PII in prompts and data flows, semantic guardrails to prevent context poisoning, inline privacy for agent runtime.

Value Proposition: Protegrity AI Developer Edition is the fastest way for agent builders to make LLM-powered systems safe for real data. This is achieved by embedding masking, tokenization, and semantic guardrails directly into agent workflows.

Without Protegrity, an agent builder must build PII detection models or regular expressions, masking and tokenization logic, an audit and compliance layer, and governance rules. AI Developer Edition provides out-of-the-box APIs, a developer sandbox, and pre-built PII entity detection. This accelerates the path from development to production and reduces attack surface, compliance risk, and security approval cycles.

Supporting Personas

The following supporting personas were also considered during the development of AI Developer Edition.

Persona	Role Description	Pain Points	Fit with AI Developer Edition
Model Developer	Builds, trains, fine-tunes, and deploys AI models. Builds APIs and pipelines connecting LLMs to systems.	Training/data pipelines need tokenization/anonymization; sensitive data leakage in training data.	Strong Fit - Tokenization for training data, anonymization pipelines, synthetic data generation.
ML Engineer	Preps datasets for training/fine-tuning, manages feature stores and pipelines. Focuses on risk assessment, optimization, and data-driven decision-making.	PII minimization, consistent privacy across pipelines, governance, access controls, lineage.	Strong Fit - Tokenization for training data, consistent privacy across pipelines.
Prompt Engineer	Designs, tests, and optimizes prompts for generative AI models. Crafts precise instructions and evaluates outputs.	Context poisoning, sensitive information leakage to models and logs.	Medium Fit - Semantic guardrails for context poisoning prevention, data protection for leakage.
AI Application Developer	Integrates copilots with apps to automate processes. Embeds AI into enterprise services.	Connectors and admin-governed packaging for security needs.	Medium Fit - Protection APIs for copilot integrations.
Security Developer / Analyst	Part of security and risk teams focused on building security tools, defining policies, and implementing trust/risk/security management.	Information governance, runtime enforcement, audits, compliance.	Strong Fit - Discover and protect PII, policy simulation, audit capabilities.

1 - Release Highlights

What’s New in AI Developer Edition 1.2.0

General

The following updates are included in AI Developer Edition 1.2.0.

Added badges to the README for improved visibility and quick access to key resources.
Restructured the directory layout to better organize samples and source code. The files for each feature are now in that feature’s directory.

Data Discovery

Data Discovery provides the functionality to explore and analyze datasets for sensitive information. For more information about Data Discovery, refer to the feature section.

Upgraded to Data Discovery version 2.0.
Added direct-API example scripts, in both Python and bash variants, for text classification, tabular (CSV) classification, and redaction.
Introduced an isolated data-discovery/docker-compose.yml for starting only the Data Discovery service.
Updated API endpoints to the v2 paths: v2/classify/text, v2/classify/csv, and v2/transform/label.
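The v2 paths can be called from any HTTP client. The following standard-library sketch only builds the POST requests for the three endpoints and does not send them. The base URL, port, and request content types are assumptions made for illustration. Confirm them against the port published by data-discovery/docker-compose.yml and the service's API documentation.

```python
import urllib.request

# Assumption: hypothetical local address of the Data Discovery container.
BASE_URL = "http://localhost:8580"

# The three v2 paths from the release notes, paired with assumed content types.
ENDPOINTS = {
    "classify_text": ("v2/classify/text", "text/plain"),
    "classify_csv": ("v2/classify/csv", "text/csv"),
    "transform_label": ("v2/transform/label", "text/plain"),
}


def build_request(operation: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one v2 operation."""
    path, content_type = ENDPOINTS[operation]
    return urllib.request.Request(
        url=f"{BASE_URL}/{path}",
        data=body.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("classify_text", "Contact Alice at alice@example.com.")
    print(req.get_method(), req.full_url)
    # POST http://localhost:8580/v2/classify/text
```

With the container running, urllib.request.urlopen(req) would send the request and return the service's response for inspection.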

Semantic Guardrails

Semantic Guardrails provides functionality to enforce policies and guidelines within AI interactions. For more information about Semantic Guardrails, refer to the feature section.

Expanded domain model coverage to three verticals: customer service, financial, and healthcare.
Extended the pii processor to user messages, in addition to AI messages, enabling PII detection on both sides of a conversation.
Improved privacy in API responses: PII explanation output now returns character spans instead of the actual detected PII values.
Changed the OpenAPI documentation endpoint from /swagger to /doc.
Added a Jupyter notebook sample for seamless evaluation and execution.
Included richer examples in the sample files for easier understanding.
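PII explanations now return character spans instead of the detected values. A client that needs a redacted copy of a message can therefore apply the spans itself. This sketch assumes each span is a (start, end) pair of character offsets with an exclusive end. That shape is hypothetical, so check the actual response schema at the /doc endpoint. Overlapping spans are merged first so that replacements do not corrupt each other.

```python
def merge_spans(spans):
    """Sort (start, end) spans and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact_spans(text: str, spans, placeholder: str = "[REDACTED]") -> str:
    """Replace each character span in text with a placeholder."""
    # Replace from the end of the string so earlier offsets stay valid.
    for start, end in reversed(merge_spans(spans)):
        text = text[:start] + placeholder + text[end:]
    return text


if __name__ == "__main__":
    message = "My name is Alice Smith and my SSN is 123-45-6789."
    spans = [(11, 22), (37, 48)]
    print(redact_spans(message, spans))
    # My name is [REDACTED] and my SSN is [REDACTED].
```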

Synthetic Data

Synthetic Data provides functionality to generate artificial datasets that mimic real data while preserving privacy. For more information about Synthetic Data, refer to the feature section.

Expanded generative model support. In addition to Generative Adversarial Networks (GANs), Tabular Variational Autoencoders (TVAE) and diffusion-based models are now explicitly supported.
Added the typeHint parameter to the generate request payload, allowing explicit selection of the model type. For example, "model_type": "tabdiff" selects diffusion-based generation.
When typeHint is not specified, the system automatically determines the most appropriate model during training.
Refactored the synthetic data generation code for improved performance and maintainability.
Updated the Jupyter notebook samples for quick evaluation and execution.
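The typeHint behavior can be sketched as a payload builder that adds a model hint only when the caller requests one, leaving model selection to the service otherwise. The release notes name the parameter typeHint and give "model_type": "tabdiff" as an example, so this sketch nests the hint that way. That nesting and the num_rows field are assumptions. Verify the exact request schema in the Synthetic Data API documentation before relying on it.

```python
import json
from typing import Optional


def build_generate_payload(num_rows: int, model_type: Optional[str] = None) -> dict:
    """Build a hypothetical body for a synthetic data generate request.

    Assumptions: the "num_rows" field name and the nesting
    {"typeHint": {"model_type": ...}}. When model_type is None the hint is
    omitted, so the service chooses the model during training.
    """
    payload: dict = {"num_rows": num_rows}
    if model_type is not None:
        payload["typeHint"] = {"model_type": model_type}
    return payload


if __name__ == "__main__":
    print(json.dumps(build_generate_payload(1000)))
    # {"num_rows": 1000}
    print(json.dumps(build_generate_payload(1000, "tabdiff")))
    # {"num_rows": 1000, "typeHint": {"model_type": "tabdiff"}}
```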

Anonymization

Anonymization provides functionality to remove or obscure personally identifiable information (PII) from datasets. For more information about Anonymization, refer to the feature section.

Added a new feature to AI Developer Edition: Anonymization.
Included sample code and documentation for using the Anonymization feature effectively.

Data Protection API Wrappers

Data Protection API Wrappers provide functionality to interact with the Data Protection APIs for tokenization, masking, and other privacy-preserving operations. For more information about Data Protection API Wrappers, refer to the feature section.

Added the Protegrity Data Protection Jupyter notebook for quickly testing tokenization.
Provided support and sample implementations for both Python and Java.
Ensured Java samples are fully compatible across Linux, macOS, and Windows.
Delivered Java source code for customization and compilation flexibility.