AI Developer Edition Architecture
A high-level architecture of AI Developer Edition is provided in the following image.

This release of AI Developer Edition includes sample applications that showcase the capabilities of Data Discovery, Semantic Guardrails, and Synthetic Data. The applications demonstrate protection and unprotection using simple Python modules or Java libraries. The Data Discovery component identifies sensitive data. After identification, the Python module or Java library redacts, masks, or protects the sensitive information. Protection is performed through the AI Developer Edition API Service.
Data Discovery
Data Discovery is a powerful, developer-friendly product designed to address the challenge of identifying sensitive data.
For more information, refer to the Data Discovery documentation.
Overview
The Data Discovery Text Classification service advances data discovery and classification. It specializes in detecting Personally Identifiable Information (PII), Protected Health Information (PHI), and Payment Card Information (PCI) within plain text and free-text inputs. Unlike traditional structured data tools, it excels in dynamic, unstructured environments such as chatbot conversations, call transcripts, and Generative AI (GenAI) outputs.
Architecture
For more information about the general architecture and working of Data Discovery, refer to General architecture of Data Discovery.
Semantic Guardrails
Protegrity’s GenAI Security Semantic Guardrails solution is a security guardrail engine for AI systems. It evaluates risks in GenAI chatbots, workflows, and agents through advanced semantic analytics and intent classification to detect potentially malicious messages. PII detection can also be leveraged for comprehensive security coverage.
For more information, refer to the Semantic Guardrails documentation.
Overview
The current implementation is trained on synthetic customer-service AI chatbot datasets. The system performs best when the conversations it analyzes match the training domain, that is, English-language customer service interactions involving orders, tickets, and purchases.
For domain-specific and user-specific applications that require high detection accuracy, fine-tuning is necessary to fully leverage the model’s capabilities. Fine-tuning helps the model learn the expected conversation patterns and message structures in both the inputs and outputs of protected GenAI systems.
The system operates by analyzing conversations between participants. These participants are users and AI systems, such as LLMs, agents, or contextual information sources. If Protegrity’s Data Discovery is present in the same network environment, the system also uses its PII detection in the internal decision algorithm.
The solution provides individual message risk scores and classifications, and cumulative conversation risk scores and classifications. This dual-scoring approach ensures that while individual messages may appear benign, potentially risky cumulative conversation patterns are identified. This significantly enhances detection of sophisticated attack vectors, including LLM jailbreaks and prompt injection attempts.
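The effect of dual scoring can be illustrated with a small sketch. It is not the algorithm used by Semantic Guardrails, which this page does not describe. The noisy-OR aggregation rule, the 0.5 threshold, and the sample scores are all assumptions chosen only to show how a conversation can be flagged while every individual message stays below the threshold.

```python
from math import prod

RISK_THRESHOLD = 0.5  # hypothetical cut-off, not a product default


def conversation_risk(message_scores):
    """Combine per-message scores with a noisy-OR rule (illustrative assumption)."""
    return 1.0 - prod(1.0 - score for score in message_scores)


def classify(score, threshold=RISK_THRESHOLD):
    """Map a score in [0, 1] to a coarse label."""
    return "risky" if score >= threshold else "benign"


# Made-up scores for three consecutive messages in one conversation.
message_scores = [0.3, 0.3, 0.3]

per_message = [classify(score) for score in message_scores]
cumulative = conversation_risk(message_scores)

print(per_message)           # label for each individual message
print(classify(cumulative))  # label for the conversation as a whole
```

With these sample values each message is labeled benign on its own, while the cumulative score crosses the threshold.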
Architecture
For more information about the general architecture and working of Semantic Guardrails, refer to General architecture of Semantic Guardrails.
Synthetic Data
Protegrity’s Synthetic Data solution generates artificial data that is realistic, statistically accurate, and privacy-safe. This data unlocks the full potential of AI and analytics. By creating entirely new data that mirrors the patterns of your original datasets but contains no sensitive information, you can train and test AI models without risk. You can also scale these models without exposure or compliance violations.
For more information, refer to the Synthetic Data documentation.
An overview of the communication is shown in the following figure.

The Synthetic Data system includes the following core components:
Key Pods and Services
Synthetic Data App Pod
- Orchestrates Synthetic Data generation.
MLFlow Pod
- Captures model training and evaluation.
- Hosted in containers for scalability.
MinIO Pod
- Stores models, model artifacts, and generated reports.
- Used by both MLFlow and Synthetic Data App pods.
SQL Database Server Pod
- Provides storage for MLFlow experiments metadata.
Data Generation Interfaces
Synthetic Data can be generated using:
- REST APIs
- Swagger UI
These interfaces allow developers and data scientists to interact with the system programmatically or visually.
Access and Networking
Users access the Protegrity Synthetic Data service over HTTP on default port 8095. The other services use the following ports:
| Port | Communication Path |
|---|---|
| 5000 | MLFlow pod |
| 5432 | SQL Database Server |
| 8095 | Protegrity Synthetic Data Service |
| 9000 | MinIO |
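The port table can be captured in a small lookup, sketched below. Only the port numbers come from the table. The service keys, the default host `localhost`, and the helper names are hypothetical, and the reachability check is an ordinary TCP connection attempt rather than a product-specific health check.

```python
import socket

# Default ports from the table above; the dictionary keys are illustrative names.
SERVICE_PORTS = {
    "synthetic-data": 8095,
    "mlflow": 5000,
    "sql-database": 5432,
    "minio": 9000,
}


def service_address(service, host="localhost"):
    """Return 'host:port' for a service listed in the table."""
    return f"{host}:{SERVICE_PORTS[service]}"


def synthetic_data_base_url(host="localhost"):
    """Return the HTTP base URL of the Synthetic Data service (host is assumed)."""
    return f"http://{service_address('synthetic-data', host)}"


def is_reachable(service, host="localhost", timeout=2.0):
    """Return True if a TCP connection to the service port succeeds."""
    try:
        with socket.create_connection((host, SERVICE_PORTS[service]), timeout=timeout):
            return True
    except OSError:
        return False


print(synthetic_data_base_url())
```

Calling `is_reachable("mlflow")` before a run is one way to confirm that a dependent pod is listening on its expected port.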
Cloud Hosting Options
The entire Synthetic Data API can be hosted using any cloud-provided Kubernetes service, including:
- Amazon Elastic Kubernetes Service (EKS)
- Google Kubernetes Engine (GKE)
- Microsoft Azure Kubernetes Service (AKS)
- Red Hat OpenShift
- Other Kubernetes platforms
This flexibility allows organizations to scale Synthetic Data generation securely across environments.
AI Developer Edition API Service for Python and Java
The Protegrity AI Developer Edition API Service exposes functionality derived from the original suite of Protegrity products as API calls. The API endpoints are easy to use and require minimal configuration. Registration is required to send API requests to the service for protecting and unprotecting data. A set of predefined users and roles is provided. Depending on the role used, different scenarios can be tried and tested.
Sample Applications
Protegrity AI Developer Edition provides Python and Java applications that showcase the features of Protegrity products.
sample-app-find
The sample-app-find is a Python or Java application that processes and identifies sensitive data.
It can be customized to do the following functions:
- Specify a file name and output location for the source data. Only raw file formats are supported for Data Discovery. Multipart formats are not supported; only binary files are accepted.
sample-app-find-and-redact
The sample-app-find-and-redact is a Python or Java application that processes the identified data and redacts or masks the information.
It can be customized to do the following functions:
- Specify the items that must be identified.
- Specify the operation to be performed on the data, which is redact or mask.
- Specify a file name and output location for the source data. Only raw file formats are supported for Data Discovery. Multipart formats are not supported; only binary files are accepted.
- Specify a file name and output location for the transformed data.
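The difference between the redact and mask operations can be sketched as follows. This is not the sample application's code. The findings format (start offset, exclusive end offset, label), the bracketed redaction placeholder, and the `#` mask character are assumptions made for illustration, since the actual Data Discovery response format is not shown on this page.

```python
def transform(text, findings, operation="redact", mask_char="#"):
    """Redact or mask detected spans in text.

    findings is a list of (start, end, label) tuples with end exclusive.
    This shape is hypothetical, not the Data Discovery response format.
    """
    if operation not in ("redact", "mask"):
        raise ValueError(f"unsupported operation: {operation}")

    # Work from the last span to the first so earlier offsets stay valid.
    for start, end, label in sorted(findings, reverse=True):
        if operation == "redact":
            replacement = f"[{label}]"                # replace value with its label
        else:
            replacement = mask_char * (end - start)   # keep length, hide content
        text = text[:start] + replacement + text[end:]
    return text


sample = "Call Ann at 555-0100."
found = [(5, 8, "PERSON"), (12, 20, "PHONE")]  # made-up detection results

print(transform(sample, found, "redact"))
print(transform(sample, found, "mask"))
```

Masking preserves the length of the original text, which can matter for fixed-width formats, while redaction keeps a readable hint of what was removed.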
sample-guardrail-python
The sample-guardrail-python is a Python application that submits a request to Semantic Guardrails for analysis.
It can be customized to do the following functions:
- Specify the data that must be processed.
- Specify the operation that must be performed, that is, semantic processor for messages and pii processor for AI.
sample-app-find-and-protect
The sample-app-find-and-protect is a Python or Java application that processes the identified data and protects the information. Calls are made to the AI Developer Edition API Service for performing tokenization.
It can be customized to do the following functions:
- Specify the items that must be identified.
- Specify a file name and output location for the source data. Only raw file formats are supported for Data Discovery. Multipart formats are not supported; only binary files are accepted.
- Specify a file name and output location for the transformed data.
sample-app-find-and-unprotect
The sample-app-find-and-unprotect is a Python or Java application that unprotects the information protected by the sample-app-find-and-protect module. Calls are made to the AI Developer Edition API Service for performing detokenization.
It can be customized to do the following functions:
- Specify a file name and output location for the source data. Only data protected by the sample-app-find-and-protect module can be unprotected.
- Specify a file name and output location for the transformed data.
sample-app-protection
The sample-app-protection is a Python or Java application that protects and unprotects data. Calls are made to the AI Developer Edition API Service for performing tokenization. The Data Discovery and Semantic Guardrails containers are not required to be running for the sample-app-protection module to work.
It can be customized to do the following functions:
- Specify the items that must be protected, the data element name, and the user.
- Specify the operation that must be performed, that is, protect or unprotect.