AI Developer Edition Architecture
A high-level architecture of AI Developer Edition is provided in the following image.

This release of AI Developer Edition consists of sample applications that utilize and showcase the capabilities of Data Discovery, Semantic Guardrail, protection, and unprotection using simple Python modules. The Data Discovery component identifies sensitive data. After identification, the Python module redacts, masks, or protects the sensitive information. Protection is performed using the AI Developer Edition API Service.
Data Discovery
Data Discovery is a developer-friendly product designed to address the challenge of identifying sensitive data in unstructured text.
For more information, refer to the Data Discovery documentation.
Overview
The Data Discovery Text Classification service advances data discovery and classification, specializing in the detection of Personally Identifiable Information (PII), Protected Health Information (PHI), and Payment Card Information (PCI) within plain text and free-text inputs. Unlike traditional structured data tools, it excels in dynamic, unstructured environments such as chatbot conversations, call transcripts, and Generative AI (GenAI) outputs.
Architecture
Data Discovery consists of three containers hosted on Docker: the Classification container, the Presidio provider container, and the RoBERTa provider container. The general architecture is illustrated in the following figure.

| Step | Description |
|---|---|
| 1 | The user submits the text to be classified for sensitive data as the request body and sends the request to the Classification service. |
| 2 | The Classification service distributes the request to the Presidio and RoBERTa service providers for processing. |
| 3 | The Presidio and RoBERTa providers process the data using their own logic and return their classifications to the Classification service. |
| 4 | The Classification service aggregates the responses from the service providers and sends the aggregated response to the user. |
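The request flow in the table can be sketched in a few lines of Python. This is a minimal illustration of the fan-out and aggregate pattern only. The two provider functions, their regular expressions, and the finding fields (`entity`, `start`, `end`, `score`, `provider`) are hypothetical stand-ins and do not reflect the actual Presidio or RoBERTa provider containers or their response format.

```python
import re
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the two providers. The real providers run as
# separate containers and apply their own detection logic.
def pattern_provider(text):
    """Rule-based stand-in: finds email-like strings with a regular expression."""
    return [
        {"entity": "EMAIL", "start": m.start(), "end": m.end(),
         "score": 0.9, "provider": "pattern"}
        for m in re.finditer(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
    ]


def model_provider(text):
    """Model-based stand-in: flags capitalized word pairs as possible names."""
    return [
        {"entity": "PERSON", "start": m.start(), "end": m.end(),
         "score": 0.6, "provider": "model"}
        for m in re.finditer(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", text)
    ]


def classify(text, providers=(pattern_provider, model_provider)):
    """Send the text to every provider in parallel and merge the findings."""
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        results = list(pool.map(lambda provider: provider(text), providers))
    # Flatten the per-provider lists and order the findings by position.
    findings = [f for provider_findings in results for f in provider_findings]
    return sorted(findings, key=lambda f: (f["start"], f["end"]))


if __name__ == "__main__":
    sample = "please contact Jane Doe at jane.doe@example.com"
    for finding in classify(sample):
        print(finding["entity"], sample[finding["start"]:finding["end"]])
```

Running the providers in parallel mirrors step 2 of the table, and the merge in `classify` mirrors step 4.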
Semantic Guardrail
Protegrity’s GenAI Security - Semantic Guardrail solution is a security guardrail engine for AI systems. It evaluates risks in GenAI chatbots, workflows, and agents through advanced semantic analytics and intent classification to detect potentially malicious messages. PII detection can also be leveraged for comprehensive security coverage.
For more information, refer to the Semantic Guardrail documentation.
Overview
The current implementation is trained on synthetic customer-service AI chatbot datasets. The system performs best when analyzing conversations that match the training domain, that is, English-language customer service interactions involving orders, tickets, and purchases.
For domain-specific and user-specific applications requiring high detection accuracy, fine-tuning is necessary to fully leverage the model’s capabilities. This helps the model learn from expected conversation patterns and message structures in both the inputs and outputs of protected GenAI systems.
The system operates by analyzing conversations between participants. These participants are users and AI systems, such as LLMs, agents, or contextual information sources. Furthermore, if Protegrity’s Data Discovery is present in the same network environment, the system uses its PII detection in its internal decision algorithm.
The solution provides individual message risk scores and classifications, and cumulative conversation risk scores and classifications. This dual-scoring approach ensures that while individual messages may appear benign, potentially risky cumulative conversation patterns are identified. This significantly enhances detection of sophisticated attack vectors, including LLM jailbreaks and prompt injection attempts.
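To illustrate why a cumulative score can flag a conversation whose individual messages all look benign, the following sketch combines per-message scores with a noisy-OR rule. The combination rule, the 0.5 threshold, and the function names are assumptions chosen for illustration. They are not the scoring method used by Semantic Guardrail.

```python
def conversation_risk(message_scores):
    """Combine per-message risk scores (each in [0, 1]) with a noisy-OR rule.

    Assumption for illustration: the conversation is risky unless every
    message is independently harmless, so the score is 1 - prod(1 - s_i).
    """
    remaining = 1.0
    for score in message_scores:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        remaining *= 1.0 - score
    return 1.0 - remaining


def label(score, threshold=0.5):
    """Map a score to a classification using a hypothetical threshold."""
    return "risky" if score >= threshold else "benign"


def score_conversation(message_scores, threshold=0.5):
    """Return per-message and cumulative scores with their classifications."""
    total = conversation_risk(message_scores)
    return {
        "messages": [
            {"score": s, "label": label(s, threshold)} for s in message_scores
        ],
        "conversation": {"score": total, "label": label(total, threshold)},
    }


if __name__ == "__main__":
    # Three mildly suspicious messages, none of which crosses the threshold.
    print(score_conversation([0.3, 0.3, 0.3]))
```

With three messages scored at 0.3, each message is labeled benign while the conversation as a whole crosses the threshold, which is the behavior the dual-scoring approach is meant to capture.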
Architecture
The diagram shows how client applications integrate with Semantic Guardrail, and how Data Discovery can be integrated as a PII detection provider.

| Component | Description |
|---|---|
| External AI System | An AI system, such as an AI chatbot or agent, that is integrated with the Semantic Guardrail solution and responds to users by using an LLM and data. |
| External LLM | The LLM employed as the reasoning engine by the external AI system. |
| External Data Sources | Data sources used by an external AI system. |
| Semantic Guardrail | The core application operates as a containerized Docker service. It processes conversation data through HTTP requests and performs comprehensive security risk analysis, applying guardrails including Semantic Guardrail. |
| Data Discovery | For PII detection capabilities, Semantic Guardrail can leverage Protegrity’s Data Discovery solution. This solution operates as specialized Docker containers within the same environment. |
AI Developer Edition API Service
The Protegrity AI Developer Edition API Service offers functionality derived from the original suite of Protegrity products in the form of API calls. The API endpoints are easy to use and require minimal configuration. Registration is required to send API requests to the service for protecting and unprotecting data. A set of predefined users and roles is provided. Depending on the role used, different scenarios can be tried and tested.
Sample Applications
Protegrity AI Developer Edition provides Python modules that showcase the features of Protegrity products.
sample-app-find module
The sample-app-find module is a Python library that processes and identifies sensitive data.
The module can be customized to perform the following function:
- Specify a file name and output location for the source data. Only raw file formats are supported for Data Discovery. Multipart formats are not supported; only binary files are accepted.
sample-app-find-and-redact module
The sample-app-find-and-redact module is a Python library that processes the identified data and redacts or masks the information.
The module can be customized to perform the following functions:
- Specify the items that must be identified.
- Specify the operation to be performed on the data, which is redact or mask.
- Specify a file name and output location for the source data. Only raw file formats are supported for Data Discovery. Multipart formats are not supported; only binary files are accepted.
- Specify a file name and output location for the transformed data.
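The difference between the two operations can be sketched as follows. This is not the code of the sample-app-find-and-redact module. It assumes that redaction replaces each detected span with its entity label and that masking overwrites each character of the span with a mask character. The `transform` function and the finding fields are hypothetical.

```python
def transform(text, findings, operation="redact", mask_char="#"):
    """Replace each detected span with a label (redact) or mask characters (mask)."""
    if operation not in ("redact", "mask"):
        raise ValueError(f"unsupported operation: {operation}")
    pieces = []
    cursor = 0
    for finding in sorted(findings, key=lambda f: f["start"]):
        if finding["start"] < cursor:
            continue  # skip spans that overlap one already handled
        pieces.append(text[cursor:finding["start"]])
        if operation == "redact":
            pieces.append(f"[{finding['entity']}]")
        else:
            pieces.append(mask_char * (finding["end"] - finding["start"]))
        cursor = finding["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    text = "Call Jane at 555-0100"
    findings = [
        {"entity": "PERSON", "start": 5, "end": 9},
        {"entity": "PHONE", "start": 13, "end": 21},
    ]
    print(transform(text, findings, "redact"))  # Call [PERSON] at [PHONE]
    print(transform(text, findings, "mask"))    # Call #### at ########
```

Masking preserves the length of the original text, while redaction keeps the entity type visible to the reader.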
sample-guardrail-python module
The sample-guardrail-python module is a Python library that submits a request to Semantic Guardrail for analysis.
The module can be customized to perform the following functions:
- Specify the data that must be processed.
- Specify the operation that must be performed, that is, semantic processor for messages and pii processor for AI.
sample-app-find-and-protect module
The sample-app-find-and-protect module is a Python library that processes the identified data and protects the information. Calls are made to the AI Developer Edition API Service for performing tokenization.
The module can be customized to perform the following functions:
- Specify the items that must be identified.
- Specify a file name and output location for the source data. Only raw file formats are supported for Data Discovery. Multipart formats are not supported; only binary files are accepted.
- Specify a file name and output location for the transformed data.
sample-app-find-and-unprotect module
The sample-app-find-and-unprotect module is a Python library that unprotects the information protected by the sample-app-find-and-protect module. Calls are made to the AI Developer Edition API Service for performing detokenization.
The module can be customized to perform the following functions:
- Specify a file name and output location for the source data. Only data protected by the sample-app-find-and-protect module can be unprotected.
- Specify a file name and output location for the transformed data.
sample-app-protection module
The sample-app-protection module is a Python library that protects and unprotects data. Calls are made to the AI Developer Edition API Service for performing tokenization. The Data Discovery and Semantic Guardrail containers are not required to be running for the sample-app-protection module.
The module can be customized to perform the following functions:
- Specify the items that must be protected, the data element name, and the user.
- Specify the operation that must be performed, that is, protect or unprotect.
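The protect and unprotect round trip can be pictured with a toy, in-memory token vault. This sketch does not call the AI Developer Edition API Service and does not reproduce Protegrity's tokenization. The `ToyTokenVault` class, the token format, and the user names are hypothetical and only show how a data element name and a user could shape the two operations.

```python
import secrets


class ToyTokenVault:
    """In-memory illustration of tokenization and detokenization."""

    def __init__(self, allowed_to_unprotect):
        self._token_to_value = {}
        self._value_to_token = {}
        self._allowed = set(allowed_to_unprotect)

    def protect(self, value, data_element):
        """Return a random token for the value, reusing it for repeat values."""
        key = (data_element, value)
        if key not in self._value_to_token:
            token = f"{data_element}_{secrets.token_hex(8)}"
            self._value_to_token[key] = token
            self._token_to_value[token] = value
        return self._value_to_token[key]

    def unprotect(self, token, user):
        """Return the original value only for users allowed to unprotect."""
        if user not in self._allowed:
            raise PermissionError(f"user {user!r} may not unprotect data")
        return self._token_to_value[token]


if __name__ == "__main__":
    vault = ToyTokenVault(allowed_to_unprotect={"admin"})
    token = vault.protect("Jane Doe", data_element="name")
    print(token)
    print(vault.unprotect(token, user="admin"))
```

In this sketch, a user outside the allowed set receives a `PermissionError` instead of the original value, which loosely mirrors trying different scenarios with different roles.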