AI Developer Edition Architecture

An explanation of the architecture and components of AI Developer Edition.

A high-level architecture of AI Developer Edition is provided in the following image.

This release of AI Developer Edition includes sample applications that showcase the capabilities of Data Discovery, Semantic Guardrails, and Synthetic Data. The applications demonstrate protection and unprotection using simple Python modules or Java libraries. The Data Discovery component identifies sensitive data. After identification, the Python module or Java library redacts, masks, or protects the sensitive information. Protection is performed through the AI Developer Edition API Service.
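The identify-then-transform flow described above can be sketched as follows. This is a minimal illustration, not the Protegrity Python module: the `Finding` type, the `redact` and `mask` helpers, the entity labels, and the hard-coded findings are all hypothetical stand-ins for whatever Data Discovery actually returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical detection result: [start, end) offsets plus an entity label."""
    start: int
    end: int
    label: str


def redact(text: str, findings: list[Finding]) -> str:
    """Replace each detected span with its label in square brackets."""
    out = text
    # Work right to left so earlier offsets stay valid after each replacement.
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        out = out[:f.start] + f"[{f.label}]" + out[f.end:]
    return out


def mask(text: str, findings: list[Finding], keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` characters of each span with '*'."""
    out = text
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        value = out[f.start:f.end]
        hidden = max(len(value) - keep_last, 0)
        out = out[:f.start] + "*" * hidden + value[hidden:] + out[f.end:]
    return out


# Assumed input: findings are hard-coded here instead of coming from Data Discovery.
text = "Card 4111111111111111 belongs to Ann Lee."
findings = [Finding(5, 21, "CREDIT_CARD"), Finding(33, 40, "PERSON")]

redacted = redact(text, findings)
masked = mask(text, findings)
```

Redaction and masking are shown because they need no external service. Protection and unprotection go through the AI Developer Edition API Service and are not reproduced in this sketch.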

Data Discovery

Data Discovery is a powerful, developer-friendly product designed specifically to address the challenge of identifying sensitive data.

For more information, refer to the Data Discovery documentation.

Overview

The Data Discovery Text Classification service advances data discovery and classification. It specializes in detecting Personally Identifiable Information (PII), Protected Health Information (PHI), and Payment Card Information (PCI) within plain text and free-text inputs. Unlike traditional structured data tools, it excels in dynamic, unstructured environments such as chatbot conversations, call transcripts, and Generative AI (GenAI) outputs.

Architecture

For more information about the general architecture and working of Data Discovery, refer to General architecture of Data Discovery.

Semantic Guardrails

Protegrity’s GenAI Security Semantic Guardrails solution is a security guardrail engine for AI systems. It evaluates risks in GenAI chatbots, workflows, and agents through advanced semantic analytics and intent classification to detect potentially malicious messages. PII detection can also be leveraged for comprehensive security coverage.

For more information, refer to the Semantic Guardrails documentation.

Overview

The current implementation is trained on synthetic customer-service AI chatbot datasets. The system performs best when the conversations it analyzes match the training domain, that is, English-language customer-service interactions involving orders, tickets, and purchases.

For domain-specific and user-specific applications that require high detection accuracy, fine-tuning is necessary to fully leverage the model’s capabilities. Fine-tuning helps the model learn the expected conversation patterns and message structures in both the inputs and outputs of protected GenAI systems.

The system operates by analyzing conversations between participants. These participants are users and AI systems, such as LLMs, agents, or contextual information sources. Furthermore, if Protegrity’s Data Discovery is present in the same network environment, the system uses its PII detection in the internal decision algorithm.

The solution provides individual message risk scores and classifications, and cumulative conversation risk scores and classifications. This dual-scoring approach ensures that while individual messages may appear benign, potentially risky cumulative conversation patterns are identified. This significantly enhances detection of sophisticated attack vectors, including LLM jailbreaks and prompt injection attempts.
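The value of scoring both levels can be illustrated with a small sketch. The scores, the threshold, and the noisy-OR aggregation below are assumptions chosen only for illustration; the actual scoring algorithm used by Semantic Guardrails is not described here and may differ.

```python
from math import prod

RISK_THRESHOLD = 0.5  # hypothetical cut-off, chosen only for this example


def classify(score: float) -> str:
    """Map a risk score in [0, 1] to a label using the assumed threshold."""
    return "risky" if score >= RISK_THRESHOLD else "benign"


def conversation_risk(message_scores: list[float]) -> float:
    """Noisy-OR aggregation (an assumption): treats each score as an
    independent probability and returns the chance that at least one
    message is risky."""
    return 1.0 - prod(1.0 - s for s in message_scores)


# Assumed per-message scores; each one stays below the threshold.
message_scores = [0.30, 0.35, 0.40]

per_message = [classify(s) for s in message_scores]
cumulative = conversation_risk(message_scores)
conversation_label = classify(cumulative)
```

In this example every message is classified as benign on its own, while the cumulative score of about 0.727 crosses the threshold, which is the kind of pattern the dual-scoring approach is meant to surface.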

Architecture

For more information about the general architecture and working of Semantic Guardrails, refer to General architecture of Semantic Guardrails.

Synthetic Data

Protegrity’s Synthetic Data solution generates artificial data that is realistic, statistically accurate, and privacy-safe. This data unlocks the full potential of AI and analytics. By creating entirely new data that mirrors the patterns of your original datasets but contains no sensitive information, you can train and test AI models without risk. You can also scale these models without exposure or compliance violations.

For more information, refer to the Synthetic Data documentation.

An overview of the communication is shown in the following figure.

Synthetic Data Components

The Synthetic Data system includes the following core components:

Key Pods and Services

Synthetic Data App Pod
- Orchestrates Synthetic Data generation.
MLFlow Pod
- Captures model training and evaluation.
- Hosted in containers for scalability.
MinIO Pod
- Stores models, model artifacts, and generated reports.
- Used by both MLFlow and Synthetic Data App pods.
SQL Database Server Pod
- Provides storage for MLFlow experiments metadata.

Data Generation Interfaces

Synthetic Data can be generated using:

REST APIs
Swagger UI

These interfaces allow developers and data scientists to interact with the system programmatically or visually.

Access and Networking

Users access Protegrity Synthetic Data over HTTP on the default port 8095. The other services use the following ports:

Port	Communication Path
5000	MLFlow pod
5432	SQL Database Server
8095	Protegrity Synthetic Data Service
9000	MinIO
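A quick way to confirm that the ports in the table above are reachable is a plain TCP connection test. The sketch below uses only the Python standard library; the host name passed to `check_services` is a placeholder assumption, and a successful connection shows only that something is listening on the port, not that the service behind it is healthy.

```python
import socket

# Ports taken from the table above.
SERVICE_PORTS = {
    5000: "MLFlow pod",
    5432: "SQL Database Server",
    8095: "Protegrity Synthetic Data Service",
    9000: "MinIO",
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(host: str) -> dict[str, bool]:
    """Check every port in SERVICE_PORTS on the given host."""
    return {name: is_port_open(host, port) for port, name in SERVICE_PORTS.items()}
```

For example, `check_services("localhost")` returns a dictionary keyed by service name, assuming the services are exposed on the local machine.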

Cloud Hosting Options

The entire Synthetic Data API can be hosted using any cloud-provided Kubernetes service, including:

Amazon Elastic Kubernetes Service (EKS)
Google Kubernetes Engine (GKE)
Microsoft Azure Kubernetes Service (AKS)
Red Hat OpenShift
Other Kubernetes platforms

This flexibility allows organizations to scale Synthetic Data generation securely across environments.

AI Developer Edition API Service for Python and Java

Protegrity AI Developer Edition API Service exposes functionality derived from the original suite of Protegrity products in the form of API calls. The API endpoints are easy to use and require minimal configuration. Registration is required to send API requests to the service for protecting and unprotecting data. A set of predefined users and roles is provided. Depending on the role used, different scenarios can be tried and tested.

Sample Applications

Protegrity AI Developer Edition provides Python and Java applications that showcase the features of Protegrity products.