Updating configurations for the sample application
The steps in this section are optional. With the default configuration, the sample application can already detect and redact data.
Sample application customization for Python and Java
Note: From the feature directory, use the .py file for Python. For Java on Linux or macOS, use the .sh file, and for Java on Windows, use the .bat file. For Jupyter Notebook, use the Python code snippets or the .ipynb file.
Specifying the source file
The source file contains the data that must be processed. This file can contain a paragraph of text or a table with values. For security reasons, certain characters are rejected and not processed. To enable or disable these security settings, refer to the section Input Sanitization. This release supports only files containing plain text.
To specify the source file:
For Python:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the sample-app-find-and-redact.py file from the solutions/find-and-redact/ directory.
- Locate the following statement.
  input_file = base_dir / "shared" / "data" / "input.txt"
- Update the path and name for the source file.
- Save and close the file.
- Run the Python file.
For Java:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the SampleAppFindAndRedact.java file from the /data-protection/samples/java/src/main/java/com/protegrity/devedition/samples/ directory.
- Locate the following statement.
  Path inputFile = sampleDataDir.resolve("sample-data").resolve("input.txt");
- Update the path and name for the source file.
- Save and close the file.
- Compile the Java code by running the following command from the /data-protection/samples/java/ directory.
  ./mvnw clean package
- Run the shell script for Linux.
  ./sample-app-find-and-redact.sh
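Before pointing the sample at a different source file, it can help to confirm that the new path resolves to an existing file. The following is a minimal sketch using only the Python standard library. The helper name resolve_input_file is hypothetical and is not part of the sample application, and base_dir stands in for whatever directory the sample script already uses.

```python
from pathlib import Path


def resolve_input_file(base_dir: Path, *parts: str) -> Path:
    """Build the source-file path and fail early if the file is missing.

    Hypothetical helper for illustration; the sample application may
    handle a missing file differently.
    """
    input_file = base_dir.joinpath(*parts)
    if not input_file.is_file():
        raise FileNotFoundError(f"Source file not found: {input_file}")
    return input_file


# Example call (assumes the directory layout shown in the steps above):
# input_file = resolve_input_file(base_dir, "shared", "data", "input.txt")
```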
Specifying the output file
The output file location specifies where the processed output file must be stored.
To specify the output file:
For Python:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the sample-app-find-and-redact.py file from the solutions/find-and-redact/ directory.
- Locate the following statement.
  output_file = base_dir / "shared" / "data" / "output-redact.txt"
- Update the path and name for the output file.
- Save and close the file.
- Run the Python file.
For Java:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the SampleAppFindAndRedact.java file from the /data-protection/samples/java/src/main/java/com/protegrity/devedition/samples/ directory.
- Locate the following statement.
  Path outputFile = sampleDataDir.resolve("sample-data").resolve("output-redact.txt");
- Update the path and name for the output file.
- Save and close the file.
- Compile the Java code by running the following command from the /data-protection/samples/java/ directory.
  ./mvnw clean install
- Run the shell script for Linux.
  ./sample-app-find-and-redact.sh
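If the new output path points into a directory that does not exist yet, the write may fail. Whether the sample application creates missing directories is not stated here, so the sketch below shows one way to do it explicitly. The function write_output is a hypothetical helper, not part of the sample application.

```python
from pathlib import Path


def write_output(output_file: Path, text: str) -> Path:
    """Create the output directory if needed, then write the processed text.

    Hypothetical helper for illustration only.
    """
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(text, encoding="utf-8")
    return output_file


# Example call (the path is a placeholder):
# write_output(Path("results") / "output-redact.txt", processed_text)
```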
Specifying the configuration settings
Use the config.json configuration file to specify the data that must be redacted or masked. The character that must be used for masking can also be specified.
Before you begin:
Identify the sensitive fields that are present in the source file.
For Python:
- Open a command prompt.
- Navigate to the /solutions/find-and-redact/ directory where the sample application is extracted.
- Run the following command.
  python solutions/find-and-redact/sample-app-find.py
- View the supported entities. For a complete list of supported entities, refer to Supported Classification Entities.
For Java:
- Open a command prompt.
- Navigate to the /data-protection/samples/java/src/main/java/com/protegrity/devedition/samples/ directory where the sample application is extracted.
- Run the following command.
  ./sample-app-find.sh
- View the supported entities. For a complete list of supported entities, refer to Supported Classification Entities.
Updating the configuration file
The configuration file controls how the sample application processes sensitive data. Update the config.json file to customize the masking character, define entity mappings, and select the data protection method.
For Python:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the config.json file from the shared directory.
- Specify the masking character to use in the following code.
  "masking_char": "#"
- Specify the text to use for the redacted data in the named_entity_map parameter. The following code shows the value used for the sample source file.
  "named_entity_map": { "PERSON": "PERSON", "LOCATION": "LOCATION", "SOCIAL_SECURITY_ID": "SSN", "PHONE_NUMBER": "PHONE", "AGE": "AGE", "USERNAME": "USERNAME" }
- Specify the operation to perform on the source file. The available options are mask and redact.
  "method": "mask"
- Save and close the file.
- Run the sample-app-find-and-redact.py file from the /solutions/find-and-redact/ directory.
For Java:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the config.json file from the shared directory.
- Specify the masking character to use in the following code.
  "masking_char": "#"
- Specify the text to use for the redacted data in the named_entity_map parameter. The following code shows the value used for the sample source file.
  "named_entity_map": { "PERSON": "PERSON", "LOCATION": "LOCATION", "SOCIAL_SECURITY_ID": "SSN", "PHONE_NUMBER": "PHONE", "AGE": "AGE", "USERNAME": "USERNAME" }
- Specify the operation to perform on the source file. The available options are mask and redact.
  "method": "mask"
- Save and close the file.
- Run the shell script for Linux from the /data-protection/samples/java/ directory.
  ./sample-app-find-and-redact.sh
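To make the difference between the mask and redact methods concrete, the sketch below applies each one to a short string. It is an illustration only and is not the product's implementation. The function apply_method, the (start, end, entity) span format, and the square brackets around the redaction label are all assumptions made for this example.

```python
def apply_method(text, spans, method, masking_char="#", named_entity_map=None):
    """Replace each (start, end, entity) span according to the chosen method.

    Illustrative only: the span format and bracketed labels are assumptions.
    """
    named_entity_map = named_entity_map or {}
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, entity in sorted(spans, reverse=True):
        if method == "mask":
            # Masking keeps the length and hides every character.
            replacement = masking_char * (end - start)
        elif method == "redact":
            # Redaction substitutes the mapped name for the entity.
            label = named_entity_map.get(entity, entity)
            replacement = f"[{label}]"
        else:
            raise ValueError(f"Unsupported method: {method!r}")
        text = text[:start] + replacement + text[end:]
    return text


sample = "Call John at 555-1234."
spans = [(5, 9, "PERSON"), (13, 21, "PHONE_NUMBER")]
print(apply_method(sample, spans, "mask"))
# Call #### at ########.
print(apply_method(sample, spans, "redact", named_entity_map={"PHONE_NUMBER": "PHONE"}))
# Call [PERSON] at [PHONE].
```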
Specifying the classification score threshold settings
The classification score threshold sets the minimum confidence level needed for the system to treat detected data as valid. It filters out uncertain matches so that only high-confidence results are flagged. Specify the threshold as a decimal value between 0 and 1, such as 0.6 for 60%. Lowering the threshold makes the system more sensitive, so more data is flagged. Raising it reduces false positives.
To set the value:
For Python:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the config.json file from the shared directory.
- Add the following setting.
  "classification_score_threshold": 0.6
- Set the threshold to the required value.
  Note: Specify a number between 0 and 1.0.
- Save and close the file.
- Run the sample-app-find-and-redact.py file from the /solutions/find-and-redact/ directory.
For Java:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the config.json file from the shared directory.
- Add the following setting.
  "classification_score_threshold": 0.6
- Set the threshold to the required value.
  Note: Specify a number between 0 and 1.0.
- Save and close the file.
- Run the shell script for Linux from the /data-protection/samples/java/ directory.
  ./sample-app-find-and-redact.sh
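The effect of the threshold can be pictured as a simple filter over scored detections. The sketch below is an illustration under assumptions: the detection dictionaries with entity and score keys are hypothetical, and whether the product treats a score equal to the threshold as a match is not stated, so the inclusive comparison is a guess.

```python
def filter_by_threshold(detections, threshold=0.6):
    """Keep detections whose score meets or exceeds the threshold.

    Illustrative only: the detection format and the inclusive
    comparison are assumptions.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("classification_score_threshold must be between 0 and 1")
    return [d for d in detections if d["score"] >= threshold]


detections = [
    {"entity": "PERSON", "score": 0.95},
    {"entity": "AGE", "score": 0.60},
    {"entity": "LOCATION", "score": 0.41},
]
# A lower threshold keeps more detections; a higher one keeps fewer.
print(len(filter_by_threshold(detections, 0.4)))  # 3
print(len(filter_by_threshold(detections, 0.6)))  # 2
```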
Specifying the logging parameters
Log messages are written to the terminal. To keep a record of them, redirect the output of the commands to a log file.
To set the logging level:
For Python:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the config.json file from the shared directory.
- Locate or add the following statement.
  "enable_logging": true, "log_level": "info",
- Ensure that enable_logging is set to true, and set log_level to the level of messages that must be displayed.
- Save and close the file.
- Run the sample-app-find-and-redact.py file from the /solutions/find-and-redact/ directory.
For Java:
- Navigate to the location where Protegrity AI Developer Edition is cloned.
- Open the config.json file from the shared directory.
- Locate or add the following statement.
  "enable_logging": true, "log_level": "info",
- Ensure that enable_logging is set to true, and set log_level to the level of messages that must be displayed.
- Save and close the file.
- Run the shell script for Linux from the /data-protection/samples/java/ directory.
  ./sample-app-find-and-redact.sh
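Because the log messages go to the terminal, saving them means capturing the command output. One possible way to do this from Python is sketched below with the standard subprocess module. The helper run_and_capture and the log file name are hypothetical. Redirecting the output in the shell is an equally valid approach.

```python
import subprocess
from pathlib import Path


def run_and_capture(command, log_path):
    """Run a command and save its terminal output to a log file.

    Both stdout and stderr are written to the same file.
    Hypothetical helper for illustration only.
    """
    log_path = Path(log_path)
    with log_path.open("w", encoding="utf-8") as log_file:
        result = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)
    return result.returncode


# Example call (the log file name is a placeholder):
# run_and_capture(
#     ["python", "solutions/find-and-redact/sample-app-find-and-redact.py"],
#     "find-and-redact.log",
# )
```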
Python module and Java library configuration
The following parameters are configurable for AI Developer Edition.
| Parameter | Description | Values | Example |
|---|---|---|---|
| endpoint_url | The Data Discovery and Semantic Guardrails endpoints. | Specify a URL. | Classification API: http://localhost:8580/pty/data-discovery/v2/classify; Semantic Guardrails API: http://localhost:8581/pty/semantic-guardrail/v1.1/conversations/messages/scan |
| named_entity_map | A dictionary or map of entities and their corresponding replacement names. | Supported Classification Entities | "named_entity_map": { "PERSON": "PERSON", "PHONE_NUMBER": "PHONE" } |
| masking_char | The character to be used for masking. | Specify a special character. | # |
| classification_score_threshold | The minimum confidence level needed for the system to treat detected data as valid. | Specify a number between 0 and 1. | 0.6 |
| method | The method for processing sensitive data. | redact or mask | mask |
| enable_logging | Specify whether to enable logging. | true or false | true |
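A quick check of config.json before running the sample can catch typos in these parameters. The sketch below validates only the keys that are present, using the value ranges from the table above. The function names are hypothetical, and the assumption that these parameters are top-level keys in config.json is based on the snippets in this section rather than on a published schema.

```python
import json
from pathlib import Path


def validate_config(config):
    """Return a list of problems found in a configuration dictionary.

    Hypothetical helper; only keys that are present are checked.
    """
    problems = []
    if "masking_char" in config:
        value = config["masking_char"]
        if not isinstance(value, str) or len(value) != 1:
            problems.append("masking_char must be a single character")
    if "classification_score_threshold" in config:
        value = config["classification_score_threshold"]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 1:
            problems.append("classification_score_threshold must be between 0 and 1")
    if "method" in config and config["method"] not in ("mask", "redact"):
        problems.append("method must be mask or redact")
    if "enable_logging" in config and not isinstance(config["enable_logging"], bool):
        problems.append("enable_logging must be true or false")
    if "named_entity_map" in config and not isinstance(config["named_entity_map"], dict):
        problems.append("named_entity_map must be a map of entity names")
    return problems


def load_config(path):
    """Read a JSON configuration file into a dictionary."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Example call (assumes config.json is in the shared directory):
# print(validate_config(load_config(Path("shared") / "config.json")))
```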