
Configuring the sample application

The settings for running the sample application.

    The steps in this section are optional. The sample application detects and redacts data using the default configuration. Change these settings only when you need to modify how files are processed, for example, to use a different name for the input or output file.

    Sample application configuration

    Specifying the source file

    The source file contains the data to be processed, such as a paragraph of text or a table of values. This version of the release supports only plain-text files. For security reasons, certain characters are rejected and not processed. To enable or disable these security settings, refer to the section Input Sanitization.

    To specify the source file:

    1. Navigate to the location where Protegrity Developer Edition is cloned.

    2. Open the sample-app-find-and-redact.py file from the samples directory.

    3. Locate the following statement.

      INPUT_FILE = BASE_DIR / "sample-data" / "sample-find-redact.txt"
      
    4. Update the path and name for the source file.

    5. Save and close the file.

    6. Run the Python file.
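    The statement above builds the path with the `/` operator, which suggests that `BASE_DIR` is a `pathlib.Path`. As a minimal sketch under that assumption, an updated statement plus a quick pre-flight check might look like the following. The file name `my-input.txt` and the helper `check_input_file` are hypothetical, and the UTF-8 check only approximates the plain-text requirement; it does not replicate the application's input sanitization.

```python
from pathlib import Path

# Assumption: the sample script defines BASE_DIR relative to its own location.
# Path.cwd() is used here only so that this sketch runs from anywhere.
BASE_DIR = Path.cwd()

# Hypothetical replacement for the original INPUT_FILE statement.
INPUT_FILE = BASE_DIR / "sample-data" / "my-input.txt"


def check_input_file(path: Path) -> str:
    """Hypothetical pre-flight check: the file must exist and decode as UTF-8 text."""
    if not path.is_file():
        raise FileNotFoundError(f"Source file not found: {path}")
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        # Binary or non-UTF-8 content is unlikely to be a supported plain-text file.
        raise ValueError(f"{path} does not look like a plain-text file") from exc
```

    Running such a check before starting the sample application surfaces a wrong path or a binary file immediately, instead of partway through processing.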

    Specifying the output file

    The output file location specifies where the processed output file must be stored.

    To specify the output file:

    1. Navigate to the location where Protegrity Developer Edition is cloned.

    2. Open the sample-app-find-and-redact.py file from the samples directory.

    3. Locate the following statement.

      OUTPUT_FILE = BASE_DIR / "sample-data" / "output.txt"
      
    4. Update the path and name for the output file.

    5. Save and close the file.

    6. Run the Python file.
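    This section does not state whether the sample application creates a missing output folder. If you point `OUTPUT_FILE` at a new directory, a defensive pattern such as the hypothetical `write_output` helper below avoids a `FileNotFoundError`. As before, `BASE_DIR` is an assumption standing in for the value defined in the sample script, and the `results` folder name is only an example.

```python
from pathlib import Path

# Assumption: stands in for the BASE_DIR defined by the sample script.
BASE_DIR = Path.cwd()

# Hypothetical output location in a separate folder.
OUTPUT_FILE = BASE_DIR / "results" / "redacted-output.txt"


def write_output(path: Path, text: str) -> Path:
    """Write processed text, creating the parent folder first if it is missing."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```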

    Specifying the configuration settings

    Use the config.json configuration file to specify which data is masked or redacted and which character is used for masking.

    Before you begin:

    Identify the sensitive fields that are present in the source file.

    1. Open a command prompt.

    2. Navigate to the location where Protegrity Developer Edition is cloned.

    3. Run the following command.

      python samples/sample-app-find.py
      
    4. View the list of sensitive items. For a complete list of items that can be identified, refer to the List of items.

    To update the configuration file:

    1. Navigate to the location where Protegrity Developer Edition is cloned.

    2. Open the config.json file.

    3. Specify the masking character in the masking_char parameter.

      "masking_char": "#"
      
    4. In the named_entity_map parameter, specify the replacement text to use for each type of redacted data. The following code shows the values used for the sample source file.

      "named_entity_map": {
          "PERSON": "PERSON",
          "PHONE_NUMBER": "PHONE",
          "CREDIT_CARD": "CCN",
          "DATE_TIME": "DATE",
          "EMAIL_ADDRESS": "EMAIL"
      }
      
    5. In the method parameter, specify the operation to perform on the source file. The available options are mask and redact.

      "method": "mask"
      
    6. Save and close the file.

    7. Run the Python file.
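    The toy sketch below illustrates how masking_char, named_entity_map, and method relate to each other. It is not the Developer Edition implementation: the entity spans are hard-coded instead of coming from the Data Discovery service, the `apply_config` helper is hypothetical, and the `[LABEL]` format used for redaction is an assumption about how the replacement text is rendered. Masking is shown as replacing every character of a match with the masking character.

```python
import json

# Example configuration using values shown in the steps above.
CONFIG_TEXT = """
{
    "masking_char": "#",
    "named_entity_map": {"PERSON": "PERSON", "EMAIL_ADDRESS": "EMAIL"},
    "method": "mask"
}
"""


def apply_config(text, entities, config):
    """Replace each (start, end, entity_type) span according to the config.

    Hypothetical helper; spans are assumed not to overlap.
    """
    method = config["method"]
    if method not in ("mask", "redact"):
        raise ValueError(f"Unsupported method: {method!r}")
    # Process spans from the end of the text so earlier offsets stay valid.
    for start, end, entity_type in sorted(entities, reverse=True):
        if method == "mask":
            replacement = config["masking_char"] * (end - start)
        else:
            # Fall back to the raw entity type if no replacement name is mapped.
            label = config["named_entity_map"].get(entity_type, entity_type)
            replacement = f"[{label}]"
        text = text[:start] + replacement + text[end:]
    return text


config = json.loads(CONFIG_TEXT)
sample = "Contact Jane Doe at jane@example.com"
spans = [(8, 16, "PERSON"), (20, 36, "EMAIL_ADDRESS")]
print(apply_config(sample, spans, config))
```

    Switching method to redact in the same configuration replaces each span with its mapped name instead, which is why named_entity_map only matters for redaction in this sketch.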

    Specifying the classification score threshold settings

    The classification score threshold sets the minimum confidence level that the system requires before it treats detected data as valid. It filters out uncertain matches so that only high-confidence results are flagged. Set this threshold during setup as a decimal value between 0 and 1; for example, 0.6 represents 60%. Lowering the value makes detection more sensitive, while raising it reduces false positives at the risk of missing some matches.

    To set the value:

    1. Navigate to the location where Protegrity Developer Edition is cloned.

    2. Open the sample-app-find-and-redact.py file from the samples directory.

    3. Locate the following statement.

      "classification_score_threshold", 0.6
      
    4. Set the required value.

    5. Save and close the file.

    6. Run the Python file.
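    To make the effect of the threshold concrete, the following sketch filters a list of detections by score. The `Detection` record and `filter_by_threshold` function are hypothetical and do not reflect the actual response format of the Data Discovery service; whether a score exactly equal to the threshold is kept (`>=` here) is also an assumption.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    """Hypothetical shape of one classification result."""
    entity_type: str
    text: str
    score: float


def filter_by_threshold(detections, threshold=0.6):
    """Keep detections whose score meets or exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [d for d in detections if d.score >= threshold]


results = [
    Detection("PERSON", "Jane Doe", 0.92),
    Detection("DATE_TIME", "tomorrow", 0.41),
    Detection("PHONE_NUMBER", "555-0100", 0.60),
]
for threshold in (0.3, 0.6, 0.9):
    kept = [d.entity_type for d in filter_by_threshold(results, threshold)]
    print(threshold, kept)
```

    With these made-up scores, 0.3 keeps all three detections, 0.6 drops the low-confidence date, and 0.9 keeps only the person name, mirroring the trade-off described above.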

    Specifying the logging parameters

    Log messages are written to the terminal. To keep a record of them, redirect the command output to a log file.

    To set the logging level:

    1. Navigate to the location where Protegrity Developer Edition is cloned.

    2. Open the config.json file.

    3. Locate the following statement.

      "enable_logging": true,
      "log_level": "INFO",
      
    4. Ensure that enable_logging is set to true, and set log_level to the level of messages that must be displayed.

    5. Save and close the file.

    6. Run the Python file.
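    One way to capture the terminal output in a file is to launch the sample from a small Python wrapper. The `run_and_log` helper below is a hypothetical sketch: it merges standard error into the log because Python's `logging.StreamHandler` writes to standard error by default, although this section does not say which stream the sample application uses.

```python
import subprocess
import sys
from pathlib import Path


def run_and_log(command, log_path):
    """Run a command and save its combined stdout/stderr to log_path.

    Returns the process exit code.
    """
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("w", encoding="utf-8") as log_file:
        completed = subprocess.run(
            command,
            stdout=log_file,
            stderr=subprocess.STDOUT,  # merge log messages written to stderr
            check=False,
        )
    return completed.returncode
```

    For example, `run_and_log([sys.executable, "samples/sample-app-find-and-redact.py"], "logs/find-and-redact.log")` runs the sample and stores its messages; the script path and log file name are assumptions, so adjust them to your clone.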

    Python module configuration

    The following parameters are configurable for Developer Edition.

    | Parameter | Description | Values | Example |
    |---|---|---|---|
    | endpoint_url | The Data Discovery endpoint for classifying sensitive data. | Specify a URL. | http://localhost:8580/pty/data-discovery/v1.0/classify |
    | named_entity_map | A dictionary or map of entities and their corresponding replacement names. | List of items | "named_entity_map": {"PERSON": "PERSON", "PHONE_NUMBER": "PHONE"} |
    | masking_char | The character to be used for masking. | Specify a special character. | # |
    | classification_score_threshold | The minimum confidence level needed for the system to treat detected data as valid. | Specify a number between 0 and 1.0. | 0.6 |
    | method | The method for processing sensitive data. | redact or mask | mask |
    | enable_logging | Specify whether to enable logging. | True or False | True |
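    As a sketch of how the parameters in the table fit together, the hypothetical dataclass below holds them and checks the documented value ranges. It is not the Developer Edition module's API. The defaults are the example values from the table, not confirmed library defaults, and treating masking_char as exactly one character is an assumption.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class DeveloperEditionSettings:
    """Hypothetical container mirroring the configurable parameters."""

    endpoint_url: str = "http://localhost:8580/pty/data-discovery/v1.0/classify"
    named_entity_map: dict = field(
        default_factory=lambda: {"PERSON": "PERSON", "PHONE_NUMBER": "PHONE"}
    )
    masking_char: str = "#"
    classification_score_threshold: float = 0.6
    method: str = "mask"
    enable_logging: bool = True

    def __post_init__(self):
        # Validate each value against the ranges listed in the table.
        if urlparse(self.endpoint_url).scheme not in ("http", "https"):
            raise ValueError("endpoint_url must be an http(s) URL")
        if len(self.masking_char) != 1:  # assumption: a single character
            raise ValueError("masking_char must be a single character")
        if not 0 <= self.classification_score_threshold <= 1:
            raise ValueError("classification_score_threshold must be between 0 and 1")
        if self.method not in ("redact", "mask"):
            raise ValueError("method must be 'redact' or 'mask'")
        if not isinstance(self.enable_logging, bool):
            raise ValueError("enable_logging must be True or False")
```

    Validating when the object is created catches a mistyped method or an out-of-range threshold before any data is sent to the endpoint.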