Running the sample application

A sample application that demonstrates Developer Edition.

In Developer Edition, a user uploads a file through the sample application, and the Data Discovery containers process it to detect sensitive data. A Python module then redacts or masks the detected data, and the sanitized file is saved to a configured location. For more information about the sample application, refer to Sample application.

Use the steps provided here to run the application end to end. To perform specific tasks, you can also call the provided APIs and functions directly. For more information about the identification APIs, refer to Data Discovery API.

Running the sample application

The sample application is configured out-of-the-box to identify and redact data from the sample file.

  1. Open a command prompt.

  2. Navigate to the directory where Developer Edition is cloned.

  3. Run the sample application using the following command.

    python samples/sample-app-find-and-redact.py
    
  4. View the processing output on the screen. The output lists the sensitive items found in the source file, along with the location and name of the output file that contains the redacted text.

     Figure: Sample application output

  5. View the processed output file in the output directory.

Integrating the Python module in an application

Alternatively, to integrate and use the Protegrity Python module in a Python application, customize and use the sample code provided here.

  1. Open a command prompt.

  2. Create a Python file.

  3. Import the installed Python module.

    import protegrity_developer_python
    
  4. Specify the configuration. For more information about the settings, refer to the Python module configuration.

    protegrity_developer_python.configure(
        endpoint_url="http://localhost:8580/pty/data-discovery/v1.0/classify",
        named_entity_map={"PERSON": "NAME", "SOCIAL_SECURITY_NUMBER": "SSN"},
        masking_char="#",
        classification_score_threshold=0.6,
        method="redact",
        enable_logging=True,
        log_level="info",
    )
    
  5. Specify the input text.

    input_text = "John Doe's SSN is 123-45-6789."
    
  6. Call the module to process the data.

    output_text = protegrity_developer_python.find_and_redact(input_text)
    
  7. View the redacted output.

    print(output_text)
    
  8. Save, close, and run the file.
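The method, masking_char, named_entity_map, and classification_score_threshold settings control how detected entities are replaced. The following sketch illustrates the general idea of redaction versus masking on a list of pre-computed findings. It is a hypothetical illustration, not the protegrity_developer_python implementation: the sanitize() function, the (entity, start, end, score) finding format, the "[LABEL]" replacement style, and the sample scores are all assumptions, and the module's actual output may differ.

```python
# Hypothetical illustration of redaction vs. masking. This is NOT the
# protegrity_developer_python implementation; names and formats are assumptions.

# Mirrors the named_entity_map passed to configure() in step 4.
NAMED_ENTITY_MAP = {"PERSON": "NAME", "SOCIAL_SECURITY_NUMBER": "SSN"}


def sanitize(text, findings, method="redact", masking_char="#",
             named_entity_map=NAMED_ENTITY_MAP, score_threshold=0.6):
    """Redact or mask each finding whose score is at least score_threshold.

    findings: list of (entity, start, end, score) tuples, end index exclusive.
    """
    kept = [f for f in findings if f[3] >= score_threshold]
    # Work right to left so replacing one span does not shift the offsets
    # of spans that appear earlier in the text.
    for entity, start, end, _score in sorted(kept, key=lambda f: f[1], reverse=True):
        if method == "redact":
            # Replace the span with its (mapped) entity label.
            replacement = f"[{named_entity_map.get(entity, entity)}]"
        elif method == "mask":
            # Replace every character of the span with the masking character.
            replacement = masking_char * (end - start)
        else:
            raise ValueError(f"unknown method: {method!r}")
        text = text[:start] + replacement + text[end:]
    return text


text = "John Doe's SSN is 123-45-6789."
# Made-up findings for illustration; real ones come from the Data Discovery service.
findings = [("PERSON", 0, 8, 0.90), ("SOCIAL_SECURITY_NUMBER", 18, 29, 0.85)]

print(sanitize(text, findings))  # [NAME]'s SSN is [SSN].
print(sanitize(text, findings, method="mask"))
```

Replacing spans from the end of the string keeps the earlier start and end offsets valid. The sketch assumes that findings do not overlap.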

Data Discovery API

Classify API.

Data Discovery Classification Service

This API identifies, classifies, and locates sensitive data.

Endpoint

https://{Host Address}/pty/data-discovery/v1.0/classify

Path

/pty/data-discovery/v1.0/classify

Method

POST

Parameters

Use the optional score_threshold query parameter to exclude results that score below the specified value. The parameter accepts the following values:

Type: float
Values: minimum 0, maximum 1.0
Default: 0.00

For example, score_threshold = 0.75
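The threshold is sent as a query parameter on the classify URL. The following minimal sketch builds that URL with the Python standard library and checks the documented range on the client side. classify_url() is a hypothetical helper, not part of the product.

```python
from urllib.parse import urlencode

CLASSIFY_PATH = "/pty/data-discovery/v1.0/classify"


def classify_url(host, score_threshold=None):
    """Build the classify URL, adding score_threshold as a query parameter if given."""
    base = f"https://{host}{CLASSIFY_PATH}"
    if score_threshold is None:
        # Omit the parameter so the server applies its default of 0.00.
        return base
    # Reject values outside the documented range of 0 to 1.0.
    if not 0.0 <= score_threshold <= 1.0:
        raise ValueError("score_threshold must be between 0 and 1.0")
    return f"{base}?{urlencode({'score_threshold': score_threshold})}"


print(classify_url("<SERVER_IP>", 0.75))
# https://<SERVER_IP>/pty/data-discovery/v1.0/classify?score_threshold=0.75
```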

Example Data

You can reach Dave Elliot by phone 203-555-1286.

The data should be UTF-8 encoded, and its length is limited to 10,000 characters.
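The following minimal sketch shows a client-side pre-check for these constraints. prepare_body() and MAX_CHARS are hypothetical names. The sketch assumes that the 10,000 limit counts characters rather than encoded bytes. Under this assumption, a multi-byte character such as "é" counts once, even though it takes two bytes in UTF-8.

```python
MAX_CHARS = 10_000  # Documented input limit; assumed to count characters, not bytes.


def prepare_body(text):
    """Check the documented limits and return the text as a UTF-8 encoded body."""
    if len(text) > MAX_CHARS:
        raise ValueError(f"input is {len(text)} characters; the limit is {MAX_CHARS}")
    # Raises UnicodeEncodeError for strings that cannot be encoded as UTF-8
    # (for example, ones containing lone surrogate code points).
    return text.encode("utf-8")


body = prepare_body("You can reach Dave Elliot by phone 203-555-1286.")
print(len(body))
```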

Sample Request

https://{Host Address}/pty/data-discovery/v1.0/classify

Response Codes

Successful Response.

      {
        "providers": [
          {
            "name": "Presidio Classification Provider",
            "version": "1.0.0",
            "status": 200,
            "elapsed_time": 1.014178991317749,
            "exception": null,
            "config_provider": {
              "name": "Presidio",
              "address": "http://presidio_provider_service",
              "supported_content_types": []
            }
          },
          {
            "name": "Roberta Classification Provider",
            "version": "1.0.0",
            "status": 200,
            "elapsed_time": 19.091534852981567,
            "exception": null,
            "config_provider": {
              "name": "Roberta",
              "address": "http://roberta_provider_service",
              "supported_content_types": []
            }
          }
        ],
        "classifications": {
          "PERSON": [
            {
              "score": 0.9236000061035157,
              "location": {
                "start_index": 14,
                "end_index": 25
              },
              "classifiers": [
                {
                  "provider_index": 0,
                  "name": "SpacyRecognizer",
                  "score": 0.85,
                  "details": {}
                },
                {
                  "provider_index": 1,
                  "name": "roberta",
                  "score": 0.9972000122070312,
                  "details": {}
                }
              ]
            }
          ],
          "PHONE_NUMBER": [
            {
              "score": 0.8746500015258789,
              "location": {
                "start_index": 35,
                "end_index": 47
              },
              "classifiers": [
                {
                  "provider_index": 0,
                  "name": "PhoneRecognizer",
                  "score": 0.75,
                  "details": {}
                },
                {
                  "provider_index": 1,
                  "name": "roberta",
                  "score": 0.9993000030517578,
                  "details": {}
                }
              ]
            }
          ]
        }
      }

Error responses include the following messages:

Request must have a body, but no request body was provided.
Payload too large.
Unsupported media type.
Unexpected internal server error. Check server logs.
Internal server error. Check server logs.
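In the sample response, each location holds character offsets into the request text. Based on the sample data, end_index appears to be exclusive, because characters 14 through 24 spell "Dave Elliot". The following minimal sketch pairs each classification with the text it covers. It assumes the exclusive end_index that the sample suggests. extract_findings() is a hypothetical helper, and the trimmed response keeps only the fields that the helper reads.

```python
import json

# Trimmed copy of the successful response above (provider and classifier
# details omitted); only the fields used below are kept.
SAMPLE_RESPONSE = """
{
  "classifications": {
    "PERSON": [
      {"score": 0.9236000061035157,
       "location": {"start_index": 14, "end_index": 25}}
    ],
    "PHONE_NUMBER": [
      {"score": 0.8746500015258789,
       "location": {"start_index": 35, "end_index": 47}}
    ]
  }
}
"""


def extract_findings(text, response):
    """Flatten the classifications map into (entity, start, end, matched_text, score) tuples."""
    findings = []
    for entity, matches in response["classifications"].items():
        for match in matches:
            start = match["location"]["start_index"]
            end = match["location"]["end_index"]  # Treated as exclusive, like a Python slice.
            findings.append((entity, start, end, text[start:end], match["score"]))
    # Order by position in the source text.
    return sorted(findings, key=lambda f: f[1])


text = "You can reach Dave Elliot by phone 203-555-1286."
for entity, start, end, matched, score in extract_findings(text, json.loads(SAMPLE_RESPONSE)):
    print(f"{entity}: {matched!r} [{start}:{end}] score={score:.2f}")
```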

Sample Request

Using curl:

    curl -X POST "https://<SERVER_IP>/pty/data-discovery/v1.0/classify?score_threshold=0.85" \
      -H "Content-Type: text/plain" \
      --data "You can reach Dave Elliot by phone 203-555-1286"
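
Using Python:

    import requests

    # Replace <SERVER_IP> with the address of your Data Discovery deployment.
    url = "https://<SERVER_IP>/pty/data-discovery/v1.0/classify"
    params = {"score_threshold": 0.85}  # Exclude results that score below 0.85.
    headers = {"Content-Type": "text/plain"}
    # Encode the body explicitly, because the API expects UTF-8 text.
    data = "You can reach Dave Elliot by phone 203-555-1286".encode("utf-8")

    # verify=False disables TLS certificate verification (for example, for a
    # self-signed certificate). Do not use it in production.
    response = requests.post(url, params=params, headers=headers, data=data, verify=False)

    print("Status code:", response.status_code)
    print("Response JSON:", response.json())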

Request details:

URL: POST https://<SERVER_IP>/pty/data-discovery/v1.0/classify
Query parameters:
  - score_threshold (optional): float between 0.0 and 1.0; default: 0
Headers:
  - Content-Type: text/plain
Body:
  - You can reach Dave Elliot by phone 203-555-1286