1 - Input Sanitization

Rejecting unsanitized data.

The Classification service in Data Discovery includes a security feature that rejects unsanitized data. Data is considered unsanitized if it is malformed or non-normalized, or if it contains homoglyphs, hieroglyphs, mixed Unicode variants, or control characters. Such data is rejected for classification.

The following are a few examples of data that will be rejected:

  • 𝓉𝑒𝓍𝓉
  • Pep

Before invoking the Classification endpoint, ensure that the input text is normalized by replacing invalid characters with their corresponding normalized plaintext characters. If the input text contains any invalid character, the service returns status code 422 with the message Untrusted input.
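The normalization step can be sketched in Python as shown below. This is a minimal illustration, not the check that the Classification service itself performs: the function names sanitize_text and needs_sanitizing are hypothetical, and the choice of NFKC normalization plus removal of control and format characters is an assumption. NFKC maps stylized variants such as mathematical script letters to plain letters, but it does not map cross-script homoglyphs (for example, a Cyrillic letter that looks like a Latin one), so such input may still be rejected.

```python
import unicodedata

# Control characters that are usually legitimate in plain text (assumption).
ALLOWED_CONTROLS = {"\n", "\t"}


def sanitize_text(text: str) -> str:
    """Return an NFKC-normalized copy of text without control or format characters."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in normalized
        # "Cc" = control characters, "Cf" = invisible format characters.
        if ch in ALLOWED_CONTROLS or unicodedata.category(ch) not in ("Cc", "Cf")
    )


def needs_sanitizing(text: str) -> bool:
    """Report whether sanitize_text would change the input."""
    return sanitize_text(text) != text


print(sanitize_text("𝓉𝑒𝓍𝓉"))  # prints "text"
```

Calling needs_sanitizing before sending a request gives an early hint that the text might trigger the 422 response, although a False result does not guarantee that the service accepts the input.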

For security purposes, the application rejects unsanitized data by default. It is recommended that this feature remain enabled. To disable it anyway, perform the following steps.

  1. Navigate to the docker_compose directory.

  2. Edit the docker-compose.yaml file.

  3. Under the environment section of classification_service, append the security parameter as follows.

    - SECURITY_SETTINGS={"ENABLE_ALL_SECURITY_CONTROLS":false}

  4. Save the changes.

  5. Run the docker compose down command to undeploy the application.

  6. Run the docker compose up command to redeploy the application.
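The value of SECURITY_SETTINGS is a JSON object, so a stray quote or a Python-style False instead of false would make it invalid. As a small sketch, the entry can be generated rather than typed by hand. The helper name security_settings_entry is hypothetical; only the variable name and the ENABLE_ALL_SECURITY_CONTROLS key come from the step above.

```python
import json


def security_settings_entry(enable_all: bool) -> str:
    """Build the SECURITY_SETTINGS environment entry as compact, valid JSON."""
    value = json.dumps(
        {"ENABLE_ALL_SECURITY_CONTROLS": enable_all},
        separators=(",", ":"),  # no spaces, matching the documented form
    )
    return f"SECURITY_SETTINGS={value}"


print(security_settings_entry(False))
# prints: SECURITY_SETTINGS={"ENABLE_ALL_SECURITY_CONTROLS":false}
```

The printed text is the part that follows the leading "- " of the list item in the environment section.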

2 - Working with the Data Discovery containers

Using the Data Discovery containers.

Use Data Discovery by setting up and deploying the containers.

2.1 - Understanding the Docker Compose File

Details of the configurable parameters in the docker-compose.yml file.

The following variables can be configured in the docker-compose.yml file.

| Variable | Description | Mandatory |
|---|---|---|
| networks:name | Specify the name of the Docker network. | No |
| services:environment | Specify the location for the logs in the logging_config parameter. | No |
| classification_service:ports | Specify the listening port for the classification service. By default, the port is set to 8580. | No |

2.2 - Deploying the Application

Deploying the Data Discovery container.

Ensure that the prerequisites are completed before deploying the application.

Perform the following steps to deploy the Data Discovery application on Docker.

  1. Open a command prompt.

  2. Navigate to the Developer Edition package directory.

  3. Run the command to start the containers. For example, the following command starts the Classification service container.

docker compose up -d
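After the containers start, one way to confirm that the Classification service is accepting connections is to probe its listening port. The sketch below assumes the default port 8580 and a service running on localhost; the function name is_port_open is hypothetical. It checks only that a TCP connection can be opened, not that the service is fully initialized.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 8580 is the documented default; adjust it if the ports setting was changed.
    print(is_port_open("localhost", 8580))
```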

3 - Supported Sensitive Entity Types

PII entities supported by Protegrity Developer Edition.

| Entity Name | Description |
|---|---|
| ABA_ROUTING_NUMBER | Routing number used to identify financial institutions in the United States. |
| ACCOUNT_NAME | Name associated with a financial account. |
| ACCOUNT_NUMBER | Bank account number used to identify financial accounts. |
| AGE | Age information used to identify individuals. |
| AMOUNT | Specific amount of money, which can be linked to financial transactions. |
| AU_ABN | Australian Business Number used to identify businesses in Australia. |
| AU_ACN | Australian Company Number used to identify businesses in Australia. |
| AU_MEDICARE | Medicare number used to identify individuals for healthcare services in Australia. |
| AU_TFN | Tax File Number used to identify taxpayers in Australia. |
| BIC | Bank Identifier Code used to identify financial institutions. |
| BITCOIN_ADDRESS | Bitcoin wallet address used for digital transactions. |
| BUILDING | Building information used to identify specific locations. |
| CITY | City information used to identify geographic locations. |
| COMPANY_NAME | Name of a company used to identify businesses. |
| COUNTRY | Country information used to identify geographic locations. |
| COUNTY | County information used to identify geographic locations. |
| CREDIT_CARD | Credit card number used for financial transactions. |
| CREDIT_CARD_CVV | Card Verification Value used to secure credit card transactions. |
| CRYPTO | Cryptocurrency wallet address used for digital transactions. |
| CURRENCY | Currency information used in financial transactions. |
| CURRENCY_CODE | Code representing currency used in financial transactions. |
| CURRENCY_NAME | Name of currency used in financial transactions. |
| CURRENCY_SYMBOL | Symbol representing currency, sometimes linked to financial transactions. |
| DATE | Specific date that can be linked to personal activities. |
| DATE_OF_BIRTH | Date of birth used to identify individuals. |
| DATE_TIME | Specific date and time that can be linked to personal activities. |
| DRIVER_LICENSE | Driver’s license number used to identify individuals. |
| EMAIL_ADDRESS | Email address used for communication and identification. |
| ES_NIE | Foreigner Identification Number used to identify non-residents in Spain. |
| ES_NIF | Tax Identification Number used to identify taxpayers in Spain. |
| ETHEREUM_ADDRESS | Ethereum wallet address used for digital transactions. |
| FI_PERSONAL_IDENTITY_CODE | Personal identity code used to identify individuals in Finland. |
| GENDER | Gender information used to identify individuals. |
| GEO_CCORDINATE | Geographic coordinates used to identify specific locations. |
| IBAN_CODE | International Bank Account Number used to identify bank accounts globally. |
| ID_CARD | Identity card number used to identify individuals. |
| IN_AADHAAR | Unique identification number used to identify residents in India. |
| IN_PAN | Permanent Account Number used to identify taxpayers in India. |
| IN_PASSPORT | Passport number used to identify individuals in India. |
| IN_VEHICLE_REGISTRATION | Vehicle registration number used to identify vehicles in India. |
| IN_VOTER | Voter ID number used to identify registered voters in India. |
| IP_ADDRESS | Internet Protocol address used to identify devices on a network. |
| IPV4 | IPv4 address used to identify devices on a network. |
| IPV6 | IPv6 address used to identify devices on a network. |
| IT_DRIVER_LICENSE | Driver’s license number used to identify individuals in Italy. |
| IT_FISCAL_CODE | Fiscal code used to identify taxpayers in Italy. |
| IT_IDENTITY_CARD | Identity card number used to identify individuals in Italy. |
| IT_PASSPORT | Passport number used to identify individuals in Italy. |
| LITECOIN_ADDRESS | Litecoin wallet address used for digital transactions. |
| LOCATION | Specific location or address that can be linked to an individual. |
| MAC | Media Access Control address used to identify devices on a network. |
| MEDICAL_LICENSE | License number used to identify medical professionals. |
| NRP | A person’s nationality, religious or political group. |
| ORGANIZATION | Name or identifier used to identify an organization. |
| PASSPORT | Passport number used to identify individuals. |
| PASSWORD | Password used to secure access to personal accounts. |
| PERSON | Name or identifier used to identify an individual. |
| PHONE_NUMBER | Number used to contact or identify an individual. |
| PIN | Personal Identification Number used to secure access to accounts. |
| PL_PESEL | Personal Identification Number used to identify individuals in Poland. |
| SECONDARY_ADDRESS | Additional address information used to identify locations. |
| SG_NRIC_FIN | National Registration Identity Card number used to identify residents in Singapore. |
| SG_UEN | Unique Entity Number used to identify businesses in Singapore. |
| SOCIAL_SECURITY_NUMBER | Social Security Number used to identify individuals. |
| STATE | State information used to identify geographic locations. |
| STREET | Street address used to identify specific locations. |
| TIME | Specific time that can be linked to personal activities. |
| TITLE | Title or honorific used to identify individuals. |
| UK_NHS | National Health Service number used to identify individuals for healthcare services in the United Kingdom. |
| URL | Web address that can sometimes contain personal information. |
| US_BANK_NUMBER | Bank account number used to identify financial accounts in the United States. |
| US_DRIVER_LICENSE | Driver’s license number used to identify individuals in the United States. |
| US_ITIN | Individual Taxpayer Identification Number used to identify taxpayers in the United States. |
| US_PASSPORT | Passport number used to identify individuals in the United States. |
| US_SSN | Social Security Number used to identify individuals in the United States. |
| USERNAME | Username used to identify individuals in online systems. |
| ZIP_CODE | Postal code used to identify specific geographic areas. |
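Several entity names in this list carry a two-letter country prefix, which can be handy when selecting only the entities relevant to one region. The following sketch shows one possible way to derive the country from an entity name. The mapping is read off the descriptions in this list, and the helper entity_country is hypothetical; it is not part of the product. Names such as IP_ADDRESS or ID_CARD start with two letters that are not country codes, so an explicit mapping is used instead of a generic pattern.

```python
from typing import Optional

# Country prefixes as they appear in the entity list (assumed to be exhaustive).
COUNTRY_PREFIXES = {
    "AU": "Australia",
    "ES": "Spain",
    "FI": "Finland",
    "IN": "India",
    "IT": "Italy",
    "PL": "Poland",
    "SG": "Singapore",
    "UK": "United Kingdom",
    "US": "United States",
}


def entity_country(entity_name: str) -> Optional[str]:
    """Return the country for a prefixed entity name, or None for generic entities."""
    prefix, separator, _ = entity_name.partition("_")
    if not separator:
        return None  # single-word names such as PERSON have no prefix
    return COUNTRY_PREFIXES.get(prefix)


print(entity_country("IN_PAN"))      # prints "India"
print(entity_country("IP_ADDRESS"))  # prints "None"
```

Entities that are tied to a country only through their description, such as ABA_ROUTING_NUMBER, are reported as generic by this helper.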

4 - Uninstalling Developer Edition

Steps for removing the product.
  1. Open a command prompt.

  2. Navigate to the cloned repository location.

  3. Run the following command to remove the containers.

    docker compose down --rmi all
    
  4. Run the following command to remove the Python module.

    pip uninstall protegrity-developer-python==0.9.0
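To confirm that the Python module is gone, the installed distributions can be queried from Python. This is a minimal sketch using the standard importlib.metadata module; the helper name is_installed is hypothetical, and the distribution name is taken from the pip command above. Run it with the same Python environment that was used for the uninstall.

```python
from importlib import metadata


def is_installed(distribution: str) -> bool:
    """Return True if the named distribution is installed in this environment."""
    try:
        metadata.version(distribution)
        return True
    except metadata.PackageNotFoundError:
        return False


# Expected to print False once the uninstall has completed.
print(is_installed("protegrity-developer-python"))
```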