Appendix
1 - Input Sanitization
The Classification service in Data Discovery includes a security feature that rejects unsanitized data. Data is considered unsanitized if it is malformed or non-normalized, or if it contains homoglyphs, hieroglyphs, mixed Unicode variants, or control characters. Such data is rejected for classification.
The following are a few examples of data that is rejected:
- Ⅷ
- 𝓉𝑒𝓍𝓉
- Pep
Before invoking the Classification endpoint, ensure that the input text is normalized. Replace invalid characters with their corresponding normalized plaintext characters. If the input text contains any invalid character, the service returns status code 422 with the message Untrusted input.
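The normalization step can be sketched in Python with the standard unicodedata module. This is a minimal illustration, not the service's own logic: the function name normalize_input is hypothetical, the choice of NFKC normalization is an assumption, and keeping newline and tab characters is also an assumption, because the exact checks performed by the Classification service are not described here. NFKC does not map look-alike letters from other scripts (for example, Cyrillic letters that resemble Latin ones), so such homoglyphs still need separate handling.

```python
import unicodedata


def normalize_input(text: str) -> str:
    """Apply NFKC normalization and drop control characters (illustrative only)."""
    # NFKC folds compatibility characters, such as Roman numeral symbols and
    # mathematical alphanumeric symbols, into their plain equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    # Remove control characters (Unicode category "Cc").
    # Assumption: newline and tab are kept; adjust if the service rejects them.
    return "".join(
        ch for ch in normalized
        if unicodedata.category(ch) != "Cc" or ch in "\n\t"
    )


print(normalize_input("Ⅷ"))      # VIII
print(normalize_input("𝓉𝑒𝓍𝓉"))   # text
```

Running the input through a helper like this before each request reduces the chance of a 422 response, but it does not guarantee acceptance.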
For security purposes, the application rejects unsanitized data by default. It is recommended to keep this feature enabled. However, to override this feature, perform the following steps.
1. Navigate to the docker_compose directory.
2. Edit the docker-compose.yaml file.
3. Under the environment section of classification_service, append the security parameter as follows.
   - SECURITY_SETTINGS={"ENABLE_ALL_SECURITY_CONTROLS":false}
4. Save the changes.
5. Run the docker compose down command to undeploy the application.
6. Run the docker compose up command to redeploy the application.
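The value of SECURITY_SETTINGS appears to be a JSON object, so the boolean must be written as lowercase false. As a small sketch, the entry can be generated from Python to avoid quoting mistakes. The helper name security_settings_entry is hypothetical and is not part of the product.

```python
import json


def security_settings_entry(enable_all: bool) -> str:
    """Build the environment entry for the compose file (hypothetical helper)."""
    # json.dumps renders Python False as JSON false; compact separators
    # reproduce the spacing used in the documented example.
    value = json.dumps(
        {"ENABLE_ALL_SECURITY_CONTROLS": enable_all},
        separators=(",", ":"),
    )
    return f"SECURITY_SETTINGS={value}"


print(security_settings_entry(False))
# SECURITY_SETTINGS={"ENABLE_ALL_SECURITY_CONTROLS":false}
```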
2 - Working with the Data Discovery containers
Use Data Discovery by setting up and deploying the containers.
2.1 - Understanding the Docker Compose File
The following variables can be configured in the docker-compose.yml file.
| Variable | Description | Mandatory |
|---|---|---|
| networks:name | Specify the name of the Docker network. | No |
| services:environment | Specify the location for the logs in the logging_config parameter. | No |
| classification_service:ports | Specify the listening port for the classification service. By default, the port is set to 8580. | No |
2.2 - Deploying the Application
Ensure that the prerequisites are completed before deploying the application.
Perform the following steps to deploy the Data Discovery application on Docker.
1. Open a command prompt.
2. Navigate to the Developer Edition package directory.
3. Run the command to start the containers. For example, the following command starts the Classification service container.
   docker compose up -d
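After the containers start, one way to confirm that the Classification service is listening is a plain TCP connection check. The following sketch uses only the Python standard library. The function name is_port_open is hypothetical, the host localhost is an assumption, and 8580 is the default port listed for classification_service:ports. A successful connection shows only that the port accepts connections, not that the service is fully initialized.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: the service runs on the local machine with the default port.
    print(is_port_open("localhost", 8580))
```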
3 - Supported Sensitive Entity Types
| Entity Name | Description |
|---|---|
| ABA_ROUTING_NUMBER | Routing number used to identify financial institutions in the United States. |
| ACCOUNT_NAME | Name associated with a financial account. |
| ACCOUNT_NUMBER | Bank account number used to identify financial accounts. |
| AGE | Age information used to identify individuals. |
| AMOUNT | Specific amount of money, which can be linked to financial transactions. |
| AU_ABN | Australian Business Number used to identify businesses in Australia. |
| AU_ACN | Australian Company Number used to identify businesses in Australia. |
| AU_MEDICARE | Medicare number used to identify individuals for healthcare services in Australia. |
| AU_TFN | Tax File Number used to identify taxpayers in Australia. |
| BIC | Bank Identifier Code used to identify financial institutions. |
| BITCOIN_ADDRESS | Bitcoin wallet address used for digital transactions. |
| BUILDING | Building information used to identify specific locations. |
| CITY | City information used to identify geographic locations. |
| COMPANY_NAME | Name of a company used to identify businesses. |
| COUNTRY | Country information used to identify geographic locations. |
| COUNTY | County information used to identify geographic locations. |
| CREDIT_CARD | Credit card number used for financial transactions. |
| CREDIT_CARD_CVV | Card Verification Value used to secure credit card transactions. |
| CRYPTO | Cryptocurrency wallet address used for digital transactions. |
| CURRENCY | Currency information used in financial transactions. |
| CURRENCY_CODE | Code representing currency used in financial transactions. |
| CURRENCY_NAME | Name of currency used in financial transactions. |
| CURRENCY_SYMBOL | Symbol representing currency, sometimes linked to financial transactions. |
| DATE | Specific date that can be linked to personal activities. |
| DATE_OF_BIRTH | Date of birth used to identify individuals. |
| DATE_TIME | Specific date and time that can be linked to personal activities. |
| DRIVER_LICENSE | Driver’s license number used to identify individuals. |
| EMAIL_ADDRESS | Email address used for communication and identification. |
| ES_NIE | Foreigner Identification Number used to identify non-residents in Spain. |
| ES_NIF | Tax Identification Number used to identify taxpayers in Spain. |
| ETHEREUM_ADDRESS | Ethereum wallet address used for digital transactions. |
| FI_PERSONAL_IDENTITY_CODE | Personal identity code used to identify individuals in Finland. |
| GENDER | Gender information used to identify individuals. |
| GEO_CCORDINATE | Geographic coordinates used to identify specific locations. |
| IBAN_CODE | International Bank Account Number used to identify bank accounts globally. |
| ID_CARD | Identity card number used to identify individuals. |
| IN_AADHAAR | Unique identification number used to identify residents in India. |
| IN_PAN | Permanent Account Number used to identify taxpayers in India. |
| IN_PASSPORT | Passport number used to identify individuals in India. |
| IN_VEHICLE_REGISTRATION | Vehicle registration number used to identify vehicles in India. |
| IN_VOTER | Voter ID number used to identify registered voters in India. |
| IP_ADDRESS | Internet Protocol address used to identify devices on a network. |
| IPV4 | IPv4 address used to identify devices on a network. |
| IPV6 | IPv6 address used to identify devices on a network. |
| IT_DRIVER_LICENSE | Driver’s license number used to identify individuals in Italy. |
| IT_FISCAL_CODE | Fiscal code used to identify taxpayers in Italy. |
| IT_IDENTITY_CARD | Identity card number used to identify individuals in Italy. |
| IT_PASSPORT | Passport number used to identify individuals in Italy. |
| LITECOIN_ADDRESS | Litecoin wallet address used for digital transactions. |
| LOCATION | Specific location or address that can be linked to an individual. |
| MAC | Media Access Control address used to identify devices on a network. |
| MEDICAL_LICENSE | License number used to identify medical professionals. |
| NRP | A person’s nationality, religious or political group. |
| ORGANIZATION | Name or identifier used to identify an organization. |
| PASSPORT | Passport number used to identify individuals. |
| PASSWORD | Password used to secure access to personal accounts. |
| PERSON | Name or identifier used to identify an individual. |
| PHONE_NUMBER | Number used to contact or identify an individual. |
| PIN | Personal Identification Number used to secure access to accounts. |
| PL_PESEL | Personal Identification Number used to identify individuals in Poland. |
| SECONDARY_ADDRESS | Additional address information used to identify locations. |
| SG_NRIC_FIN | National Registration Identity Card number used to identify residents in Singapore. |
| SG_UEN | Unique Entity Number used to identify businesses in Singapore. |
| SOCIAL_SECURITY_NUMBER | Social Security Number used to identify individuals. |
| STATE | State information used to identify geographic locations. |
| STREET | Street address used to identify specific locations. |
| TIME | Specific time that can be linked to personal activities. |
| TITLE | Title or honorific used to identify individuals. |
| UK_NHS | National Health Service number used to identify individuals for healthcare services in the United Kingdom. |
| URL | Web address that can sometimes contain personal information. |
| US_BANK_NUMBER | Bank account number used to identify financial accounts in the United States. |
| US_DRIVER_LICENSE | Driver’s license number used to identify individuals in the United States. |
| US_ITIN | Individual Taxpayer Identification Number used to identify taxpayers in the United States. |
| US_PASSPORT | Passport number used to identify individuals in the United States. |
| US_SSN | Social Security Number used to identify individuals in the United States. |
| USERNAME | Username used to identify individuals in online systems. |
| ZIP_CODE | Postal code used to identify specific geographic areas. |
4 - Uninstalling Developer Edition
1. Open a command prompt.
2. Navigate to the cloned repository location.
3. Run the following command to remove the containers.
   docker compose down --rmi all
4. Run the following command to remove the Python module.
   pip uninstall protegrity-developer-python==0.9.0