Classification
Published 17 May 2024
Classification is the first step in anonymizing your database with Anonymize. The classify operation scans your database to identify potentially sensitive information, often referred to as Personally Identifiable Information (PII).
It then outputs a classification JSON file that describes which tables and columns contain PII.
Default Classifications and Datasets
Anonymize comes with a predefined set of classification types and datasets designed to cover NIST's definition of linked information. Masking these types of data minimizes the chance that the remaining parts of a record can be used to identify the individual.
If you need to assign classification types or datasets that aren't included in our defaults, check out the custom configuration page.
Classification File Structure
When you run the classify command, it outputs a JSON file that outlines the tables and columns in your database. The file contains information about the schema, table names, column names, and the classified data types.
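To review which columns were flagged before moving on, you could load the classification file with a short script. The sketch below is illustrative only: the field names (tables, schema, name, columns, type) and the sample values are assumptions made for this example, not the actual format Anonymize writes, so check the Classification file structure page for the real layout.

```python
import json

# Hypothetical classification document. The real file produced by the
# classify command may use different field names and nesting.
SAMPLE = """
{
  "tables": [
    {
      "schema": "dbo",
      "name": "Customers",
      "columns": [
        {"name": "Email", "type": "EmailAddresses"},
        {"name": "CreatedAt", "type": null}
      ]
    },
    {
      "schema": "dbo",
      "name": "Orders",
      "columns": [
        {"name": "OrderId", "type": null},
        {"name": "ShippingPhone", "type": "PhoneNumbers"}
      ]
    }
  ]
}
"""


def classified_columns(doc):
    """Return (schema, table, column, type) for every column with a classified type."""
    found = []
    for table in doc.get("tables", []):
        for column in table.get("columns", []):
            # Assumption: unclassified columns carry a null or missing type.
            if column.get("type"):
                found.append(
                    (table["schema"], table["name"], column["name"], column["type"])
                )
    return found


doc = json.loads(SAMPLE)
for schema, table, column, kind in classified_columns(doc):
    print(f"{schema}.{table}.{column}: {kind}")
```

With a real file you would replace SAMPLE with the contents read from disk and adjust the field names to match the documented structure.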
In most cases, you won't need to edit the classification file before using it as input for the map command. But if you do need to make changes, as mentioned above, it's usually better to provide an options file:
- The classification and masking files are generated from scratch each time you run the classify or map commands, so any manual changes made to these files will be lost (unless you use version control).
- The options file allows you to store your anonymization configuration separately, making it easier to manage and maintain your settings across multiple runs.
You can find more details on the custom configuration page.
Next Steps
- To learn more about the default classification types and datasets, head over to the Default classifications and datasets page.
- For a closer look at the classification file structure, check out the Classification file structure page.
- Once you have the output from the classify operation, you're ready to Map the classification output into a set of masking rules.
- If you need more info on the classify command, take a look at the command-line reference.