Classification rules
Published 15 December 2023
Introduction to custom classification rules
Anonymize Classify uses built-in classification rules to automatically identify columns containing Personally Identifiable Information (PII) in your database. However, you may want to create custom classification rules to improve the classification results based on your specific database schema.
You might want to write a custom classification rule if:
- Your database has a specific naming pattern for columns that Anonymize Classify doesn't recognize by default.
- You want to add a custom classification that isn't included in the default ruleset.
Custom classification rules consist of three main components:
- type: the classification you want to apply to columns that match the rule.
- confidence: how confident the classification is. You can set it to "High", "Medium", or "Low".
- condition: which columns the rule should apply to.
In the following sections, we'll cover how to add custom classification rules, how they behave, and how to write effective rule conditions.
Where to configure classification rules
Classification rules are configured in the "classifications" section of the options file.
See the options file page for how to get IntelliSense help when writing the options file.
Adding a custom classification rule
To configure a custom classification rule, add the rule to the "custom" array.
Example options file to add a custom classification rule
{
  "classifications": {
    "custom": [
      {
        "type": "GivenNames",
        "confidence": "High",
        "condition": "Column.Name contains 'prenom'"
      }
    ]
  }
}
Type
This is the classification that the rule will apply to columns that match the condition. It can be the same as a built-in rule (see the list of default classification types).
Confidence
This is how confident Anonymize Classify should be that this rule has found the correct classification.
See Rule Precedence for more details about how this affects the behaviour of the rule.
Condition
This is the condition used to determine if the rule should apply the classification to a column.
See "How to write a custom classification rule condition" for more details about how this affects the behaviour of the rule.
Disabling a built-in rule
To disable a built-in rule, add the classification type to the "disabled" section of "builtIn".
Example to disable a built-in rule
{
  "classifications": {
    "builtIn": {
      "disabled": [
        "GivenNames"
      ]
    }
  }
}
Only enabling certain built-in rules
To enable only an explicit set of built-in rules (and disable all the others), add those classification types to the "enabled" section of "builtIn".
Example to enable only certain built-in rules
{
  "classifications": {
    "builtIn": {
      "enabled": [
        "GivenNames",
        "StreetAddresses",
        "EmailAddresses"
      ]
    }
  }
}
Extending an existing classification type
If your database contains lots of columns with abbreviations or non-English naming conventions, you may wish to write a rule that classifies these columns as one of the built-in classifications.
Example to add a custom classification rule for a German-language column
{
  "classifications": {
    "custom": [
      {
        "type": "GivenNames",
        "confidence": "High",
        "condition": "Column.Name contains 'Vorname'"
      }
    ]
  }
}
Example to add a custom classification rule for an abbreviated name column
{
  "classifications": {
    "custom": [
      {
        "type": "GivenNames",
        "confidence": "High",
        "condition": "Column.Name contains 'fname'"
      }
    ]
  }
}
Adding a new classification type
If your database contains lots of columns with the same type of PII, but it isn't classified by one of the default classification types, you can add a rule to classify those columns with a new classification type.
Example to add a custom classification rule for a new classification type
{
  "classifications": {
    "custom": [
      {
        "type": "DrivingLicenseNumber",
        "confidence": "High",
        "condition": "Column.Name contains all of 'driving', 'license', 'number'"
      }
    ]
  }
}
General rule behaviour
Built-in rules
Anonymize Classify comes with a number of built-in rules that are used out of the box (see the list of default classification types). These are all enabled by default but can be disabled. See the section above on how to disable a built-in rule.
Custom rules are additive
Each rule that is added is applied in addition to any that already exist. This means that simply adding a new rule won't disable an existing rule. See Rule Precedence for how rules take precedence over one another.
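As a sketch of what "additive" means in practice, the options file below defines two custom rules for the same classification type. It only reuses the "GivenNames" type and the column-name fragments from the earlier examples. No built-in rule is disabled, so the built-in rules would still apply alongside both custom rules.

```json
{
  "classifications": {
    "custom": [
      {
        "type": "GivenNames",
        "confidence": "High",
        "condition": "Column.Name contains 'Vorname'"
      },
      {
        "type": "GivenNames",
        "confidence": "High",
        "condition": "Column.Name contains 'fname'"
      }
    ]
  }
}
```

With this configuration, a column could be classified as "GivenNames" by either custom rule or by a built-in rule. Neither custom rule replaces the other.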
Rule precedence
Rule precedence is affected by the following factors:
- Whether the rule is disabled
- The confidence of the rule
- Whether it is a built-in rule (and which one)
- The order the rule is defined
If two or more rules classify the same column differently, then the active rule with the highest precedence will apply its classification type to the column.
Active
Only rules that aren't disabled can apply their classification to a column. Only the built-in rules can be disabled. Custom rules would need to be removed to "disable" them.
See the sections above for how to disable a built-in rule.
In the following example, the custom "USCounties" rule would take precedence over the built-in "UKCounties" rule if there was a column called "County". Indeed, nothing would ever be classified as "UKCounties" here, as the built-in "UKCounties" rule is disabled.
Example to disable and replace a built-in rule
{
  "classifications": {
    "builtIn": {
      "disabled": [
        "UKCounties"
      ]
    },
    "custom": [
      {
        "type": "USCounties",
        "confidence": "High",
        "condition": "Column.Name contains 'County'"
      }
    ]
  }
}
Confidence
The three levels of confidence for a rule are High, Medium and Low.
If two or more rules classify the same column differently, then the rules with higher confidence will take precedence.
In the following example, the "Car Insurance Number" rule would take precedence over the "Insurance Number" rule if there was a column called "Car_Insurance_Num".
Example adding multiple custom classification rules with different confidences
{
  "classifications": {
    "custom": [
      {
        "type": "Insurance Number",
        "confidence": "Low",
        "condition": "Column.Name contains 'Insurance'"
      },
      {
        "type": "Car Insurance Number",
        "confidence": "High",
        "condition": "Column.Name contains all of 'Car', 'Insurance'"
      }
    ]
  }
}
Built-in
For rules of the same confidence, all built-in rules take precedence over all custom rules.
Built-in rules also have a precedence order amongst themselves.
In the following example, the built-in "UKCounties" rule would take precedence over the custom "USCounties" rule if there was a column called "County". To make the "USCounties" rule apply to the column, the "UKCounties" rule would have to be disabled.
Example to add a low-precedence custom classification rule
{
  "classifications": {
    "custom": [
      {
        "type": "USCounties",
        "confidence": "High",
        "condition": "Column.Name contains 'County'"
      }
    ]
  }
}
Definition order
Custom rules have precedence order amongst themselves too.
The first rule defined has highest precedence and so on until the last rule defined which has lowest precedence.
In the following example, the "Car Insurance Number" rule would take precedence over the "Insurance Number" rule if there was a column called "Car_Insurance_Num", because it is defined first.
Example to add multiple custom classification rules in a particular order
{
  "classifications": {
    "custom": [
      {
        "type": "Car Insurance Number",
        "confidence": "High",
        "condition": "Column.Name contains all of 'Car', 'Insurance'"
      },
      {
        "type": "Insurance Number",
        "confidence": "High",
        "condition": "Column.Name contains 'Insurance'"
      }
    ]
  }
}
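To illustrate why definition order matters, here is the same pair of rules in the opposite order. This is an illustrative variation, not a recommended configuration. Both rules have the same confidence, so by the definition-order rule described above, the broader "Insurance Number" rule, now defined first, would take precedence for a column called "Car_Insurance_Num".

```json
{
  "classifications": {
    "custom": [
      {
        "type": "Insurance Number",
        "confidence": "High",
        "condition": "Column.Name contains 'Insurance'"
      },
      {
        "type": "Car Insurance Number",
        "confidence": "High",
        "condition": "Column.Name contains all of 'Car', 'Insurance'"
      }
    ]
  }
}
```

Every column that matches the "Car Insurance Number" condition also matches the "Insurance Number" condition, so in this order the more specific rule would be shadowed. Listing specific rules before general ones, or giving them a higher confidence, avoids this.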