Customize classification
Published 03 November 2024
We're going to customize classification to do the following:
A. Control which classification rules are run
B. Add a custom classification rule
C. Add a custom classification
D. Exclude a table
E. Exclude a column
These customizations are made in the options file that is passed to the classification step.
A. Control which classification rules are run
    {
      "classifications": {
        "builtIn": {
          "enabled": [ "FullNames" ],
          "disabled": [
            "Cities",
            "Countries",
            "CreditCardNumbers",
            "DatesOfBirth",
            "EmailAddresses",
            "FamilyNames",
            "GivenNames",
            "IPAddresses",
            "PassportNumbers",
            "PhoneNumbers",
            "PostCodes",
            "StreetAddresses",
            "UKCounties",
            "UKNationalInsuranceNumbers",
            "USSocialSecurityNumbers",
            "ZipCodes"
          ]
        }
      }
    }
This turns off all built-in rules except for "FullNames".
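A long "disabled" list is easy to get wrong by hand, so it can help to sanity-check it before running classification. The following is a minimal sketch, not part of the tool: the helper name check_built_in is hypothetical, and the options shown are an abbreviated sample with a deliberately repeated entry. It reports rule names that appear more than once across the two lists, and names that appear in both "enabled" and "disabled".

```python
import json
from collections import Counter

# Abbreviated sample of the "builtIn" section, for illustration only.
# "StreetAddresses" is repeated on purpose to show what the check reports.
options = json.loads("""
{
  "classifications": {
    "builtIn": {
      "enabled": ["FullNames"],
      "disabled": ["Cities", "Countries", "StreetAddresses", "StreetAddresses"]
    }
  }
}
""")


def check_built_in(options):
    """Return (duplicates, overlap) for the enabled/disabled rule lists.

    duplicates: names listed more than once across both lists.
    overlap:    names that are both enabled and disabled.
    """
    built_in = options["classifications"]["builtIn"]
    enabled = built_in.get("enabled", [])
    disabled = built_in.get("disabled", [])
    counts = Counter(enabled + disabled)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    overlap = sorted(set(enabled) & set(disabled))
    return duplicates, overlap


duplicates, overlap = check_built_in(options)
print("duplicates:", duplicates)
print("overlap:", overlap)
```

Whether the classification step itself rejects or tolerates repeated names is not stated here, so treat this purely as a pre-flight check.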
B. Add a custom classification rule
    {
      "classifications": {
        "custom": [
          {
            "type": "CompanyNames",
            "confidence": "High",
            "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
          }
        ]
      }
    }
This creates a custom classification rule that classifies any column whose name contains both "Company" and "Name" as "CompanyNames".
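To get a feel for which columns such a condition is meant to catch, the intent can be mimicked in a few lines of Python. This is only an illustration of the rule's logic, not the tool's own evaluator: the function name matches_company_name and the sample column names are hypothetical, and the case-insensitive substring match is an assumption, since the exact semantics of "contains" are not specified here.

```python
def matches_company_name(column_name: str) -> bool:
    """Mimic: Column.Name contains 'Company' AND Column.Name contains 'Name'.

    Assumption: 'contains' is a case-insensitive substring test.
    """
    lowered = column_name.lower()
    return "company" in lowered and "name" in lowered


# Hypothetical column names, for illustration only.
columns = ["CompanyName", "Supplier_Company_Name", "ContactName", "Company"]
matched = [c for c in columns if matches_company_name(c)]
print(matched)  # ['CompanyName', 'Supplier_Company_Name']
```

"ContactName" and "Company" are skipped because each satisfies only one half of the AND.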
C. Add a custom classification
    {
      "tables": [
        {
          "schema": "dbo",
          "name": "Suppliers",
          "columns": [
            {
              "name": "HomePage",
              "type": "Websites",
              "preserveNulls": true,
              "maxLength": 20
            }
          ]
        }
      ]
    }
This classifies the HomePage column of the dbo.Suppliers table as "Websites".
D. Exclude a table
    {
      "tables": [
        {
          "schema": "dbo",
          "name": "CustomerDemographics",
          "exclude": true
        }
      ]
    }
This prevents any columns in the dbo.CustomerDemographics table from being classified.
E. Exclude a column
    {
      "tables": [
        {
          "schema": "dbo",
          "name": "Shippers",
          "columns": [
            {
              "name": "Phone",
              "exclude": true
            }
          ]
        }
      ]
    }
This prevents the Phone column in the dbo.Shippers table from being classified.
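Table-level and column-level exclusions (sections D and E) both live in the same "tables" array, so it can be handy to list everything an options file excludes before running classification. The sketch below is a hypothetical helper, not part of the tool. It assumes only the "tables" structure used in the examples above and ignores any other keys.

```python
import json

# The D and E examples combined into one "tables" array.
options = json.loads("""
{
  "tables": [
    { "schema": "dbo", "name": "CustomerDemographics", "exclude": true },
    {
      "schema": "dbo",
      "name": "Shippers",
      "columns": [ { "name": "Phone", "exclude": true } ]
    }
  ]
}
""")


def list_exclusions(options):
    """Return qualified names of every excluded table and column."""
    excluded = []
    for table in options.get("tables", []):
        qualified = f'{table["schema"]}.{table["name"]}'
        if table.get("exclude"):
            # The whole table is excluded.
            excluded.append(qualified)
        for column in table.get("columns", []):
            if column.get("exclude"):
                # Only this column of the table is excluded.
                excluded.append(f'{qualified}.{column["name"]}')
    return excluded


exclusions = list_exclusions(options)
print(exclusions)  # ['dbo.CustomerDemographics', 'dbo.Shippers.Phone']
```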
classification-options.json
Putting the configurations for A-E above together into one options file produces the following:
    {
      "classifications": {
        "builtIn": {
          "enabled": [ "FullNames" ],
          "disabled": [
            "Cities",
            "Countries",
            "CreditCardNumbers",
            "DatesOfBirth",
            "EmailAddresses",
            "FamilyNames",
            "GivenNames",
            "IPAddresses",
            "PassportNumbers",
            "PhoneNumbers",
            "PostCodes",
            "StreetAddresses",
            "UKCounties",
            "UKNationalInsuranceNumbers",
            "USSocialSecurityNumbers",
            "ZipCodes"
          ]
        },
        "custom": [
          {
            "type": "CompanyNames",
            "confidence": "High",
            "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
          }
        ]
      },
      "tables": [
        {
          "schema": "dbo",
          "name": "CustomerDemographics",
          "exclude": true
        },
        {
          "schema": "dbo",
          "name": "Shippers",
          "columns": [
            {
              "name": "Phone",
              "exclude": true
            }
          ]
        },
        {
          "schema": "dbo",
          "name": "Suppliers",
          "columns": [
            {
              "name": "HomePage",
              "type": "Websites",
              "preserveNulls": true,
              "maxLength": 20
            }
          ]
        }
      ]
    }
Save this as a file called classification-options.json. It will be passed into the classification step when it is run.
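If the fragments for A to E are kept separately, they can be combined mechanically rather than by hand. The following is a minimal sketch under stated assumptions: the helper merge_options is hypothetical, the fragments are abbreviated, and the merge is deliberately simple. Keys under "classifications" are merged shallowly (a later fragment's "builtIn" or "custom" replaces an earlier one), and "tables" entries are concatenated in order.

```python
import json


def merge_options(*fragments):
    """Combine option fragments into one options dictionary."""
    merged = {"classifications": {}, "tables": []}
    for fragment in fragments:
        # Shallow merge: a repeated key such as "builtIn" is overwritten.
        merged["classifications"].update(fragment.get("classifications", {}))
        # Table entries are appended in the order the fragments are given.
        merged["tables"].extend(fragment.get("tables", []))
    return merged


# Abbreviated fragments, for illustration only.
fragment_a = {
    "classifications": {
        "builtIn": {"enabled": ["FullNames"], "disabled": ["Cities", "Countries"]}
    }
}
fragment_b = {
    "classifications": {
        "custom": [
            {
                "type": "CompanyNames",
                "confidence": "High",
                "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'",
            }
        ]
    }
}
fragment_d = {
    "tables": [{"schema": "dbo", "name": "CustomerDemographics", "exclude": True}]
}
fragment_e = {
    "tables": [
        {"schema": "dbo", "name": "Shippers", "columns": [{"name": "Phone", "exclude": True}]}
    ]
}

merged = merge_options(fragment_a, fragment_b, fragment_d, fragment_e)
print(json.dumps(merged, indent=2))
```

Redirecting the printed output to classification-options.json gives a file with the same shape as the one above.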