Customize classification
Published 03 November 2024
We're going to customize classification to do the following:
A. Control which classification rules are run
B. Add a custom classification rule
C. Add a custom classification
D. Exclude a table
E. Exclude a column
These customizations are made in the options file that is passed to the classification step.
A. Control which classification rules are run
{
  "classifications": {
    "builtIn": {
      "enabled": [
        "FullNames"
      ],
      "disabled": [
        "Cities",
        "Countries",
        "CreditCardNumbers",
        "DatesOfBirth",
        "EmailAddresses",
        "FamilyNames",
        "GivenNames",
        "IPAddresses",
        "PassportNumbers",
        "PhoneNumbers",
        "PostCodes",
        "StreetAddresses",
        "UKCounties",
        "UKNationalInsuranceNumbers",
        "USSocialSecurityNumbers",
        "ZipCodes"
      ]
    }
  }
}
This turns off all built-in rules except for "FullNames".
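Maintaining the "disabled" list by hand makes it easy to repeat or miss a rule name. The following is a minimal Python sketch, not part of the tool, that derives both lists from a single list of known rules. The rule names are the ones shown on this page; it is an assumption that this list is complete, and the helper name built_in_section is hypothetical.

```python
import json

# Rule names copied from the example on this page. The tool may ship
# other rules that are not listed here (assumption).
KNOWN_BUILT_IN_RULES = [
    "Cities", "Countries", "CreditCardNumbers", "DatesOfBirth",
    "EmailAddresses", "FamilyNames", "FullNames", "GivenNames",
    "IPAddresses", "PassportNumbers", "PhoneNumbers", "PostCodes",
    "StreetAddresses", "UKCounties", "UKNationalInsuranceNumbers",
    "USSocialSecurityNumbers", "ZipCodes",
]


def built_in_section(enabled):
    """Return a "builtIn" block with every other known rule disabled."""
    unknown = sorted(set(enabled) - set(KNOWN_BUILT_IN_RULES))
    if unknown:
        raise ValueError(f"Unknown rule name(s): {unknown}")
    keep = set(enabled)
    return {
        "enabled": [rule for rule in KNOWN_BUILT_IN_RULES if rule in keep],
        "disabled": [rule for rule in KNOWN_BUILT_IN_RULES if rule not in keep],
    }


options = {"classifications": {"builtIn": built_in_section(["FullNames"])}}
print(json.dumps(options, indent=2))
```

Because each rule name appears once in the source list, it ends up in exactly one of the two output lists.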
B. Add a custom classification rule
{
  "classifications": {
    "custom": [
      {
        "type": "CompanyNames",
        "confidence": "High",
        "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
      }
    ]
  }
}
This creates a custom classification rule that classifies any column whose name contains both Company and Name as "CompanyNames".
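To preview which column names a condition like the one above might pick up, the check can be approximated outside the tool. The Python sketch below assumes that "contains" is a plain substring test and that matching ignores case; neither is confirmed on this page, so treat the results as a rough guide only. The function name and the sample column names are hypothetical.

```python
def matches_company_name_rule(column_name, ignore_case=True):
    """Approximate: Column.Name contains 'Company' AND Column.Name contains 'Name'.

    Assumes substring matching; case handling is a guess, so it is switchable.
    """
    words = ("Company", "Name")
    if ignore_case:
        column_name = column_name.lower()
        words = tuple(word.lower() for word in words)
    return all(word in column_name for word in words)


# Hypothetical column names, for illustration only.
for column in ["CompanyName", "Company_Legal_Name", "ContactName", "Company"]:
    print(column, matches_company_name_rule(column))
```

Only names that include both words pass; a column called ContactName or Company alone does not.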
C. Add a custom classification
{
  "tables": [
    {
      "schema": "dbo",
      "name": "Suppliers",
      "columns": [
        {
          "name": "HomePage",
          "type": "Websites",
          "preserveNulls": true,
          "maxLength": 20
        }
      ]
    }
  ]
}
This classifies the HomePage column of the dbo.Suppliers table as "Websites".
D. Exclude a table
{
  "tables": [
    {
      "schema": "dbo",
      "name": "CustomerDemographics",
      "exclude": true
    }
  ]
}
This prevents any columns in the dbo.CustomerDemographics table from being classified.
E. Exclude a column
{
  "tables": [
    {
      "schema": "dbo",
      "name": "Shippers",
      "columns": [
        {
          "name": "Phone",
          "exclude": true
        }
      ]
    }
  ]
}
This prevents the Phone column in the dbo.Shippers table from being classified.
classification-options.json
Putting the configurations for A-E above together into one options file produces the following:
{
  "classifications": {
    "builtIn": {
      "enabled": [
        "FullNames"
      ],
      "disabled": [
        "Cities",
        "Countries",
        "CreditCardNumbers",
        "DatesOfBirth",
        "EmailAddresses",
        "FamilyNames",
        "GivenNames",
        "IPAddresses",
        "PassportNumbers",
        "PhoneNumbers",
        "PostCodes",
        "StreetAddresses",
        "UKCounties",
        "UKNationalInsuranceNumbers",
        "USSocialSecurityNumbers",
        "ZipCodes"
      ]
    },
    "custom": [
      {
        "type": "CompanyNames",
        "confidence": "High",
        "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
      }
    ]
  },
  "tables": [
    {
      "schema": "dbo",
      "name": "CustomerDemographics",
      "exclude": true
    },
    {
      "schema": "dbo",
      "name": "Shippers",
      "columns": [
        {
          "name": "Phone",
          "exclude": true
        }
      ]
    },
    {
      "schema": "dbo",
      "name": "Suppliers",
      "columns": [
        {
          "name": "HomePage",
          "type": "Websites",
          "preserveNulls": true,
          "maxLength": 20
        }
      ]
    }
  ]
}
Save this to a file called classification-options.json. The file is passed to the classification step when it is run.
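If the fragments are kept separately, combining their "tables" lists can be scripted. The Python sketch below merges the table entries from C, D and E, writes the result to classification-options.json, and reads it back to confirm it is valid JSON. The helper merge_tables is hypothetical, and merging entries that share a schema and name into a single entry is an assumption about what is convenient, not a documented requirement. The "classifications" block from A and B is left out for brevity and would be added as a sibling key of "tables".

```python
import json
from pathlib import Path


def merge_tables(*fragments):
    """Combine "tables" lists, merging entries that share schema and name."""
    merged = {}
    for fragment in fragments:
        for table in fragment.get("tables", []):
            key = (table["schema"], table["name"])
            entry = merged.setdefault(key, {})
            for field, value in table.items():
                if field == "columns":
                    # Collect column settings from every fragment for this table.
                    entry.setdefault("columns", []).extend(value)
                else:
                    entry[field] = value
    # Sort by (schema, name) so the output order is stable.
    return [merged[key] for key in sorted(merged)]


fragment_c = {"tables": [{"schema": "dbo", "name": "Suppliers", "columns": [
    {"name": "HomePage", "type": "Websites", "preserveNulls": True, "maxLength": 20}]}]}
fragment_d = {"tables": [{"schema": "dbo", "name": "CustomerDemographics", "exclude": True}]}
fragment_e = {"tables": [{"schema": "dbo", "name": "Shippers", "columns": [
    {"name": "Phone", "exclude": True}]}]}

options = {"tables": merge_tables(fragment_c, fragment_d, fragment_e)}

path = Path("classification-options.json")
path.write_text(json.dumps(options, indent=2), encoding="utf-8")

# Reading the file back raises an error if the JSON is malformed.
json.loads(path.read_text(encoding="utf-8"))
```

Sorting by schema and name keeps the file stable between runs, which makes changes easier to review in source control.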