Customize classification
Published 03 November 2024
We're going to customize classification to do the following:
A. Control which classification rules are run
B. Add a custom classification rule
C. Add a custom classification
D. Exclude a table
E. Exclude a column
All of these customizations are made in the options file that is passed to the classification step.
A. Control which classification rules are run
{
  "classifications": {
    "builtIn": {
      "enabled": [ "FullNames" ],
      "disabled": [
        "Cities",
        "Countries",
        "CreditCardNumbers",
        "DatesOfBirth",
        "EmailAddresses",
        "FamilyNames",
        "GivenNames",
        "IPAddresses",
        "PassportNumbers",
        "PhoneNumbers",
        "PostCodes",
        "StreetAddresses",
        "UKCounties",
        "UKNationalInsuranceNumbers",
        "USSocialSecurityNumbers",
        "ZipCodes"
      ]
    }
  }
}
This turns off all built-in rules except for "FullNames".
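Writing out the disabled list by hand is error-prone, so it can help to generate it. The following is a minimal Python sketch, not part of the product: the helper name `built_in_block` is hypothetical, and it assumes that the rule names shown in the example above are the complete set of built-in rules, which this page does not confirm.

```python
import json

# Rule names copied from the example above. Treating this as the complete
# set of built-in rules is an assumption.
KNOWN_RULES = [
    "Cities", "Countries", "CreditCardNumbers", "DatesOfBirth",
    "EmailAddresses", "FamilyNames", "FullNames", "GivenNames",
    "IPAddresses", "PassportNumbers", "PhoneNumbers", "PostCodes",
    "StreetAddresses", "UKCounties", "UKNationalInsuranceNumbers",
    "USSocialSecurityNumbers", "ZipCodes",
]


def built_in_block(enabled):
    """Return a "builtIn" block that disables every known rule not in `enabled`."""
    unknown = sorted(set(enabled) - set(KNOWN_RULES))
    if unknown:
        raise ValueError(f"Unknown rule names: {unknown}")
    disabled = [rule for rule in KNOWN_RULES if rule not in enabled]
    return {"enabled": list(enabled), "disabled": disabled}


# Build the options for section A: only "FullNames" stays enabled.
options = {"classifications": {"builtIn": built_in_block(["FullNames"])}}
print(json.dumps(options, indent=2))
```

Because the disabled list is derived from a single list of names, a rule cannot accidentally appear twice or end up in both lists.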
B. Add a custom classification rule
{
  "classifications": {
    "custom": [
      {
        "type": "CompanyNames",
        "confidence": "High",
        "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
      }
    ]
  }
}
This creates a custom classification rule that classifies any column whose name contains both "Company" and "Name" as "CompanyNames", with a confidence of "High".
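To make the effect of the condition concrete, here is a small Python sketch of the same test applied to a few column names. It only illustrates the logic and is not how the tool evaluates conditions: the function name is hypothetical, and it assumes `contains` is a plain, case-sensitive substring match, which this page does not specify.

```python
def matches_company_name(column_name: str) -> bool:
    """Mimic: Column.Name contains 'Company' AND Column.Name contains 'Name'.

    Assumes a case-sensitive substring match (an assumption, not confirmed).
    """
    return "Company" in column_name and "Name" in column_name


# Hypothetical column names, chosen to show that both words must be present.
for name in ["CompanyName", "NameOfCompany", "ContactName", "Company"]:
    print(name, matches_company_name(name))
```

Under these assumptions, only the first two names satisfy the condition, because each of the others is missing one of the two words.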
C. Add a custom classification
{
  "tables": [
    {
      "schema": "dbo",
      "name": "Suppliers",
      "columns": [
        {
          "name": "HomePage",
          "type": "Websites",
          "preserveNulls": true,
          "maxLength": 20
        }
      ]
    }
  ]
}
This classifies the HomePage column of the dbo.Suppliers table as "Websites".
D. Exclude a table
{
  "tables": [
    {
      "schema": "dbo",
      "name": "CustomerDemographics",
      "exclude": true
    }
  ]
}
This prevents any columns in the dbo.CustomerDemographics table from being classified.
E. Exclude a column
{
  "tables": [
    {
      "schema": "dbo",
      "name": "Shippers",
      "columns": [
        {
          "name": "Phone",
          "exclude": true
        }
      ]
    }
  ]
}
This prevents the Phone column in the dbo.Shippers table from being classified.
classification-options.json
Putting the configurations for A-E above together into one options file produces the following:
{
  "classifications": {
    "builtIn": {
      "enabled": [ "FullNames" ],
      "disabled": [
        "Cities",
        "Countries",
        "CreditCardNumbers",
        "DatesOfBirth",
        "EmailAddresses",
        "FamilyNames",
        "GivenNames",
        "IPAddresses",
        "PassportNumbers",
        "PhoneNumbers",
        "PostCodes",
        "StreetAddresses",
        "UKCounties",
        "UKNationalInsuranceNumbers",
        "USSocialSecurityNumbers",
        "ZipCodes"
      ]
    },
    "custom": [
      {
        "type": "CompanyNames",
        "confidence": "High",
        "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
      }
    ]
  },
  "tables": [
    {
      "schema": "dbo",
      "name": "CustomerDemographics",
      "exclude": true
    },
    {
      "schema": "dbo",
      "name": "Shippers",
      "columns": [
        {
          "name": "Phone",
          "exclude": true
        }
      ]
    },
    {
      "schema": "dbo",
      "name": "Suppliers",
      "columns": [
        {
          "name": "HomePage",
          "type": "Websites",
          "preserveNulls": true,
          "maxLength": 20
        }
      ]
    }
  ]
}
Save this as a file called classification-options.json. It is passed to the classification step when that step is run.
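Before passing the file to the classification step, it can be worth checking that it is valid JSON and free of obvious slips. The Python sketch below is one possible sanity check, not a feature of the tool: the function name `check_options` and the specific checks (repeated names, a rule in both lists, a table listed twice) are assumptions about what is likely to be unintended, not rules stated on this page.

```python
import json
from collections import Counter


def check_options(text: str) -> list:
    """Return a list of likely mistakes found in an options file's JSON text."""
    options = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    problems = []

    built_in = options.get("classifications", {}).get("builtIn", {})
    enabled = built_in.get("enabled", [])
    disabled = built_in.get("disabled", [])

    # A rule name repeated within the disabled list is probably a copy-paste slip.
    for name, count in Counter(disabled).items():
        if count > 1:
            problems.append(f"'{name}' is listed {count} times in disabled")

    # A rule should not be both enabled and disabled.
    for name in sorted(set(enabled) & set(disabled)):
        problems.append(f"'{name}' is both enabled and disabled")

    # Each schema/table pair should have a single entry.
    seen = set()
    for table in options.get("tables", []):
        key = (table.get("schema"), table.get("name"))
        if key in seen:
            problems.append(f"table {key[0]}.{key[1]} appears more than once")
        seen.add(key)

    return problems


# A deliberately flawed, made-up sample to show what the checks report.
sample = """
{
  "classifications": {
    "builtIn": {
      "enabled": [ "FullNames" ],
      "disabled": [ "Cities", "Cities", "FullNames" ]
    }
  },
  "tables": [
    { "schema": "dbo", "name": "Shippers", "exclude": true },
    { "schema": "dbo", "name": "Shippers", "exclude": true }
  ]
}
"""

for problem in check_options(sample):
    print(problem)
```

To check a real file, read it first, for example with `pathlib.Path("classification-options.json").read_text()`, and pass the resulting text to `check_options`.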
