Redgate Test Data Manager

Customize classification

We're going to customize classification to do the following:

A. Control which classification rules are run

B. Add a custom classification rule

C. Add a custom classification

D. Exclude a table

E. Exclude a column


All of these customizations are made in the options file that is passed to the classification step.


A. Control which classification rules are run

{
  "classifications": {
    "builtIn": {
      "enabled": [ "FullNames" ],
      "disabled": [
        "Cities",
        "Countries",
        "CreditCardNumbers",
        "DatesOfBirth",
        "EmailAddresses",
        "FamilyNames",
        "GivenNames",
        "IPAddresses",
        "PassportNumbers",
        "PhoneNumbers",
        "PostCodes",
        "StreetAddresses",
        "UKCounties",
        "UKNationalInsuranceNumbers",
        "USSocialSecurityNumbers",
        "ZipCodes"
      ]
    }
  }
}

This turns off all built-in rules except for "FullNames".
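Long enabled/disabled lists are easy to get wrong when edited by hand. The following is a minimal Python sketch, not part of the product, that flags a rule name listed twice or listed in both arrays. The function name check_built_in and the deliberately broken sample input are assumptions made for illustration; only the "classifications", "builtIn", "enabled" and "disabled" keys come from the example above.

```python
import json
from collections import Counter


def check_built_in(options: dict) -> list[str]:
    """Return human-readable problems found in classifications.builtIn."""
    built_in = options.get("classifications", {}).get("builtIn", {})
    enabled = built_in.get("enabled", [])
    disabled = built_in.get("disabled", [])

    problems = []
    # A rule name repeated within one list is almost certainly a typo.
    for list_name, names in (("enabled", enabled), ("disabled", disabled)):
        for name, count in Counter(names).items():
            if count > 1:
                problems.append(f"{name} appears {count} times in {list_name}")
    # A rule cannot sensibly be both enabled and disabled.
    for name in sorted(set(enabled) & set(disabled)):
        problems.append(f"{name} appears in both enabled and disabled")
    return problems


# Deliberately broken, shortened input used only to exercise the check.
example = json.loads("""
{
  "classifications": {
    "builtIn": {
      "enabled": [ "FullNames" ],
      "disabled": [ "Cities", "StreetAddresses", "StreetAddresses", "FullNames" ]
    }
  }
}
""")

for problem in check_built_in(example):
    print(problem)
```

An empty result means the two lists are internally consistent; it does not confirm that every name is a rule the tool recognizes.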

B. Add a custom classification rule

{
  "classifications": {
    "custom": [
      {
        "type": "CompanyNames",
        "confidence": "High",
        "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
      }
    ]
  }
}

This creates a custom classification rule that classifies any column whose name contains both "Company" and "Name" as "CompanyNames".
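To see which column names the condition is meant to catch, its logic can be mimicked in a few lines of Python. This is only an illustration of the AND of two "contains" checks: the function name is hypothetical, and plain case-sensitive substring matching is an assumption, since this page does not state how the tool treats letter case.

```python
def matches_company_name_rule(column_name: str) -> bool:
    """Mimic: Column.Name contains 'Company' AND Column.Name contains 'Name'.

    Assumption: plain, case-sensitive substring matching.
    """
    return "Company" in column_name and "Name" in column_name


# Sample column names chosen for illustration.
for column in ["CompanyName", "ShipperCompanyName", "ContactName", "Company"]:
    print(column, matches_company_name_rule(column))
```

Under that assumption, only names containing both words match, so ContactName and Company on their own are left to the other rules.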

C. Add a custom classification

{
  "tables": [
    {
      "schema": "dbo",
      "name": "Suppliers",
      "columns": [
        {
          "name": "HomePage",
          "type": "Websites",
          "preserveNulls": true,
          "maxLength": 20
        }
      ]
    }
  ]
}

This classifies the HomePage column of the dbo.Suppliers table as "Websites".

D. Exclude a table

{
  "tables": [
    {
      "schema": "dbo",
      "name": "CustomerDemographics",
      "exclude": true
    }
  ]
}

This prevents any columns in the dbo.CustomerDemographics table from being classified.

E. Exclude a column

{
  "tables": [
    {
      "schema": "dbo",
      "name": "Shippers",
      "columns": [
        {
          "name": "Phone",
          "exclude": true
        }
      ]
    }
  ]
}

This prevents the Phone column in the dbo.Shippers table from being classified.
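When many tables or columns need excluding, the entries can be generated rather than typed. The sketch below builds the same shapes shown in D and E; the helper names exclude_table and exclude_column are hypothetical, and the script only produces JSON text for you to paste into the options file.

```python
import json


def exclude_table(schema: str, name: str) -> dict:
    """Build a "tables" entry that excludes a whole table (as in D)."""
    return {"schema": schema, "name": name, "exclude": True}


def exclude_column(schema: str, table: str, column: str) -> dict:
    """Build a "tables" entry that excludes a single column (as in E)."""
    return {
        "schema": schema,
        "name": table,
        "columns": [{"name": column, "exclude": True}],
    }


options_fragment = {
    "tables": [
        exclude_table("dbo", "CustomerDemographics"),
        exclude_column("dbo", "Shippers", "Phone"),
    ]
}

print(json.dumps(options_fragment, indent=2))
```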


classification-options.json

Putting the configurations for A-E above together into one options file produces the following:

{
  "classifications": {
    "builtIn": {
      "enabled": [ "FullNames" ],
      "disabled": [
        "Cities",
        "Countries",
        "CreditCardNumbers",
        "DatesOfBirth",
        "EmailAddresses",
        "FamilyNames",
        "GivenNames",
        "IPAddresses",
        "PassportNumbers",
        "PhoneNumbers",
        "PostCodes",
        "StreetAddresses",
        "UKCounties",
        "UKNationalInsuranceNumbers",
        "USSocialSecurityNumbers",
        "ZipCodes"
      ]
    },
    "custom": [
      {
        "type": "CompanyNames",
        "confidence": "High",
        "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
      }
    ]
  },
  "tables": [
    {
      "schema": "dbo",
      "name": "CustomerDemographics",
      "exclude": true
    },
    {
      "schema": "dbo",
      "name": "Shippers",
      "columns": [
        {
          "name": "Phone",
          "exclude": true
        }
      ]
    },
    {
      "schema": "dbo",
      "name": "Suppliers",
      "columns": [
        {
          "name": "HomePage",
          "type": "Websites",
          "preserveNulls": true,
          "maxLength": 20
        }
      ]
    }
  ]
}

Save this as classification-options.json. The file is passed to the classification step when that step runs.
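If you keep the fragments from A-E separately, combining them can be scripted. The following is a rough Python sketch under stated assumptions: merge_options is a hypothetical helper, the fragments are shortened versions of the examples above, and each table is assumed to appear in only one fragment (entries for the same table are not merged).

```python
import json
from pathlib import Path


def merge_options(*fragments: dict) -> dict:
    """Combine option fragments into a single options document."""
    merged = {"classifications": {}, "tables": []}
    for fragment in fragments:
        for key, value in fragment.get("classifications", {}).items():
            if isinstance(value, list):
                # e.g. "custom": append the rules from each fragment
                merged["classifications"].setdefault(key, []).extend(value)
            else:
                # e.g. "builtIn": later fragments override earlier keys
                merged["classifications"].setdefault(key, {}).update(value)
        merged["tables"].extend(fragment.get("tables", []))
    # Sort tables by schema and name for a stable, readable file.
    merged["tables"].sort(key=lambda t: (t["schema"], t["name"]))
    return merged


# Shortened fragments based on A-E; the "disabled" list is abbreviated here.
fragment_a = {"classifications": {"builtIn": {"enabled": ["FullNames"],
                                              "disabled": ["Cities", "Countries"]}}}
fragment_b = {"classifications": {"custom": [{
    "type": "CompanyNames",
    "confidence": "High",
    "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'",
}]}}
fragment_c = {"tables": [{"schema": "dbo", "name": "Suppliers", "columns": [
    {"name": "HomePage", "type": "Websites", "preserveNulls": True, "maxLength": 20}]}]}
fragment_d = {"tables": [{"schema": "dbo", "name": "CustomerDemographics", "exclude": True}]}
fragment_e = {"tables": [{"schema": "dbo", "name": "Shippers", "columns": [
    {"name": "Phone", "exclude": True}]}]}

merged = merge_options(fragment_a, fragment_b, fragment_c, fragment_d, fragment_e)

# Write the combined options next to the script.
Path("classification-options.json").write_text(json.dumps(merged, indent=2) + "\n")
```

Sorting the tables by schema and name reproduces the ordering used in the combined file above and keeps diffs small when fragments are added later.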


