Redgate Test Data Manager

Customize classification

We're going to customize classification to do the following:

A. Control which classification rules are run

B. Add a custom classification rule

C. Add a custom classification

D. Exclude a table

E. Exclude a column


All of these customizations are made in the options file that is passed to the classification step.


A. Control which classification rules are run

  {
    "classifications": {
      "builtIn": {
        "enabled": [ "FullNames" ],
        "disabled": [
          "Cities",
          "Countries",
          "CreditCardNumbers",
          "DatesOfBirth",
          "EmailAddresses",
          "FamilyNames",
          "GivenNames",
          "IPAddresses",
          "PassportNumbers",
          "PhoneNumbers",
          "PostCodes",
          "StreetAddresses",
          "UKCounties",
          "UKNationalInsuranceNumbers",
          "USSocialSecurityNumbers",
          "ZipCodes"
        ]
      }
    }
  }

This turns off all built-in rules except for "FullNames".
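Long enabled/disabled lists are easy to get wrong by hand. The following is a minimal Python sketch of a sanity check you could run on an options fragment before using it. The helper name `check_built_in` is hypothetical and is not part of Redgate Test Data Manager; it only inspects the JSON structure shown above and reports rule names that appear in both lists or more than once in the same list.

```python
import json
from collections import Counter


def check_built_in(options: dict) -> dict:
    """Report suspicious entries in classifications.builtIn (hypothetical helper)."""
    built_in = options.get("classifications", {}).get("builtIn", {})
    enabled = built_in.get("enabled", [])
    disabled = built_in.get("disabled", [])

    def repeated(names):
        # Names listed more than once within a single list.
        return {name for name, count in Counter(names).items() if count > 1}

    return {
        # A rule cannot sensibly be both enabled and disabled.
        "in_both": sorted(set(enabled) & set(disabled)),
        "repeated": sorted(repeated(enabled) | repeated(disabled)),
    }


# A shortened fragment in the same shape as the example above.
fragment = json.loads("""
{
  "classifications": {
    "builtIn": {
      "enabled": [ "FullNames" ],
      "disabled": [ "Cities", "StreetAddresses", "StreetAddresses" ]
    }
  }
}
""")

report = check_built_in(fragment)
print(report)
```

An empty `in_both` and `repeated` list means the fragment passed both checks.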

B. Add a custom classification rule

  {
    "classifications": {
      "custom": [
        {
          "type": "CompanyNames",
          "confidence": "High",
          "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
        }
      ]
    }
  }

This creates a custom classification rule that classifies any column whose name contains both "Company" and "Name" as "CompanyNames", with high confidence.
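To get a feel for which column names a condition like this would pick up, here is a small Python sketch that mimics it. This is an illustration only, not the tool's rule engine: the function `matches_company_name` is hypothetical, and it assumes that `contains` is a case-insensitive substring test, which this page does not state.

```python
def matches_company_name(column_name: str) -> bool:
    """Mimic: Column.Name contains 'Company' AND Column.Name contains 'Name'."""
    # ASSUMPTION: 'contains' is a case-insensitive substring match.
    lowered = column_name.lower()
    return "company" in lowered and "name" in lowered


for name in ["CompanyName", "company_name", "ContactName", "Company"]:
    print(f"{name}: {matches_company_name(name)}")
# CompanyName: True
# company_name: True
# ContactName: False
# Company: False
```

Under that assumption, a column must contain both substrings to match, so "ContactName" and "Company" on their own are not classified by the rule.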

C. Add a custom classification

  {
    "tables": [
      {
        "schema": "dbo",
        "name": "Suppliers",
        "columns": [
          {
            "name": "HomePage",
            "type": "Websites",
            "preserveNulls": true,
            "maxLength": 20
          }
        ]
      }
    ]
  }

This classifies the HomePage column of the dbo.Suppliers table as "Websites".

D. Exclude a table

  {
    "tables": [
      {
        "schema": "dbo",
        "name": "CustomerDemographics",
        "exclude": true
      }
    ]
  }

This prevents any columns in the dbo.CustomerDemographics table from being classified.

E. Exclude a column

  {
    "tables": [
      {
        "schema": "dbo",
        "name": "Shippers",
        "columns": [
          {
            "name": "Phone",
            "exclude": true
          }
        ]
      }
    ]
  }

This prevents the Phone column in the dbo.Shippers table from being classified.


classification-options.json

Putting the configurations for A-E above together into one options file produces the following:

  {
    "classifications": {
      "builtIn": {
        "enabled": [ "FullNames" ],
        "disabled": [
          "Cities",
          "Countries",
          "CreditCardNumbers",
          "DatesOfBirth",
          "EmailAddresses",
          "FamilyNames",
          "GivenNames",
          "IPAddresses",
          "PassportNumbers",
          "PhoneNumbers",
          "PostCodes",
          "StreetAddresses",
          "UKCounties",
          "UKNationalInsuranceNumbers",
          "USSocialSecurityNumbers",
          "ZipCodes"
        ]
      },
      "custom": [
        {
          "type": "CompanyNames",
          "confidence": "High",
          "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'"
        }
      ]
    },
    "tables": [
      {
        "schema": "dbo",
        "name": "CustomerDemographics",
        "exclude": true
      },
      {
        "schema": "dbo",
        "name": "Shippers",
        "columns": [
          {
            "name": "Phone",
            "exclude": true
          }
        ]
      },
      {
        "schema": "dbo",
        "name": "Suppliers",
        "columns": [
          {
            "name": "HomePage",
            "type": "Websites",
            "preserveNulls": true,
            "maxLength": 20
          }
        ]
      }
    ]
  }

Save this as a file called classification-options.json. The file is passed to the classification step when that step is run.
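If you prefer to keep the fragments for A-E separate and combine them mechanically, the merge can be sketched in Python as follows. The helper `merge_options` is hypothetical and not part of the product. It assumes each fragment has at most the two top-level keys used on this page, "classifications" and "tables", and the built-in disabled list is shortened here for brevity.

```python
import json
from pathlib import Path


def merge_options(fragments):
    """Combine option fragments into one options dict (hypothetical helper)."""
    merged = {"classifications": {}, "tables": []}
    for fragment in fragments:
        # Shallow merge: "builtIn" and "custom" are separate keys, so they do not collide.
        merged["classifications"].update(fragment.get("classifications", {}))
        # Table entries are simply appended in the order the fragments are given.
        merged["tables"].extend(fragment.get("tables", []))
    return merged


fragments = [
    # A (disabled list shortened for brevity)
    {"classifications": {"builtIn": {"enabled": ["FullNames"],
                                     "disabled": ["Cities", "Countries"]}}},
    # B
    {"classifications": {"custom": [{
        "type": "CompanyNames",
        "confidence": "High",
        "condition": "Column.Name contains 'Company' AND Column.Name contains 'Name'",
    }]}},
    # D
    {"tables": [{"schema": "dbo", "name": "CustomerDemographics", "exclude": True}]},
    # E
    {"tables": [{"schema": "dbo", "name": "Shippers",
                 "columns": [{"name": "Phone", "exclude": True}]}]},
    # C
    {"tables": [{"schema": "dbo", "name": "Suppliers",
                 "columns": [{"name": "HomePage", "type": "Websites",
                              "preserveNulls": True, "maxLength": 20}]}]},
]

merged = merge_options(fragments)
out_path = Path("classification-options.json")
out_path.write_text(json.dumps(merged, indent=2), encoding="utf-8")
```

Note that this sketch does not combine two fragments that configure the same table; if the same schema and table name appear in more than one fragment, merge their entries by hand.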


