Redgate Test Data Manager

Writing Classification Rule Conditions

Conditions must be written as a single-line string in the "condition" property of a "custom" rule.


Example options file to add a custom classification rule with a long condition

"classifications": {
  "custom": [
    {
      "type": "INSEE code",
      "confidence": "High",
      "condition": "Column.Type is string AND Column.Width >= 16 AND (Column.Name contains 'nir' OR Column.Name contains all of 'numéro','inscription','répertoire') AND Column.Name contains none of 'code','type'"
    }
  ]
}
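Because the condition has to fit on one line, long conditions can be awkward to edit by hand. The following is a minimal Python sketch, not part of the product, that keeps each basic condition on its own line in source and joins them into the single-line string shown above. The variable names are illustrative, and how the printed object fits into a complete options file is outside the scope of the sketch.

```python
import json

# Each part is kept on its own line for readability, then joined into the
# single-line string that the "condition" property expects.
parts = [
    "Column.Type is string",
    "Column.Width >= 16",
    "(Column.Name contains 'nir' OR Column.Name contains all of 'numéro','inscription','répertoire')",
    "Column.Name contains none of 'code','type'",
]
condition = " AND ".join(parts)

options = {
    "classifications": {
        "custom": [
            {"type": "INSEE code", "confidence": "High", "condition": condition}
        ]
    }
}

# ensure_ascii=False keeps accented characters readable in the output.
text = json.dumps(options, indent=2, ensure_ascii=False)
print(text)
```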

Basic Conditions

A basic condition can be a statement about:

  • a column type
  • a column width
  • a column name
  • a table name
  • a schema name
    • (only relevant to database engines where Anonymize Classify classifies multiple schemas in a database at once, such as SQL Server or PostgreSQL)


NB: In the sections below, some keywords have alternatives that behave the same way. None of the keywords are case sensitive.

Column Type

A column type condition checks whether a column contains a particular type of data. At the moment only two types are supported: string (for any type of text column) and date (for any type of date column).

Examples:

  • Column.Type is string
  • Column.Type is date

Column Width

A column width condition compares a column's width with a given number.

Examples:

  • Column.Width < 10
  • Column.Width = 12
  • Column.Width >= 6 (always true if column has no max width)

Column/Table/Schema Name

There are various conditions you can check that are related to the name of a column/table/schema.

All pieces of text must be in single quotes, e.g. 'name'. Single quotes inside the text can be escaped with a backslash (\), e.g. 'numéro d\'inscription au répertoire'
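The quoting rule interacts with JSON escaping: \' is not a valid escape sequence in strict JSON, so, assuming the options file is parsed as strict JSON, the backslash itself has to be written as \\ in the file. A minimal Python sketch of both steps follows. The helper name quote_text is hypothetical, and the sketch does not cover text that already contains backslashes, since the source does not describe that case.

```python
import json

def quote_text(text: str) -> str:
    """Wrap text in single quotes, escaping embedded single quotes with a backslash."""
    return "'" + text.replace("'", "\\'") + "'"

condition = "Column.Name contains " + quote_text("numéro d'inscription au répertoire")

# The condition as the rule syntax expects it (one backslash before the quote).
print(condition)

# The same condition as it appears inside a JSON file: json.dumps escapes
# the backslash itself, so the file contains \\' rather than \'.
print(json.dumps({"condition": condition}, ensure_ascii=False))
```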

The following conditions are possible:

  • (not) equal text
    • Column.Name equals 'name'
    • Table.Name not equals 'company'
  • (not) contains text
    • Schema.Name contains 'customer'
    • Table.Name not contains 'product'
  • contains (all/any/none) of comma separated text
    • Column.Name contains all of 'membership', 'number' 
    • Column.Name contains any of 'num', 'number', 'no'
    • Column.Name contains none of 'code', 'type'
  • (not) starts with
    • Column.Name starts with 'ABC_' 
    • Column.Name not starts with 'Customer_' 
  • (not) ends with
    • Column.Name ends with '_ABC' 
    • Column.Name not ends with '_Name'
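To illustrate how the all/any/none variants differ, here is a small Python model of one plausible reading of them, using plain substring checks. It is an assumption-laden sketch rather than the tool's implementation: in particular, it compares text case-sensitively, and the source does not say how the tool treats case in quoted text.

```python
# Assumed semantics, modeled with plain substring checks.
def contains_all(name, *texts):
    """True when every text occurs somewhere in the name."""
    return all(t in name for t in texts)

def contains_any(name, *texts):
    """True when at least one text occurs in the name."""
    return any(t in name for t in texts)

def contains_none(name, *texts):
    """True when no text occurs in the name."""
    return not any(t in name for t in texts)

for name in ("membership_number", "member_code"):
    print(
        name,
        contains_all(name, "membership", "number"),
        contains_any(name, "num", "number", "no"),
        contains_none(name, "code", "type"),
    )
```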

Complex Conditions

Basic conditions can be combined into more complex conditions by joining them with AND and OR and by wrapping groups of conditions in brackets, for example:

Column.Type is string AND Column.Width > 7 AND ((Column.Name starts with 'Product_' AND Column.Name ends with '_Owner') OR (Column.Name starts with 'Produkt_' AND Column.Name ends with '_Inhaber'))
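Nested brackets are easy to mistype in a long single-line condition. The following Python sketch builds the example above from smaller pieces; the helper names all_of, any_of and group are hypothetical and not part of the product. The source does not state how AND and OR bind relative to each other, so the sketch always brackets OR groups explicitly.

```python
def all_of(*conditions: str) -> str:
    """Join conditions with AND."""
    return " AND ".join(conditions)

def any_of(*conditions: str) -> str:
    """Join conditions with OR and wrap the whole group in brackets."""
    return "(" + " OR ".join(conditions) + ")"

def group(condition: str) -> str:
    """Wrap a single condition in brackets."""
    return "(" + condition + ")"

english = group(all_of("Column.Name starts with 'Product_'",
                       "Column.Name ends with '_Owner'"))
german = group(all_of("Column.Name starts with 'Produkt_'",
                      "Column.Name ends with '_Inhaber'"))

condition = all_of("Column.Type is string",
                   "Column.Width > 7",
                   any_of(english, german))
print(condition)
```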

Recommendations

To avoid false positives:

  • include basic conditions to exclude common words e.g. Column.Name contains none of 'code', 'type'
  • include basic conditions to specify the data type of the column e.g. Column.Type is date
  • include basic conditions to specify the width of the column e.g. Column.Width >= 7
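These recommendations can be checked mechanically before a rule is added. The sketch below is a deliberately crude Python check, based on substring matching only, that reports which of the three recommended guards a condition lacks. The function name missing_guards is hypothetical, and the check can be fooled by keywords that appear inside quoted text.

```python
import re

def missing_guards(condition: str) -> list:
    """Return the recommended guards that a condition does not appear to include."""
    lowered = condition.lower()  # keywords are not case sensitive
    checks = {
        "exclusion of common words": "contains none of" in lowered,
        "column data type": "column.type is" in lowered,
        "column width": re.search(r"column\.width\s*[<>=]", lowered) is not None,
    }
    return [guard for guard, present in checks.items() if not present]

print(missing_guards("Column.Name contains 'nir'"))
print(missing_guards(
    "Column.Type is string AND Column.Width >= 16 AND Column.Name contains 'nir' "
    "AND Column.Name contains none of 'code','type'"
))
```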
