Writing Classification Rule Conditions
Published 15 December 2023
Conditions must be written as a single line of text in the "condition" property of a "custom" rule.
Example options file to add a custom classification rule with a long condition:
"classifications": {
  "custom": [
    {
      "type": "INSEE code",
      "confidence": "High",
      "condition": "Column.Type is string AND Column.Width >= 16 AND (Column.Name contains 'nir' OR Column.Name contains all of 'numéro','inscription','répertoire') AND Column.Name contains none of 'code','type'"
    }
  ]
}
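Because the condition has to stay on one line, long conditions can be awkward to edit by hand. One possible approach, sketched here in Python, is to assemble the condition from short parts and let a JSON library write the fragment. The property names are copied from the example above; the helper name `build_custom_rule` is hypothetical and not part of the product, and the printed fragment would still need to be placed inside a complete options file.

```python
import json


def build_custom_rule(rule_type, confidence, condition_parts):
    """Return one entry for the "custom" list (hypothetical helper)."""
    # Join with single spaces so the final condition contains no line breaks.
    condition = " ".join(part.strip() for part in condition_parts)
    return {"type": rule_type, "confidence": confidence, "condition": condition}


rule = build_custom_rule(
    "INSEE code",
    "High",
    [
        "Column.Type is string",
        "AND Column.Width >= 16",
        "AND (Column.Name contains 'nir'",
        "OR Column.Name contains all of 'numéro','inscription','répertoire')",
        "AND Column.Name contains none of 'code','type'",
    ],
)

fragment = {"classifications": {"custom": [rule]}}
# ensure_ascii=False keeps accented characters such as 'é' readable in the output.
print(json.dumps(fragment, ensure_ascii=False, indent=2))
```

The pretty-printing only affects the JSON layout; the condition string itself is still written on a single line.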
Basic Conditions
A basic condition can be a statement about:
- a column type
- a column width
- a column name
- a table name
- a schema name (only relevant to database engines where Anonymize Classify classifies multiple schemas in a database at once, such as SQL Server or PostgreSQL)
NB: Some of the keywords described below have alternatives that achieve the same result. None of the keywords are case sensitive.
Column Type
A column type condition checks whether a column contains a particular type of data. At the moment only two types are supported: string (for any type of text column) and date (for any type of date column).
Examples:
Column.Type is string
Column.Type is date
Column Width
A column width condition compares a column's width with a given size.
Examples:
Column.Width < 10
Column.Width = 12
Column.Width >= 6 (always true if column has no max width)
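The note on the last example can be read as treating a column with no maximum width as unbounded. The Python sketch below models that reading. It is an illustration only, not the product's implementation: the function name `width_matches` is hypothetical, and the results for operators other than `>=` on unbounded columns are an assumption that follows from treating the width as infinite.

```python
import math
import operator

# Comparison operators that appear in the examples on this page.
OPERATORS = {
    "<": operator.lt,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def width_matches(max_width, op, size):
    """Evaluate 'Column.Width <op> <size>' for a column (hypothetical helper)."""
    # Assumption: None stands for a column with no maximum width,
    # which is modelled here as an infinitely wide column.
    width = math.inf if max_width is None else max_width
    return OPERATORS[op](width, size)


print(width_matches(9, "<", 10))
print(width_matches(12, "=", 12))
print(width_matches(None, ">=", 6))
```

Under this model, a `>=` check on a column without a maximum width always succeeds, which matches the note above.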
Column/Table/Schema Name
There are various conditions you can check that are related to the name of a column/table/schema.
All pieces of text must be in single quotes, e.g. 'name'. Single quotes can be escaped with a backslash (\), e.g. 'numéro d\'inscription au répertoire'
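When conditions are generated by a script, the quoting rule can be applied mechanically. The following Python sketch shows one way to do it; `quote_text` is a hypothetical helper name, and the sketch only covers the single-quote escaping described above (how other special characters are handled is not covered here).

```python
import json


def quote_text(text):
    """Wrap text in single quotes, escaping embedded single quotes with a backslash."""
    return "'" + text.replace("'", "\\'") + "'"


condition = "Column.Name contains " + quote_text("numéro d'inscription au répertoire")
print(condition)

# When the condition is stored in a JSON file, the backslash itself must be
# escaped according to JSON string rules; json.dumps does this automatically.
print(json.dumps({"condition": condition}, ensure_ascii=False))
```

Writing the JSON through a library rather than by hand avoids having to track the two levels of escaping separately.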
The following conditions are possible:
- (not) equals text
Column.Name equals 'name'
Table.Name not equals 'company'
- (not) contains text
Schema.Name contains 'customer'
Table.Name not contains 'product'
- contains (all/any/none) of comma separated text
Column.Name contains all of 'membership', 'number'
Column.Name contains any of 'num', 'number', 'no'
Column.Name contains none of 'code', 'type'
- (not) starts with
Column.Name starts with 'ABC_'
Column.Name not starts with 'Customer_'
- (not) ends with
Column.Name ends with '_ABC'
Column.Name not ends with '_Name'
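To make the meaning of each name condition concrete, the sketch below expresses them as small Python predicates. This is an illustrative model, not the product's implementation: the function names are hypothetical, and the sketch compares text exactly as written, which is an assumption because this page does not state how letter case in names is handled.

```python
def equals(name, text):
    return name == text


def contains(name, text):
    return text in name


def contains_all_of(name, *texts):
    # True only when every piece of text appears somewhere in the name.
    return all(text in name for text in texts)


def contains_any_of(name, *texts):
    # True when at least one piece of text appears in the name.
    return any(text in name for text in texts)


def contains_none_of(name, *texts):
    # True only when no piece of text appears in the name.
    return not any(text in name for text in texts)


def starts_with(name, text):
    return name.startswith(text)


def ends_with(name, text):
    return name.endswith(text)


# The "not" forms are simply the negation of the corresponding predicate.
print(contains_all_of("membership_number", "membership", "number"))
print(contains_none_of("postcode", "code", "type"))
print(not starts_with("Order_Id", "Customer_"))
```

Note how `contains none of 'code', 'type'` rejects a name such as `postcode`, because matching is on substrings rather than whole words.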
Complex Conditions
Basic conditions can be combined into more complex conditions by joining them with AND and OR and wrapping groups of conditions in brackets, e.g.
Column.Type is string AND Column.Width > 7 AND ((Column.Name starts with 'Product_' AND Column.Name ends with '_Owner') OR (Column.Name starts with 'Produkt_' AND Column.Name ends with '_Inhaber'))
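Nested brackets are easy to get wrong when typed by hand. As a rough sketch, a complex condition like the one above could be assembled in Python from basic conditions. The helper names `join_and` and `join_or_group` are hypothetical. The sketch brackets every OR group and each alternative inside it, so the result does not rely on any particular precedence between AND and OR.

```python
def join_and(*conditions):
    """Join conditions with AND."""
    return " AND ".join(conditions)


def join_or_group(*alternatives):
    """Join alternatives with OR, bracketing each alternative and the whole group."""
    return "(" + " OR ".join(f"({alt})" for alt in alternatives) + ")"


english = join_and(
    "Column.Name starts with 'Product_'",
    "Column.Name ends with '_Owner'",
)
german = join_and(
    "Column.Name starts with 'Produkt_'",
    "Column.Name ends with '_Inhaber'",
)

condition = join_and(
    "Column.Type is string",
    "Column.Width > 7",
    join_or_group(english, german),
)
print(condition)
```

The printed string is the same single-line condition as the example above.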
Recommendations
To avoid false positives:
- include basic conditions to exclude common words e.g.
Column.Name contains none of 'code', 'type'
- include basic conditions to specify the data type of the column e.g.
Column.Type is date
- include basic conditions to specify the width of the column e.g.
Column.Width >= 7
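These recommendations can be applied together. The Python sketch below wraps a name condition with a type check, a minimum width and a list of excluded words. The helper `with_guards` and the example name condition are hypothetical, and bracketing the name condition is an assumption made so that any OR inside it stays grouped.

```python
def with_guards(name_condition, column_type=None, min_width=None, excluded_words=()):
    """Combine a name condition with the recommended guard conditions."""
    parts = []
    if column_type is not None:
        parts.append(f"Column.Type is {column_type}")
    if min_width is not None:
        parts.append(f"Column.Width >= {min_width}")
    # Bracket the name condition so an OR inside it is not split by the ANDs.
    parts.append(f"({name_condition})")
    if excluded_words:
        # Simple quoting only: words containing single quotes would need escaping.
        quoted = ", ".join(f"'{word}'" for word in excluded_words)
        parts.append(f"Column.Name contains none of {quoted}")
    return " AND ".join(parts)


condition = with_guards(
    "Column.Name contains 'nir' OR Column.Name contains 'insee'",
    column_type="string",
    min_width=7,
    excluded_words=["code", "type"],
)
print(condition)
```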