Writing classification rule conditions
Published 15 December 2023
Conditions must be written as a single line of text in the "condition" property of a "custom" rule.
Example to add a custom classification rule with a long condition
{
  "classifications": {
    "custom": [
      {
        "type": "INSEE code",
        "confidence": "High",
        "condition": "Column.Type is string AND Column.Width >= 16 AND (Column.Name contains 'nir' OR Column.Name contains all of 'numéro','inscription','répertoire') AND Column.Name contains none of 'code','type'"
      }
    ]
  }
}
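Long conditions are hard to read and edit as a single line. One way to manage this, shown here as a minimal sketch rather than a supported tool, is to assemble the condition from shorter parts in a script and write the result out as JSON. The variable names (`parts`, `condition`, `config`) are illustrative assumptions; only the JSON structure and the condition text come from the example above.

```python
import json

# Each part is one readable piece of the long condition from the example above.
parts = [
    "Column.Type is string",
    "Column.Width >= 16",
    "(Column.Name contains 'nir' OR Column.Name contains all of 'numéro','inscription','répertoire')",
    "Column.Name contains none of 'code','type'",
]

# Joining with " AND " yields the single-line text the "condition" property needs.
condition = " AND ".join(parts)

config = {
    "classifications": {
        "custom": [
            {"type": "INSEE code", "confidence": "High", "condition": condition}
        ]
    }
}

# ensure_ascii=False keeps accented characters such as 'é' readable in the output.
text = json.dumps(config, indent=2, ensure_ascii=False)
print(text)
```

Because the parts are joined with plain spaces, the resulting condition never contains a line break, however the script itself is laid out.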
Basic Conditions
A basic condition can be a statement about:
- a column type
- a column width
- a column name
- a table name
- a schema name (only relevant to SQL Server or PostgreSQL)
NB: Some of the conditions below accept alternative keywords that have the same effect. None of the keywords are case sensitive.
Column Type
A column type condition checks whether a column contains a particular type of data. At the moment only two types are supported: string (for any type of text column) and date (for any type of date column).
Examples:
- Column.Type is string
- Column.Type is date
Column Width
A column width condition compares a column's width with a given size.
Examples:
- Column.Width < 10
- Column.Width = 12
- Column.Width >= 6 (always true if the column has no maximum width)
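The note about columns with no maximum width can be pictured with a small model. This is only an illustrative sketch, not the product's implementation: it assumes an unbounded column behaves as if it were infinitely wide, which matches the documented ">=" behaviour, but the results it gives for "<" and "=" on unbounded columns are an assumption. The function name `width_matches` is hypothetical.

```python
import math
import operator
from typing import Optional

# Comparison operators that appear in the examples on this page.
OPS = {
    "<": operator.lt,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}

def width_matches(max_width: Optional[int], op: str, value: int) -> bool:
    """Model a 'Column.Width <op> <value>' check for one column.

    max_width is None for a column with no maximum width.
    """
    # Assumption: a column with no maximum width is treated as infinitely wide.
    width = math.inf if max_width is None else max_width
    return OPS[op](width, value)

print(width_matches(None, ">=", 6))
print(width_matches(12, "=", 12))
```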
Schema/Table/Column Name
Several conditions check the name of a schema, table or column.
All text must be in single quotes, e.g. 'name'. Single quotes can be escaped with a backslash \, e.g. 'numéro d\'inscription au répertoire'
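When conditions are generated by a script, the quoting rule above can be applied by a small helper. The following is a minimal sketch; the function name `quote` is hypothetical, and it handles only the single-quote escape described above (how a literal backslash in the text should be treated is not covered on this page).

```python
def quote(text: str) -> str:
    """Wrap text in single quotes, escaping embedded single quotes with a backslash."""
    return "'" + text.replace("'", "\\'") + "'"

# Prints: 'numéro d\'inscription au répertoire'
print(quote("numéro d'inscription au répertoire"))
```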
(not) equals text
- Column.Name equals 'name'
- Table.Name not equals 'company'
(not) equals any of comma separated text
- Column.Name equals any of 'name', 'nm'
- Table.Name not equals any of 'company', 'business'
(not) contains text
- Schema.Name contains 'customer'
- Table.Name not contains 'product'
contains (all/any/none) of comma separated text
- Column.Name contains all of 'membership', 'number'
- Column.Name contains any of 'num', 'number', 'no'
- Column.Name contains none of 'code', 'type'
(not) starts with
- Column.Name starts with 'ABC_'
- Column.Name not starts with 'Customer_'
(not) ends with
- Column.Name ends with '_ABC'
- Column.Name not ends with '_Name'
Complex Conditions
Basic conditions can be combined into more complex conditions by joining them together with AND and OR, and wrapping groups of conditions in brackets:
- Column.Type is string AND Column.Width > 7 AND ((Column.Name starts with 'Product_' AND Column.Name ends with '_Owner') OR (Column.Name starts with 'Produkt_' AND Column.Name ends with '_Inhaber'))
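Nested brackets are easy to mismatch when typed by hand. As a hedged sketch, two small helpers can add the brackets automatically; `all_of` and `any_of` are hypothetical names, not part of the product. Bracketing every group explicitly also avoids relying on how AND and OR are prioritised relative to each other, which this page does not specify.

```python
def all_of(*conditions: str) -> str:
    """Join conditions with AND and wrap the group in brackets."""
    return "(" + " AND ".join(conditions) + ")"

def any_of(*conditions: str) -> str:
    """Join conditions with OR and wrap the group in brackets."""
    return "(" + " OR ".join(conditions) + ")"

# Rebuild the complex condition shown above from its basic conditions.
english = all_of("Column.Name starts with 'Product_'", "Column.Name ends with '_Owner'")
german = all_of("Column.Name starts with 'Produkt_'", "Column.Name ends with '_Inhaber'")
condition = " AND ".join(
    ["Column.Type is string", "Column.Width > 7", any_of(english, german)]
)
print(condition)
```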
Recommendations
To avoid false positives:
- include basic conditions to exclude common words, e.g. Column.Name contains none of 'code', 'type'
- include basic conditions to specify the data type of the column, e.g. Column.Type is date
- include basic conditions to specify the width of the column, e.g. Column.Width >= 7
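The three recommendations can be applied consistently by wrapping a name condition with the same guards each time. This is a minimal sketch under stated assumptions: the function `guarded` and its parameters are hypothetical, and it uses only keywords shown on this page.

```python
from typing import Optional, Sequence

def guarded(
    name_condition: str,
    column_type: Optional[str] = None,
    min_width: Optional[int] = None,
    excluded_words: Sequence[str] = (),
) -> str:
    """Add type, width and excluded-word guards around a name condition."""
    parts = []
    if column_type is not None:
        parts.append(f"Column.Type is {column_type}")
    if min_width is not None:
        parts.append(f"Column.Width >= {min_width}")
    # Bracket the name condition so any OR inside it stays grouped.
    parts.append(f"({name_condition})")
    if excluded_words:
        # Escape embedded single quotes with a backslash, then quote each word.
        words = ", ".join("'" + w.replace("'", "\\'") + "'" for w in excluded_words)
        parts.append(f"Column.Name contains none of {words}")
    return " AND ".join(parts)

print(guarded("Column.Name contains 'nir'", "string", 16, ["code", "type"]))
```

Leaving out an argument simply omits that guard, so the same helper covers rules where, for instance, the column width is irrelevant.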