Pre-masking and post-masking transforms
Published 05 June 2025
There are sometimes cases where the data type of a column does not match that supported by a dataset. This means that the dataset cannot be applied to the column.
For example, the USSocialSecurityNumbers
dataset generates US Social Security numbers in the format 989-11-9389
. If the database column used to store this data is defined as a number, it cannot then store the data generated by the dataset when masking takes place.
To get around this problem, pre-masking and post-masking transforms can be used to manipulate the data into different formats.
Pre-masking transforms
Pre-masking transforms run after the data is loaded from the database, but before it is masked. This enables the original data to be transformed into a different form/type suitable for the applied dataset.
Post-masking transforms
Post-masking transforms run after the data is masked, but before it is written back to the database. This enables the masked data to be transformed into a different form/type suitable for the column being masked.
Configuring transforms
On a column
Pre-masking and post-masking transforms can be configured on a column in the masking.json
file. Both are optional, allowing a pre-masking transform or post-masking transform to be defined on their own, or both can be defined.
{ "tables": [ { "schema": "Person", "name": "Address", "columns": [ { "name": "SSN", "dataset": "USSocialSecurityNumbers", "transforms": { "pre": "...", "post": "...", } } ] } ] }
Transforms take the form of an expression that is used to manipulate the data, similar to how it is done within a cell in Microsoft Excel.
The special placeholder ${value}
is used to represent the input to the transform. For pre-masking transforms, this represents the value from the database column. For post-masking transforms, this represents the output of the dataset after masking.
Using a transform rule
Transform rules can also be defined in the masking.json
file or the options.json
file. These apply a similar syntax to classification rules, and allow conditions to be applied to determine whether to apply the transform to a given column.
{ "transformRules": [ { "condition": "Table.Name = 'Customers' AND Column.Name = 'PostalCode'", "pre": "removeWhitespace(${value})", "post": "upper(${value})" } ] }
Transform rules will apply the specified pre-masking and post-masking transforms to any columns that match the specified condition.
Both pre-masking and post-masking transforms are again optional, allowing a pre-masking transform or post-masking transform to be defined on their own, or both can be defined.
In the above example, the transforms will be applied to any columns called "PostalCode" in any table called "Customers".
In addition to the conditions used by classification rules, transform rules can also be about:
- a column's classification -
Column.Classification = 'GivenNames'
(only relevant when running the map step) - a column's dataset -
Column.Dataset = 'GivenNames'
(only relevant when running the mask step)
Data types
Expressions are type-safe, meaning that different data types cannot be interchanged. The supported data types are:
Data type | Description | Example |
---|---|---|
character | A single character. e.g. a letter, a numerical digit, a space, a punctuation mark. | 'x' |
string | A series of characters. | "hello" |
decimal | A floating point numeric. e.g. 12.345, 0.0012 | 123.45 |
integer | An integral numeric. e.g. 123, 10000 | 12345 |
date | A date. e.g. 22 June 2012 Dates need to be defined as | DATE(2012/06/22) |
NULL | A null value. | NULL |
If you do need to interchange data types, use one of the conversions functions listed below.
Characters are quoted using single quotes. e.g. 'a'
Strings are quoted using double quotes. e.g. "hello"
Supported Functions
String functions
Function | Description | Input(s) | Return type | Example |
---|---|---|---|---|
upper | Converts a string to uppercase. | string | string | upper("hello") → HELLO |
lower | Converts a string to lowercase. | string | string | lower("HELLO") → hello |
removeWhitespace | Removes all whitespace characters from a string. | string | string | removeWhitespace("some thing") → something |
trim | Removes all whitespace characters from the start and end of a string. | string | string | trim(" hello ") → hello |
trimStart | Removes all leading whitespace* characters from the start of a string. | string | string | trimStart(" hello") → hello |
trimEnd | Removes all trailing whitespace* characters from the end of a string. | string | string | trimEnd("hello ") → hello |
padLeft | Pads the start of a string with the specified character to the specified length. | string, integer, character | string | padLeft("hello", 3, 'x') → xxxhello |
padRight | Pads the end of a string with the specified character to the specified length. | string, integer, character | string |
|
replace | Replaces all occurrences of a specified string with another. | string, string, string | string | replace("hello", "ll", "xx") → hexxo |
substring | Returns a substring, starting at the specified starting position. If a length is specified, will stop after that number of characters. | string, integer string, integer, integer | string |
|
left | Returns the specified number of characters from the start of a string. | string, integer | string | left("hello", 2) → he |
right | Returns the specified number of characters from the end of a string. | string, integer | string | right("hello", 2) → lo |
* Whitespace characters include spaces, tabs (\t
), new lines (\n
) and carriage returns (\r
).
Conversion functions
Function | Description | Input(s) | Return type | Example |
---|---|---|---|---|
toString | Converts a value to a string. | (string | decimal | integer | date), string* | string | toString(123.45, "0.0") → "123.4" |
toDecimal | Converts a value to a decimal. | string | decimal | toDecimal("123.45") → 123.45 |
toInteger | Converts a value to an integer. | string | integer | toInteger("123") → 123 |
toDate | Converts a value to a date. | string, string* | date | toDate("23012025", "ddMMyyyy") → 23/1/2025 |
* A format string
Format strings
The format strings passed to toString
and toDate
determine the format of the converted value. These use the standard format patterns for numbers and dates.
Date functions
Function | Description | Input(s) | Return type | Example |
---|---|---|---|---|
day | Returns the day part of a date. | date | integer | day(date(23/1/2025)) → 23 |
month | Returns the month part of a date. | date | integer | month(date(23/1/2025)) → 1 |
year | Returns the year part of a date. | date | integer | year(date(23/1/2025)) → 2025 |
Null functions
Function | Description | Input(s) | Return type | Example |
---|---|---|---|---|
ifNull | Returns a value if the input value is NULL, otherwise returns the input value. | any type, any type | any type | ifNull(NULL, "hello") → "hello" |
nullIf | Returns NULL is the input value matches the specified value, otherwise returns the input value. | any type, any type | any type | nullIf("hello", "hello") → NULL |
nullIfEmpty | Returns NULL if the input string contains only whitespace characters (include spaces, tabs (\t ), new lines (\n ) and carriage returns (\r )). | string | string or NULL | nullIfEmpty(" ") → NULL |
Syntax errors
When working with transforms, it is possible to get a syntax error. This can be due to a number of reasons:
Cause | How to resolve |
---|---|
invalid syntax | Use the correct syntax, taking care to ensure that all opening parentheses are closed. |
incompatible data type | Ensure that the expected data type is being passed to a function, using a conversion function if necessary. |
When using strings in an expression in a masking.json
file, it is necessary to escape the double quotes surrounding the string:
{ "name": "SSN", "dataset": "USSocialSecurityNumbers", "transforms": { "pre": "replace(${value}, \"-\", \"\")" } }
Example expressions
Expression | Description | Data conversion |
---|---|---|
toDecimal(trim(${value})) | Removes whitespace from both ends of the value, then converts it to a decimal. | " 20.013 " → 20.013 |
replace(removeWhitespace(${value}), "-", "") | Removes all whitespace and hyphens from a string. | "555 123-4567" → "5551234567" |