Using different or custom datasets
Published 14 June 2024
Choose a different dataset for masking a column
To mask a column with a dataset other than the one currently assigned to it, add the column to your options file and specify a "dataset" on the column.
{
  "tables": [
    {
      "schema": "Person",
      "name": "Address",
      "columns": [
        { "name": "CountyOrState", "dataset": "USStates" }
      ]
    }
  ]
}
Define a custom dataset for masking
To define a custom dataset for your column, choose one of the following dataset types:
- Pattern-based (specify one or more format patterns)
- List-based (specify a collection of values)
- File-based (use a .txt file containing a list of line-separated values; the .txt file must be in the same directory as the options file)
Note: The "name" property of the dataset in your options file can be one of the built-in datasets (overriding that dataset), or a distinct name of your choice (creating an additional dataset). Any column in your masking file that is assigned your named dataset will be masked with the values you define.
Pattern
{
  "datasets": [
    {
      "name": "MiddleInitials",
      "type": "Pattern",
      "values": ["?.", "?. ?."]
    }
  ]
}
List
{
  "datasets": [
    {
      "name": "ShortFirstNames",
      "type": "List",
      "values": ["Ann", "Bob", "Carlos", "Dalip"]
    }
  ]
}
File
{
  "datasets": [
    {
      "name": "NorthWestCities",
      "type": "File",
      "values": "NorthWestCities.txt"
    }
  ]
}
NorthWestCities.txt
Bellevue
Kennewick
Pasco
Portland
Seattle
Spokane
Tacoma
Yakima
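The dataset examples above define a dataset but do not show it being assigned to a column. The following is a minimal sketch of both steps in one options file. It assumes that the "datasets" and "tables" keys can sit side by side in the same file (the examples on this page only show them separately), and the "City" column is a hypothetical name used for illustration.

```json
{
  "datasets": [
    {
      "name": "NorthWestCities",
      "type": "File",
      "values": "NorthWestCities.txt"
    }
  ],
  "tables": [
    {
      "schema": "Person",
      "name": "Address",
      "columns": [
        { "name": "City", "dataset": "NorthWestCities" }
      ]
    }
  ]
}
```

With this layout, the "dataset" value on the column must match the "name" of the dataset definition exactly.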
Formatting of masked values
To enforce a particular format for alphanumeric strings, specify the format with a pattern using the following rules:
- # representing integers (0-9)
- ? representing characters (A-Z)
- * representing either integers or characters
- \ to escape pattern characters
{
  "datasets": [
    {
      "name": "PhoneNumbers",
      "type": "Pattern",
      "values": [
        "(###) ###-####",
        "(###) ###-#### x####",
        "1-###-###-####",
        "###-###-####"
      ]
    }
  ]
}
Note: In this options file example, we are overriding the built-in PhoneNumbers dataset. Any column assigned the PhoneNumbers dataset will be masked with values that match one of the patterns specified in the values array. For all the values to have the same format, specify only one pattern in your values array.
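As a sketch of the escape rule combined with a single-pattern dataset, consider the example below. The dataset name "TagCodes" is hypothetical. Because the options file is JSON, a literal backslash inside a string has to be written as \\, so the pattern that is actually read here is \#??-####. Under the rules above, this should yield values shaped like #AB-1234: a literal #, two letters, a hyphen, and four digits. How the escape is interpreted is an assumption based on those rules, so it is worth checking against your own masked output.

```json
{
  "datasets": [
    {
      "name": "TagCodes",
      "type": "Pattern",
      "values": ["\\#??-####"]
    }
  ]
}
```

Since the values array holds only one pattern, every masked value would share the same format.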
Preserving null data values
By default, null data values are preserved when masking. To turn this behaviour off, add "preserveNulls": false to the corresponding JSON node. This can be a table (as in the first example below) or an individual column (as in the second).
{
  "tables": [
    {
      "schema": "Person",
      "name": "PersonPhone",
      "preserveNulls": false
    }
  ]
}

{
  "tables": [
    {
      "schema": "Person",
      "name": "PersonPhone",
      "columns": [
        { "name": "PhoneNumber", "preserveNulls": false }
      ]
    }
  ]
}
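The column-level setting can presumably be combined with a dataset assignment on the same column. The sketch below assumes that "dataset" and "preserveNulls" may appear together on one column node, which this page does not show explicitly.

```json
{
  "tables": [
    {
      "schema": "Person",
      "name": "PersonPhone",
      "columns": [
        {
          "name": "PhoneNumber",
          "dataset": "PhoneNumbers",
          "preserveNulls": false
        }
      ]
    }
  ]
}
```

If that assumption holds, null phone numbers would be replaced with generated values from the PhoneNumbers dataset instead of being left as null.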