About Substitution Rules
Published 22 March 2018
Substitution rules replace the data in table columns with realistic-looking but random, meaningless data. The replacement data is configured in the rule by associating a dataset with each target column. A single Substitution rule can mask multiple columns in the same table, and each column can be configured with its own dataset.
For example, a column containing customer last names could be masked by applying a Substitution rule that uses the Names, Surnames dataset. When the Substitution rule is executed as part of a run of the masking set, a random last name is generated and substituted in place of each real customer last name. The customer's true last name is thus hidden (preserving privacy and security), while the remaining data stays relevant and usable in a test system.
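The idea in this example can be sketched in a few lines of Python. This is only an illustration of the concept, not how Data Masker is implemented: the SURNAMES list and the substitute_column function are hypothetical stand-ins for a dataset and a rule.

```python
import random

# Hypothetical stand-in for a surnames dataset (not Data Masker's actual data).
SURNAMES = ["Archer", "Bennett", "Castillo", "Dunn", "Eriksen"]


def substitute_column(rows, column, dataset, seed=None):
    """Return copies of the rows with `column` replaced by random dataset values."""
    rng = random.Random(seed)
    return [{**row, column: rng.choice(dataset)} for row in rows]


customers = [
    {"id": 1, "last_name": "Smith", "city": "Leeds"},
    {"id": 2, "last_name": "Jones", "city": "York"},
]

# Only last_name changes; id and city are carried over untouched.
masked = substitute_column(customers, "last_name", SURNAMES, seed=42)
```

Only the targeted column is replaced, so the other columns remain usable for testing.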
Once the rule has begun to execute, the substitution continues until all rows in the table (or a subset, if a Where clause or Sampling options are specified) have been updated with the new data. Commits happen at user-configurable intervals (the default is every 1000 rows).
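The row-by-row update with periodic commits can be pictured with the following minimal sketch, which uses Python's built-in sqlite3 module. It assumes a simple in-memory table; mask_column, its parameters, and the sample data are hypothetical and do not reflect Data Masker's internals.

```python
import random
import sqlite3


def mask_column(conn, table, column, dataset, where=None, commit_every=1000, seed=None):
    """Replace `column` in the selected rows, committing every `commit_every` rows.

    Table, column, and where are interpolated into the SQL text, so this sketch
    is only suitable for trusted input. Returns (rows_updated, commits_issued).
    """
    rng = random.Random(seed)
    select_sql = f"SELECT rowid FROM {table}" + (f" WHERE {where}" if where else "")
    rowids = [r[0] for r in conn.execute(select_sql).fetchall()]
    commits = 0
    for n, rowid in enumerate(rowids, start=1):
        conn.execute(
            f"UPDATE {table} SET {column} = ? WHERE rowid = ?",
            (rng.choice(dataset), rowid),
        )
        if n % commit_every == 0:
            conn.commit()
            commits += 1
    if len(rowids) % commit_every != 0:
        conn.commit()  # commit the final partial batch
        commits += 1
    return len(rowids), commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, last_name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers (last_name, region) VALUES (?, ?)",
    [(f"Real{i}", "EU" if i % 2 else "US") for i in range(2500)],
)
conn.commit()

# Mask only the rows matched by the where clause, committing every 1000 rows.
updated, commits = mask_column(
    conn, "customers", "last_name", ["Archer", "Bennett", "Castillo"],
    where="region = 'EU'", commit_every=1000, seed=1,
)
```

Committing in batches keeps each transaction small, at the cost that an interrupted run can leave the table partly masked.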
Any number of Substitution rules can be applied to any columns in any table in a database. If you apply a Substitution rule to a column that is part of a primary key or unique index, the index might have to be dropped while the Substitution rule is executing. Whether the substituted data is unique depends heavily on the dataset chosen: some datasets have options to guarantee uniqueness and some do not.
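To see why uniqueness depends on the dataset, note that drawing values at random with replacement can repeat a value, whereas drawing without replacement cannot. The sketch below shows the second approach; unique_substitutes is a hypothetical helper, and the source does not describe how the datasets' uniqueness options actually work.

```python
import random


def unique_substitutes(dataset, count, seed=None):
    """Pick `count` distinct values from the dataset, or fail if it is too small."""
    distinct = list(dict.fromkeys(dataset))  # drop duplicates, keep order
    if count > len(distinct):
        raise ValueError(
            f"need {count} unique values but dataset has only {len(distinct)}"
        )
    # sample() draws without replacement, so no value is returned twice.
    return random.Random(seed).sample(distinct, count)


values = unique_substitutes(["Archer", "Bennett", "Castillo", "Bennett"], 3, seed=0)
```

A dataset with fewer distinct values than the table has rows can never satisfy a unique index, whichever way the values are drawn.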
Datasets for just about every purpose are included with the Data Masker software, and you can create your own if you need to. The choice of dataset for a particular column is entirely up to the implementer of the Substitution rule. It is quite possible to choose an inappropriate dataset, for example by putting telephone numbers into a last name field. The Data Masker software performs no checks on the "appropriateness" of the dataset for the field contents.
The Data Masker software does, however, perform a number of other checks to prevent errors at rule execution time. When building a Substitution rule, the datasets available for selection are restricted by datatype. For example, it is not possible to substitute last names into a column with a Number datatype.
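Restricting the selectable datasets by datatype amounts to a simple filter. In the sketch below, the catalogue, the type mapping, and selectable_datasets are all hypothetical; only the Names, Surnames dataset name comes from this page.

```python
# Hypothetical catalogue: dataset name -> kind of value it supplies.
DATASET_KINDS = {
    "Names, Surnames": "text",
    "Phone numbers": "text",
    "Random integers": "number",
}

# Hypothetical mapping from column datatype to the kind of value it accepts.
COLUMN_KINDS = {
    "VARCHAR": "text",
    "CHAR": "text",
    "NUMBER": "number",
    "INTEGER": "number",
}


def selectable_datasets(column_type):
    """List the datasets whose values fit a column of the given datatype."""
    base = column_type.split("(")[0].strip().upper()  # "VARCHAR(20)" -> "VARCHAR"
    kind = COLUMN_KINDS.get(base)
    return sorted(name for name, k in DATASET_KINDS.items() if k == kind)


text_choices = selectable_datasets("VARCHAR(20)")
number_choices = selectable_datasets("NUMBER")
```

Here a Number column is never offered the Names, Surnames dataset, which mirrors the restriction described above.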
For textual fields such as VARCHAR columns, the size of the data supplied by the dataset is restricted to the width of the column. A Substitution rule will never, for example, attempt to update a VARCHAR(20) column with 25 characters of substitution data.
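One simple way to enforce such a width limit is to truncate each value before it is written. This is an assumption made for illustration: the page does not say whether Data Masker truncates over-long values or avoids them in some other way, and fit_to_width is a hypothetical helper.

```python
def fit_to_width(value, width):
    """Cut a substitution value down to at most `width` characters."""
    if width < 0:
        raise ValueError("width must not be negative")
    return value[:width]


long_value = "A" * 25                # 25 characters of substitution data
fitted = fit_to_width(long_value, 20)  # what a VARCHAR(20) column would receive
short_value = fit_to_width("Dunn", 20)  # shorter values pass through unchanged
```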
By default, the datasets are located in a directory named DataSets immediately below the Data Masker installation directory. This location can be changed using the options on the Misc. Setup Tab.
How to Create a New Substitution rule