Data Masker

A masking set consists of the rule which defines how to invoke some change on your database. Each type of rule has a different purpose which makes it suitable for a specific masking requirement. They need to be defined in the controller which is rule container on certain SQL connection.

You can add a rule through the Rules tab by pressing New Rule button.

How to create a New Masking Rule

Selector form

Rules description

Masking Rules

Substitution Rules Substitutes the data in the column of a table. As substitution data this type of rule can use any of the supplied datasets or User Defined Datasets appropriate to the column type. This type of rule can also substitute based on a user supplied Where condition.

Shuffle Rules Shuffles the data in the column of a table (like a deck of cards) and leaves the other columns untouched. This type of rule can also shuffle based on a user supplied WHERE condition.

Insertion Rules Inserts new rows into table columns. This type of rule can use as insertion data any of the available datasets appropriate to the column type.

Search and Replace Rules Search and replace values in free format text fields. This type of rule can use four different search operations, and replace with values from available datasets. A user supplied Where condition can also be applied.

XML Masker Rules Mask data stored in XML format. This type of rule can search XML contents by XPath and replace with values from available datasets. A user supplied Where condition can also be applied.

JSON Masker Rules Mask data stored in JSON format. This type of rule can search JSON contents by JSONPath and replace with values from available datasets. A user supplied Where condition can also be applied.

Overview of Masking Rules at Redgate University

The Masking Rules module demonstrates where in the GUI you can find each of these different types of rules, and how to use them to mask data in the columns in a table, or move it around, so that it is protected.

Synchronization Rules

Synchronization rules ensure that scrambled data correlates (or synchronizes) with other data. Synchronization rules are necessary because it is very rare for database information to be stored in a fully normalized way. Usually, there is a requirement for data masked in one area to be masked in an identical way in another area. For example, an employee name may be held in several tables. It is desirable (usually essential) that if the name is masked in one column then the other tables in which the information is held are also updated with an identical value. There are three basic types of synchronization and a specialized rule type to support each one.

Row-Internal Synchronization Rules A Row-Internal Synchronization Rule updates a field in a row with a combination of values from the same row.

Table-Internal Synchronization Rules A Table-Internal Synchronization Rule updates columns in groups of rows within a table to contain identical values.

Table-To-Table Synchronization Rules A Table-To-Table Synchronization Rule uses a join condition to update columns in another table to contain identical values.

Cross DB Table Mover Rules This type of rule is used to copy data from a source table to a target table in a different database, instance and database type.

Cross DB Table-to-Table Rules This type of rule is used to synchronise data between a source table and a target table in a different database, instance and database type.

Overview of Synchronization Rules at Redgate University

The Row-Internal, Table-Internal and Table-to-Table modules offer simple examples of how to define and run each of these types of rules, in turn.

Enable/Disable

Foreign Key Enable Rules Enables foreign keys in the target database. Each foreign key can be individually marked for enable. Usually used after a Foreign Key Disable Rule has run.

Foreign Key Disable Rules Disables foreign keys in the target database. Usually followed by a Foreign Key Enable Rule after the other masking rules have run.

Trigger Enable Rules Enables triggers in the target database. Each trigger can be individually marked for enable. Usually used after a Trigger Disable Rule has run.

Trigger Disable Rules Disables triggers in the target database. Usually followed by a Trigger Enable Rule after the other masking rules have run.

Index Enable Rules Enables indexes in the target database. Each index can be individually marked for enable. Usually used after an Index Enable Rule has run.

Index Disable Rules Disables indexes in the target database. Usually followed by an Index Enable Rule after the other masking rules have run.

Refresh Rules

Index Refresh Rules Dynamically download the indexes from the database and refresh the Index Manager rules in the Rule Controller.

Trigger Refresh Rules Dynamically download the triggers from the database and refresh the Trigger Manager rules in the Rule Controller.

FK Refresh Rules Dynamically download the foreign keys from the database and refresh the FK Manager rules in the Rule Controller.

Row Count Refresh Rules Dynamically download the count of rows in all of the tables in the Rule Controller schema from the database and will refresh the display.

Advanced

Command Rules This type of rule is used to run user-defined T-SQL or PL/SQL statements within the target database.

Cross DB Command Rules This type of rule is used to run user-defined T-SQL or PL/SQL statements in a different database, instance and database type.

The Masking Process

Once added to the masking set, the rule is ready to modify the data in the target database. However, no changes to the table contents will take place until the rules are executed by the Data Masker software. To run a masking set and execute the rules within it click on the Run Masking Set button on the right hand side of the main Data Masker display. It is also possible to only run one rule or a group of selected rules by right click the rule and run from the menu shown.

The Data Masker software can execute multiple rules simultaneously. Some rules will require other rules to complete before it is appropriate for them to begin. Read the Rule Blocks and Dependencies help page to understand how to explicitly control the execution order of the masking rules.

The progress of the masking set run and information about each rules state can be viewed on the Rule Statistics and Run Statistics tabs.

Operationally, rule execution is quite straight forward. The effect is exactly what the rule and its options state. For example, a Substitution rule using the Random Last Names dataset applied to the EMPLOYEE table on the EMPLOYEE_LASTNAME column would generate and substitute random last names in place of the existing last names. The substitution would continue until all rows in the table (or a subset if a Where clause option was specified) were updated with the new data.

Important Note: Once a rule has been run it is not possible to recover the previous data by running another rule. For example, once a Substitution rule has been run, the data will be thoroughly masked and there is no way of "un-substituting" it. To retrieve the original data the usual database restore procedures would have to be implemented.

Some Answers to Questions You Might Have

I want the masked data to look similar to the original data.

This is most likely a substitution operation and a Substitution Rule is usually appropriate. It is important to note that Data Masker datasets are not just simple lists of replacement information. They provide a wide range of functionality. For example, the Null Values dataset removes data by replacing it with null values, the Paragraphs of Gibberish dataset generates random collections of sentences and the Date Variance dataset can vary existing dates randomly between user-defined boundaries. The Data Masker software comes equipped with datasets for just about every purpose.

The substitution is a bit tricky. I need to substitute different values depending on other column values.

This is still a Substitution rule with a dataset. Just apply a Where Clause to the Substitution rule to restrict the masking operations to a subset of the table rows. It is also possible to Sample the table contents. Often it is necessary to use multiple rules to achieve a masking operation. For example, you might mask the FIRST_NAME column in all rows with the Male First Names dataset and then add a second rule to mask the FIRST_NAME column in the rows WHERE GENDER='F' with the Female First Names dataset. Note that in cases such as the previous example, it is better to use one rule which does all rows and then add a second which picks up one of the special cases rather than two rules each with Where Clauses. See the section on Where Clause skips in the Data Masking: What You Need to Know Before You Begin white paper for a comprehensive discussion of this issue.

The substitution information is pretty specific to our organization. There is no dataset for it.

No problem, you can define your own if you need to do so. User-Defined Datasets are simple text files with one item per line. They are used in Substitution Rules just like the ones supplied with the Data Masker software.

The substitution data is complex, there is no suitable dataset and User Defined Datasets are not approprate.

Think about why this might be - is it because the structure of the data is very complex? If so, investigate using a Row-Internal Synchronization rule to build the replacement data. You can even use a Row-Internal Synchronization rule type to call a user-defined function or stored procedure. If a Row-Internal Synchronization rule is not the answer, consider using a custom-written Command Rule. Also investigate the use of the Text, Alpha-Numeric (Formatted) and Numbers, Integer (formatted) datasets.

I would like to keep the existing data - just jumble it up among the rows in the table.

This is a task for a Shuffle Rule. Shuffle rules randomly move the existing values in a column of data between the rows. This masks the data by removing the associations between items in the same row.

I have masked some data and I need to make some other columns change to the same values.

This is synchronization - and there are three types. The way to choose the most appropriate type is to consider the location of the source and target columns. If the information has to be synchronized in columns within the same row, then the requirement is Row-Internal Synchronization and a Row-Internal Synchronization rule is the option to choose. If the data to be synchronized consists of making groups of rows in the same table contain columns with identical values, then the Table-Internal Synchronization is required and a Table-Internal Synchronization rule is the one to use. The final type of synchronization is Table-To-Table - this occurs when the data in one table needs to match the newly masked data in another. This is probably the most common type of synchronization and the Table-To-Table Synchronization rule is designed for that purpose.

I have an empty table, I need to populate it with dummy data.

This is a job for an Insertion rule. An Insertion rule works just like a Substitution rule except that it creates rows rather than substituting existing values. Usually, Insertion rules are used to populate the NOT NULL columns of a table then substitution rules (with appropriate datasets) are used to fill in the other empty fields.

I need to perform an action such as truncating a table or creating a temporary index.

This is definitely a task for a Command rule. A Command rule can run any SQL command or T-SQL block. This includes such things as building or dropping tables or performing manual inserts and updates.

Some rules are running slowly because UPDATE triggers are firing.

Confirm that it is not essential for the triggers to fire and implement a Trigger Disable to disable the triggers and a subsequent Trigger Enable rule to re-enable them once the other masking rules have finished. Be sure to use rule blocks in order to make sure the Trigger Disable and Enable rules execute in the proper sequence.