Rule Blocks and Dependencies
Published 19 March 2018
All actions within the Data Masker software are performed using masking rules. Masking rules implement such actions as the substitution of data from pre-prepared datasets, the shuffling of a column of data, or the synchronizing of data between two tables.
Important Note: The Data Masker software is multi-threaded. It can, and will, run multiple rules simultaneously. The number of rules which can run in parallel is determined by the Number of Rule Workers setting on the Run Statistics tab. Unless strictly specified (see below) each masking rule is considered by the Data Masker software to be an independent entity and its order of execution is not guaranteed.
Often, it is necessary to ensure that a certain rule completes before a subsequent rule executes. This is especially true if multiple masking rules will be operating on the same table rows and columns. Here's a typical scenario:
A table of customer information needs to be rendered anonymous and it has been determined that one of the columns which requires masking is the FIRST_NAME field. As an additional requirement, the masked data must use names (male or female) appropriate to the gender of the record. The gender information (M or F) for each record is available in a GENDER field, and a Where Clause option on a Substitution rule can easily handle the substitution of the first names from the two standard datasets: Names, First Names, Female and Names, First Names, Male.
So, for this case, it is decided that a Substitution rule will mask all first names (regardless of gender) using the male first names dataset, and a subsequent Substitution rule with a Where Clause of WHERE GENDER='F' will then make sure the female names have appropriate values. This "mask everything, then go back and reconcile with Where Clauses" method avoids the classic Where Clause Skip Error, in which it is assumed that the GENDER column will always contain only 'M' or 'F' values and any row holding some other value is left unmasked.
Clearly, in the above example, the rule which masks all of the rows in the table cannot run at the same time as, or after, the rule which masks only the rows WHERE GENDER='F'. If the rule which masks all rows with male first names ran after the rule which substitutes female first names, then any changes made by the female first names rule would be overwritten and the FIRST_NAME field would contain only male first names. In this case, the order of execution really matters and it must be explicitly defined within the Data Masker software.
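The effect of rule ordering in this scenario can be reproduced outside the Data Masker software. The following is a minimal sketch, not the way Data Masker implements Substitution rules: it assumes a hypothetical CUSTOMER table held in an in-memory SQLite database, two tiny hypothetical name lists standing in for the standard datasets, and two Python functions standing in for the two rules.

```python
import sqlite3

# Hypothetical stand-ins for the male and female first-name datasets.
MALE_NAMES = ["James", "Robert", "David"]
FEMALE_NAMES = ["Mary", "Linda", "Susan"]


def mask_all_with_male_names(conn):
    """Stand-in for the first rule: mask every row, ignoring GENDER."""
    ids = [r[0] for r in conn.execute("SELECT ID FROM CUSTOMER ORDER BY ID")]
    for i, row_id in enumerate(ids):
        conn.execute("UPDATE CUSTOMER SET FIRST_NAME = ? WHERE ID = ?",
                     (MALE_NAMES[i % len(MALE_NAMES)], row_id))


def mask_female_rows(conn):
    """Stand-in for the second rule: mask only rows WHERE GENDER='F'."""
    ids = [r[0] for r in conn.execute(
        "SELECT ID FROM CUSTOMER WHERE GENDER = 'F' ORDER BY ID")]
    for i, row_id in enumerate(ids):
        conn.execute("UPDATE CUSTOMER SET FIRST_NAME = ? WHERE ID = ?",
                     (FEMALE_NAMES[i % len(FEMALE_NAMES)], row_id))


def run(rules):
    """Build a fresh sample table, apply the rules in the given order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CUSTOMER ("
                 "ID INTEGER PRIMARY KEY, FIRST_NAME TEXT, GENDER TEXT)")
    conn.executemany(
        "INSERT INTO CUSTOMER (FIRST_NAME, GENDER) VALUES (?, ?)",
        [("Alice", "F"), ("Bob", "M"), ("Carol", "F"), ("Dan", None)])
    for rule in rules:
        rule(conn)
    rows = conn.execute(
        "SELECT FIRST_NAME, GENDER FROM CUSTOMER ORDER BY ID").fetchall()
    conn.close()
    return rows


# Intended order: mask everything first, then reconcile the female rows.
correct = run([mask_all_with_male_names, mask_female_rows])
# Reversed order: the female names are overwritten by the mask-all rule.
wrong = run([mask_female_rows, mask_all_with_male_names])
```

In the intended order, the row with a missing GENDER value is still masked (with a male name) rather than skipped, which is the point of masking everything first. In the reversed order every row ends up with a male first name.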
There are two ways to control the execution sequence of rules and make sure that they execute in a required order: Rule Dependencies and Rule Blocks. The following sections will consider each mechanism.
If a rule must execute after another rule completes, it is possible to build a dependency chain. Dependent rules cannot execute until their parent rule has completely finished execution. The dependency state of a rule is displayed on the Rules in Set tab in an indented form as shown below:
A View of Dependent Data Masker Rules
In the above illustration, rule 0005 is dependent on rule 0004 and will not execute until all substitution operations of rule 0004 complete. It is possible to build chains of any depth and complexity. To make a rule dependent on another rule simply use the mouse to drag and drop the dependent rule onto the parent rule. The screen will redraw to show the appropriate indentation once the dependency has been configured. To remove a rule dependency, drag and drop the dependent rule onto the Rule Controller.
Dependency chains are useful, but there are dependency relationships for which they become awkward. Imagine a scenario in which three rules (A, B, C) must execute to completion before a fourth rule (rule D) can begin. Implementing this scenario with dependency chains configures a sequence like:
Rule A
    Rule B
        Rule C
            Rule D
in which each rule is dependent on the rule above it in the chain. This makes the chain very long and also falsely indicates that rules B and C are in some way dependent on rule A. This is not the case: rules A, B and C are independent. It is rule D that is dependent on all three.
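The cost of the chain described above can be made concrete with a small model. This is only an illustrative sketch: the parents mapping and the chain_depth helper are hypothetical and are not part of the Data Masker software. Each rule is given at most one parent, and a rule's depth in its chain is the number of rules that must finish, one after another, before it can start.

```python
def chain_depth(parents, rule):
    """Count how many ancestors must complete before `rule` may start."""
    depth = 0
    while parents.get(rule) is not None:
        rule = parents[rule]
        depth += 1
    return depth


# Hypothetical chain: B depends on A, C on B, D on C.
parents = {"A": None, "B": "A", "C": "B", "D": "C"}

# Group the rules by the sequential step in which they can run.
steps = {}
for rule in parents:
    steps.setdefault(chain_depth(parents, rule), []).append(rule)
```

With this chain, steps holds four sequential steps of one rule each, so A, B and C never run in parallel even though nothing in the masking logic requires them to wait for each other.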
To simplify things, increase parallelism and avoid implying false dependencies, the Data Masker software implements a concept called Rule Blocks. A rule block is a two-digit numeric prefix listed before the rule number. Rule blocks are processed in strict numeric order, and all rules in a given rule block will complete before any rule in the next higher rule block begins. Inside a rule block, the order in which rules execute is not guaranteed; it is determined by the optimization routines and the availability of worker threads.
The above scenario (with rules A, B, C and D) would be handled by placing rules A, B and C in a numerically lower rule block than rule D. No complex dependency chain is necessary. The image below illustrates a sequence of rules assigned rule blocks 5, 10, 20 and 21 which are designed to synchronize values in other tables. All rules in rule block 01 will complete before the next higher rule block (rule block 5) begins to execute. Similarly, all rules in rule block 10 will complete before the rules in rule block 20 begin.
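The rule block behaviour can be sketched as a simple scheduler: sort the rules by block, run every rule of one block on a pool of worker threads, and wait for all of them before starting the next block. This is a minimal illustration under assumptions, not the Data Masker scheduler: the run_rule_blocks function, the (block, rule number, callable) tuples and the worker count are all hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def run_rule_blocks(rules, workers=4):
    """Run (block, rule_number, callable) tuples block by block."""
    ordered = sorted(rules, key=lambda r: r[0])  # strict numeric block order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _block, group in groupby(ordered, key=lambda r: r[0]):
            # Rules in the same block are submitted together and may
            # run in parallel, in no guaranteed order.
            futures = [pool.submit(fn) for _, _, fn in group]
            # Barrier: every rule in this block must finish before
            # the next higher block is started.
            for future in futures:
                future.result()


log = []
log_lock = threading.Lock()


def make_rule(name):
    """Build a hypothetical rule that only records that it ran."""
    def rule():
        with log_lock:
            log.append(name)
    return rule


# Rules A, B and C share block 10; rule D sits in the higher block 20.
rules = [
    (20, "0004", make_rule("D")),
    (10, "0001", make_rule("A")),
    (10, "0002", make_rule("B")),
    (10, "0003", make_rule("C")),
]
run_rule_blocks(rules)
```

After the run, the first three entries of log are A, B and C in some order, and D is always last. No dependency between A, B and C had to be declared to obtain that guarantee.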
A View of Data Masker Rule Blocks
Note that rule 10-0007 has a dependent rule (rule xx-0008). Regardless of the rule block they may have had previously, dependent rules always assume the same rule block as the parent rule and will execute immediately after their parent rule executes. Rule blocks for dependent rules are always listed as xx.
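The way a dependent rule takes on its parent's rule block can be expressed as a short lookup. The sketch below is an illustration only: the rules dictionary and the effective_block helper are hypothetical, and the string "xx" simply mirrors the placeholder shown on screen for dependent rules.

```python
def effective_block(rules, number):
    """Return the rule block a rule actually runs in.

    `rules` maps a rule number to (listed block, parent rule number).
    A dependent rule ignores its own listed block and uses the block
    of the rule at the top of its dependency chain.
    """
    block, parent = rules[number]
    while parent is not None:
        block, parent = rules[parent]
    return block


# Hypothetical masking set: rule 0008 is dependent on rule 0007.
rules = {
    "0007": ("10", None),
    "0008": ("xx", "0007"),
}

block_of_0008 = effective_block(rules, "0008")
```

Here block_of_0008 is "10": the dependent rule runs in the same rule block as its parent, whatever block it was assigned before the dependency was created.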
To change a rule block value (or rule number), click on the rule block field with the mouse and edit the value as appropriate. To adjust the rule blocks of multiple rules, use the Bulk Change Rule Block tool, activated by clicking the Bulk Change Rule Block button at the bottom of the Rules in Set tab.
When is it appropriate to control the rule execution with dependencies instead of rule blocks?
This decision is something of an arbitrary choice. Typically, dependencies are used to illustrate that the dependent rules are really just different aspects of the same masking operation. For example, rules 0004 and 0005 are both part of the same masking operation on the customer table. Using a dependency relationship here instead of a rule block clearly illustrates to the viewer how the two operations are related. Often, different rule blocks are used as a notation convention to mark operations on separate tables. In the above example, operations on the DM_EMPLOYEE table have been placed in the 20s range of rule blocks. Since they are independent, they could just as easily have been placed in the same rule blocks as the rules operating on the DM_CUSTOMER table, but the implementer of the masking set decided to build it this way for enhanced readability.