Suggested Rules for the PUBS Sample Database
Published 06 January 2020
The Data Masker software contains a set of masking rules for the sample PUBS database supplied with SQL Server. You are encouraged to add your own masking rules to this masking set. The first section of this page discusses the existing rules (as supplied with the installation) and the second part provides some hints of additional rules which might be added.
Masking Rules in the Sample PUBS Masking Set
The Existing Rules
If you wish to explore each rule further load the masking set into the Data Masker software and double click on the rule with the mouse to launch it in editing mode. You can then explore the rules configuration in detail.
Rule 01-0001
This is a Rule Controller. Notice how the other rules are dependent on it. A Rule Controller defines the database on which the dependent rules will act. If you change the login information in the Rule Controller its rules will operate on the new database. It is up to you to ensure that the Rule Controller points at an appropriate database. All other types of masking rule must have a parent Rule Controller and every masking set must contain at least one Rule Controller.
Rule 01-0002
A Substitution Rule on the STORES table. This rule uses the Random Company Names data set to provide the substitution data.
Rules 02-0003, 03-0004, 04-0005, 05-0006
More Substitution Rules on fields in the STORES table. These rules use data sets appropriate to the column content. Particularly note that the state column is substituted using uppercase two letter abbreviations of the state name and the zip column receives the 5 digit zipcode only. Other options are possible - try editing rule 0006 to implement a Zip+4 substitution value.
Rule 11-0007
A Substitution Rule on the TITLES table on the price column. This rule assumes that although the prices need to be masked, it is not acceptable to use random numbers as this would distort the data inappropriately. The Number Variance data set is used to modify each existing value within a range of plus or minus 10%. Note that this column is of a number datatype - so the available data sets are different than for text fields. It might be a good idea to double click on the rule to have a look at the available data sets.
Rule 12-0008
A Substitution Rule on the TITLES table on the pubdate column. This rule assumes that like the price column, it is not acceptable to just use random dates. The Date Variance data set is used to modify the date within the bounds of an acceptable variation. It would be a good idea to double click on this rule to look at the data set options.
Rule 13-0009
This rule is a bit more complicated. The assumption here is that the title column requires contents containing two random words. This is done by substituting the title column with a concatenation of two random words from the Random Dictionary Words dataset using a Row-Internal Synch. Rule. You should open up this rule (by double clicking on it with the mouse) to understand how it is constructed.
Rules 50-0010, xx-0011
The notes column is substituted using the Null Values data set (rule 0010) and then a random 33% of the rows are then substituted with random paragraphs of gibberish text using rule 0011.
Note: It is important to remember that the Data Masker software is multi-threaded. It can, and will, execute multiple rules simultaneously. Since they are both operating on the same table and column, rule 0011 must only execute after rule 0010 completes - otherwise the actions of rule 0010 could overwrite the actions of rule 0011. The dependency relationship used in rule groups 0009-0011 forces rule 0011 to execute after rule 0009 completes. It is also possible to use rule blocks for this purpose. For example, rules 0011 could be placed in a higher rule block than rule 0010 and the same result would have been achieved. The order in which the rules are visible on the screen is not necessarily the order in which they will execute. The execution order always needs to be explicitly specified using Dependencies and Rule Blocks. A dependency relationship was chosen in this case to emphasize that the rules are really just two aspects of the same masking operation.
Rules 90-0012
This is a Command Rule which truncates all rows in the ROYSCHED table. It is assumed, for the purposes of the masked database, that the ROYSCHED data is not necessary and too sensitive to leave in place. Simply removing the data is sometimes an option in cases such as that.
Additional Rules
Below are some suggestions for additional rules which can be added to the PUBS database sample masking set. An ER diagram for PUBS is available to assist you in your analysis of the database structure. If you do add new masking rules it is a good idea to save the modified masking set under a new name - that way if you upgrade the Data Masker software you will not overwrite your enhancements.
TITLES:advance
Use a substitution rule and the User Specified Numbers data set to set each value in the advance column to a fixed sum of 1024.00. Place a Where Clause on the rule so that values that are already NULL are not affected.
AUTHORS:...
Examine the columns in the Authors table and make decisions about which ones contain sensitive information and implement suitable Substitution Rules for them.
AUTHORS:emp_id
Use the Text, Alpha-Numeric data set to generate new values for the emp_id column. Note there are two formats in this column AAANNNNNA and A-ANNNNNA. Use the first for all rows or use two rules with Where Clauses to mask each in its original format.
PUB_INFO:logo
Use the Null Values data set to remove all values from the logo field.
DISCOUNTS:...
Add a Command Rule to remove all values from the DISCOUNTS table where the stor_id is not null. Hint: just implement a statement like delete from DISCOUNTS where stor_id is not null in the Command Rule.