Suggested Rules for the Northwind Sample Database
Published 19 March 2018
The Data Masker software contains a set of masking rules for the sample Northwind database supplied with SQL Server. You are encouraged to add your own masking rules to this masking set. The first section of this page discusses the existing rules (as supplied with the installation) and the second part provides some hints of additional rules which might be added.
Masking Rules in the Sample Northwind Masking Set
The Existing Rules
If you wish to explore each rule further, load the masking set into the Data Masker software and double click on the rule with the mouse to launch it in editing mode. You can then view the configuration of the rule in detail.
This is a Rule Controller. Notice how the other rules are dependent on it. A Rule Controller defines the database on which the other masking rules will act. If you change the login information in the Rule Controller its rules will operate on the new database. It is up to you to ensure that the Rule Controller points at an appropriate database. All other types of masking rule must have a parent Rule Controller and every masking set must contain at least one Rule Controller.
A Substitution Rule on the EMPLOYEES table. This rule uses the Random Surnames data set to provide the substitution data which masks the LASTNAME column.
A Substitution Rule on the TITLEOFCOURTESY column in the EMPLOYEES table. This rule uses a Where Clause and replaces all of the values which are not Mr., Ms. or Mrs. with the value Mr.. Why is this done? Well there could be titles in there like Dr. or Major and such titles could act as an indicator of the original row contents. In a small organization a title such as Dr. may be unique. This is an example of the Isolated Case Phenomena and your analysis of how a database should be masked must take this factor into consideration. The Data Masking: What You Need to Know Before You Begin white paper discusses this and many other subtle issues in considerable detail. We highly recommend that you read this paper.
Rules 10-0004, xx-0005
A group of Substitution Rules on the FIRSTNAME column in the EMPLOYEES table. These rules form a pair: Rule 0004 masks the FIRST_NAME field in all rows in the table using the Male First Names data set. Then rule 0005 goes back over the same data and replaces the female first names with values from the Female First Names data set. A Where Clause on rule 0005 restricts the update values to rows which are female. You should have a close look at rule 0005 since the Where Clause used is not obvious.
Rules 30-0006, 31-0007
These two rules modify the BIRTHDATE and HIREDATE. Items such as birthdays are a classic Isolated Case Phenomena - they are commonly known, and in a small collection of data they can be a unique identifier. The use of a Date Variance data set modifies the existing row value by some bounded random value. In the case of rule 0006 this is anywhere between plus or minus 100 days. Variance techniques will largely preserve the distribution of the data while effectively masking the actual values. Be sure to note the variance on rule 0007, it has a much lower upper bound (10 days) so as to minimize the possibility of having a hire date in the future.
Note: It is important to note that the Data Masker software is multi-threaded. It can, and will, execute multiple rules simultaneously. Since they are both operating on the same table and column, rule 0005 must only execute after rule 0004 completes - otherwise the actions of rule 0004 could overwrite the actions of rule 0005. The dependency relationship used in rule groups 0004-0005 forces rule 0005 to execute after rule 0004 completes. It is also possible to use rule blocks for this purpose. For example, rules 0005 could be placed in a higher rule block than rule 0004 and the same result would have been achieved. The order in which the rules are visible on the screen is not necessarily the order in which they will execute. The execution order always needs to be explicitly specified using Dependencies and Rule Blocks. A dependency relationship was chosen in this case to emphasize that the rules are really just two aspects of the same masking operation.
Below are some suggestions for additional rules which can be added to the Pubs database sample masking set. An ER diagram for Northwind is available to assist you in your analysis of the database structure. If you do add new masking rules it is a good idea to save the modified masking set under a new name - that way if you upgrade the Data Masker software you will not overwrite your enhancements.
Have a look at the contents of the TITLE column in the EMPLOYEES table. This is a very distinct identifier of the original contents of the row. Read about User Defined Datasets and make some of your own job titles and use them in a Substitution Rule on the TITLE column.
Examine the remaining columns in the EMPLOYEE table and make decisions about which ones contain sensitive information and implement suitable Substitution Rules for them.
Apply a Number Variance rule to the UNITPRICE column in the ORDERDETAILS table. Be sure to set it up so the UNITPRICE values cannot become negative.
Both the CUSTOMERS and SUPPLIERS tables have sensitive columns. Analyze these tables and apply suitable masking rules.