Shuffle Rules
Published 06 January 2020
Shuffle rules are designed to randomly re-distribute the data in a table column. This effect is similar to the shuffling of a deck of cards (hence the name Shuffle rule). Each Shuffle rule affects only the specified columns in the target table. All other columns in the table remain unaffected.
Shuffle rules should not be used against tables containing small amounts of data. If there are too few rows, the shuffle process will still work, but the small number of rows makes it easier to manually "guess" which column data was originally associated with each row.
You cannot shuffle the primary key column(s) - if you wish to do this, select all other columns in the table which will effectively shuffle the primary key between them.
Example
Given a table that stores employee names, adding a shuffle rule to mask the LastName
column would result in something like:
ID | FirstName | LastName (BEFORE) | LastName (AFTER) |
---|---|---|---|
1 | Bill | Smith | Black |
2 | Sally | Jones | White |
3 | William | Green | Smith |
4 | Ben | White | Jones |
5 | Jane | Doe | Archer |
6 | Amanda | Smith | Green |
7 | Fred | Black | Smith |
8 | Jim | Archer | Doe |
The contents of the LastName
column have been re-arranged in a random order but the data in the other columns (ID and FirstName) has remained the same.
Core concepts
Shuffling multiple columns
Shuffle Rules can be configured to shuffle multiple columns at once. These values are shuffled together, to shuffle columns independently separate Shuffle Rules are required.
Shuffling does not destroy sensitive data
Shuffle Rules merely move data, they do not alter the contents of the data, so any data that is identifiable on its own (e.g. e-mail addresses) are not suitable for shuffling.
Adding a WHERE clause
Shuffle rules can use a Where Clause option to operate on a subset of the rows in a table. So to the above example, applying a where clause of:
WHERE ID > 4
Would result in something like:
ID | FirstName | LastName (BEFORE) | LastName (AFTER) |
---|---|---|---|
1 | Bill | Smith | Smith |
2 | Sally | Jones | Jones |
3 | William | Green | Green |
4 | Ben | White | White |
5 | Jane | Doe | Smith |
6 | Amanda | Smith | Doe |
7 | Fred | Black | Archer |
8 | Jim | Archer | Black |
Once again, be aware of the number of rows that are to be shuffled.
Advanced concepts
Where clauses referencing other tables
The custom where clause allows you to specify a JOIN
clause (the target table will be aliased as DM1
to help create the join condition).
In the example, assuming that you have another table, for EmployeeStatus
and you don't want to shuffle the details of current staff members.
EmployeeID | Status | AsOfDate |
---|---|---|
1 | Retired | 21/05/2013 |
2 | Resigned | 04/06/2014 |
3 | Current | 22/01/2012 |
4 | Current | 13/07/2013 |
5 | Resigned | 14/02/2014 |
6 | Current | 15/11/2019 |
7 | Resigned | 07/10/2016 |
8 | Retired | 18/12/2019 |
You could write:
INNER JOIN EmployeeStatus es ON es.EmployeeID = DM1.ID WHERE es.Status != 'Current'
Which would result in something like:
ID | FirstName | LastName (BEFORE) | LastName (AFTER) |
---|---|---|---|
1 | Bill | Smith | Jones |
2 | Sally | Jones | Smith |
3 | William | Green | Green |
4 | Ben | White | White |
5 | Jane | Doe | Archer |
6 | Amanda | Smith | Smith |
7 | Fred | Black | Doe |
8 | Jim | Archer | Black |