Data Masker

Data Masker for SQL Server Performance Baselines

To get a sense of Data Masker's performance characteristics and typical configuration adjustments available in the product, we have run a number of tests that highlight how different machine configurations affect the execution time of common rule configurations.

We set up two environments in a Virtual Machine in Azure (Iaas), installed Windows Server 2016 Datacenter Edition, SQL Server 2017 Developer Edition and restored the the Stack Overflow 2010 10GB database (set to simple recovery mode with no extra indexes applied).

We provisioned environment 1 with 2 vCPUs and 16 GiB RAM and environment 2 with 4 vCPUs and 32 GiB RAM.

What follows are the performance tests we conducted:



Basic Rules

For this test, we set up 2 substitution rules:

  • For the dbo.Users table,
    • masking the DisplayName column with the dataset Names,First+Last,Male+Female and 
    • masking the Location column with the dataset State Names (US)
  • For the table dbo.Posts,
    • masking the LastActivityDate and LastEditDate columns with the dataset Date Variance (Correlated) with Low Bound of -53, High Bound of 65, not allowing 0 as a valid variance

Running with 2 Workers enabled, processing 4,028,593 rows

  • Environment 1 completed in 3 minutes and 59 seconds.
  • Environment 2 completed in 3 minutes and 51 seconds.



Calculated Split Ranges

For this test, we set up 2 substitution rules:

  • For the dbo.Users table, split into 2 Ranges on ID,
    • masking the DisplayName column with the dataset Names,First+Last,Male+Female
    • masking the Location column with the dataset State Names (US)
  • For the dbo.Posts table, split into 6 Ranges on ID
    • masking the LastActivityDate and LastEditDate columns with the dataset Date Variance (Correlated) with a Low Bound of -53, a High Bound of 65, not allowing 0 as a valid variance

Running with 8 Workers enabled, processing 4,028,593 rows

  • Environment 1 completed in 3 minutes and 22 seconds.
  • Environment 2 completed in 2 minute and 49 seconds.



Basic Rules with Defined Split Ranges

For this test, we set up 2 substitution rules, carefully selecting split ranges to ensure the work was evenly distributed:

  • For the dbo.Users table, split into 2 Ranges,
    • masking the DisplayName column with the Names,First+Last,Male+Female dataset
    • masking the Location column with the State Names (US) dataset
  • For the dbo.Posts table, split into 6 Ranges,
    • masking the LastActivityDate and LastEditDate columns with the Date Variance (Correlated) dataset with a Low Bound of -53, High Bound of 65, not allowing 0 as a valid variance

Running with 8 Workers enabled, processing 4,028,593 rows

  • Environment 1 completed in 3 minutes and 5 seconds.
  • Environment 2 completed in 2 minutes and 10 seconds.



Advanced Masking Rules

For this test, we set up

  • one Shuffle Rule targeting the CreationDate column on the dbo.Comments table
  • one Search-Replace Rule targeting the Body column of the dbo.Posts table

Running with 2 Workers enabled, processing 7,604,368 rows

  • Environment 1 completed in 32 minutes and 39 seconds
  • Environment 2 completed in 34 minutes and 38 seconds



Synchronization Rules

For this test, we set up

  • Table-To-Table Rule synchronizing masked DisplayName values from dbo.Users to dbo.Posts based on User.ID = Posts.LastEditorDisplayName.
  • Row Internal Rule masking the WebsiteUrl column on the dbo.Users table to be: https:// + the first 3 characters of the DisplayName column value + the DownVotes column value (cast as nvarchar) + '@' + DMSPARAM1 + '.example.com' where DMSPARAM1 is set to a value from the Text, Dictionary Words dataset.

Running with 2 Workers enabled, processing 1,233,688 rows and running the rules in different rule blocks as we are operating on the same dbo.Users table,

  • Environment 1 completed in 3 minutes and 18 seconds
  • Environment 2 completed in 1 minute and 16 seconds


Synchronization Rules with Indexed Join Target

For this test, we created a non-clustered index on dbo.Posts.LastEditorUserID to speed up the identification of the rows that require synchronization.

  • A Table-to-Table Rule synchronizing masked DisplayName values from dbo.User to dbo.Posts based on the User.ID = Posts.LastEditorUserID

  • A Row Internal Rule masking the WebsiteUrl column on dbo.Users to be: https:// + the first 3 characters of the DisplayName column value + the DownVotes column value (cast as nvarchar) + '@' + DMSPARAM1 + '.example.com' where DMSPARAM1 is set to a value from the Text, Dictionary Words dataset.

Running with 2 Workers enabled, processing 1,233,688 rows and running the rules in different rule blocks as we are operating on the same dbo.Users table,

  • Environment 1 completed in 43 seconds
  • Environment 2 completed in 35 seconds


Sync Manager Rules

For this test, we created a Sync Manager rule, masking the DisplayName column in a temporary table with the DataSet Names,First+Last,Male+Female on the dbo.Users table and then fanning this out to the dbo.Users and dbo.Posts tables, assuming that there is no common join. No FK relationships were set up in the database. Instead, we used Rule Generation by Column Name.

Running with 2 Workers enabled, processing 1,468,016 rows

  • Environment 1 completed in 49 seconds.
  • Environment 2 completed in 39 seconds.



Didn't find what you were looking for?