Masking sensitive data from an existing data source
Published 05 February 2013
There may be occasions when you want to use existing data, but the data source contains sensitive data such as dates of birth, personal email addresses or credit card numbers. For example, you might want to:
- mask sensitive data for realistic testing
- create demo databases without sharing live data
- send a masked database to a third party application vendor for troubleshooting
This page explains how you can set up a SQL Data Generator project to use existing data and then replace the sensitive data with randomly-generated but realistic data.
Preparing the data
The data must exist in a SQL Server database. You must create a copy of the database because you'll be removing the sensitive data permanently.
To create the copy, either:
- restore a backup of the database to a new location, or
- use SQL Packager to make a .NET executable of the existing database and then run the executable to create a new database
Setting up the SQL Data Generator project
When you have created a copy of the the database, create a new SQL Data Generator project using the copy. You must use the copy because you will be using the original database as the data source but permanently replacing the columns containing sensitive data in the copy.
For further information about how to create a SQL Data Generator project, see Setting up the data generator.
To create a new project:
- Click New Project.
- In the Project Configuration dialog box, select the SQL Server and name of the database you just created and click OK.
- In the Tables to populate pane, click Deselect all and then select only the tables containing the columns with sensitive data that you want to mask.
In the Table generation settings pane, for each table that contains data to be masked:
- Under Source of data, select Use existing data source.
- Next to Source type, make sure SQL Table or View is displayed.
- Next to Source, click Browse.
- In the Select SQL Table or View dialog box, select the original SQL Server and database and click Next.
Select the table containing the columns that you want to mask.
If the table you want to mask references other objects in the database, you also need to select those referenced objects.
For example, if there is a foreign key relationship between the Widgets and WidgetDescriptions tables, select both tables.
- Click Finish.
- Make sure the Delete data from table before generation check box is selected.
You have now mapped the existing database tables to the tables that contain the data to be replaced so that any columns within that table that you do not want to change retain the original data. Now you need to select the columns containing sensitive data within these tables and choose the generators to generate random data.
To select the column and associated generator:
- In the Tables to populate pane, expand the table tree.
- Select the column you want to mask.
- In the Column generation settings pane, select an appropriate generator from the drop-down list.
- The preview pane is updated dynamically showing that the original data will be replaced with randomly-generated data.
Repeat the above steps for each column with sensitive data.
Generating the data
When you are happy with the preview, clickGenerate Data above the Table generation settings pane.
The Data Population Summary dialog box displays an action plan for the data generation.
Click Generate Data. Data is generated according to the options specified in your project, using the existing data for columns you did not want to change, and randomly-generated data for those you did.
A data generation report in PDF format is then displayed, confirming the number of rows inserted into each table. You can export and save this file if required. Close the report to return to SQL Data Generator.
If you are likely to perform the same data generation task, save and name the project: clickSave Project.
Packaging the masked database
Now that the data is ready for distribution, you can decide how to despatch it. You can either create a backup file or use SQL Packager to create a .NET executable.
You can then send the data to the recipient with no personal information at risk of compromise.