SQL Data Generator 2

Cleansing or removing sensitive data from an existing data source

There may be occasions when it would be useful to use existing data, but the data source contains sensitive data such as dates of birth, personal email addresses or credit card numbers. For example, you may want to send a database to a third party application vendor for troubleshooting. SQL Data Generator can help in this situation by enabling you to use most of the existing data but replacing the personal information.

This page explains how you can set up a SQL Data Generator project to use existing data and then replace the sensitive data with randomly-generated but realistic data.

Preparing the data

The data must exist in a SQL Server database. You must create a copy of the database because you will be removing the sensitive data permanently.

To create the copy, either:

  • restore a backup of the database to a new location, or
  • use SQL Packager to make a .NET executable of the existing database and then run the executable to create a new database

Setting up the SQL Data Generator project

When you have created a copy of the the database, create a new SQL Data Generator project using the copy. You must use the copy because you will be using the original database as the data source but permanently replacing the columns containing sensitive data in the copy.

For further information about how to create a SQL Data Generator Project, see: Setting up the data generator.

To create a new project:

  1. Click New Project.
  2. In the Project Configuration dialog box, select the SQL Server and name of the database you just created and click OK.
  3.  In the Tables to populate pane, click Deselect all and then select only the tables containing the columns with sensitive data that you want to replace.

In the Table generation settings pane, for each table that contains data to be cleansed:

  1. Under Source of data, select Use existing data source.
  2. Next to Source type, make sure SQL Table or View is displayed.
  3. Next to Source, click Browse.
  4. In the Select SQL Table or View dialog box, select the original SQL Server and database and click Next.
  5. Select the table containing the columns that you want to replace and click Finish.
  6. Make sure the Delete data from table before generation check box is selected.

You have now mapped the existing database tables to the tables that contain the data to be replaced so that any columns within that table that you do not want to change retain the original data. Now you need to select the columns containing sensitive data within these tables and choose the generators to generate random data.

To select the column and associated generator:

  1. In the Tables to populate pane, expand the table tree.
  2. Select the column you wish to replace.
  3. In the Column generation settings pane, select an appropriate generator from the drop-down list.
  4. The preview pane is updated dynamically showing that the original data will be replaced with randomly-generated data.

Repeat the above steps for each column with sensitive data.

Generating the data

When you are happy with the preview, click  Generate Data above the Table generation settings pane.

The Data Population Summary dialog box displays an action plan for the data generation.

Click Generate Data. Data is generated according to the options specified in your project, using the existing data for columns you did not want to change, and randomly-generated data for those you did.

A data generation report in PDF format is then displayed, confirming the number of rows inserted into each table. You can export and save this file if required. Close the report to return to SQL Data Generator.

If you are likely to perform the same data generation task, save and name the project: click  Save Project.

Packaging the cleansed database

Now that the data is ready for distribution, you can decide how to despatch it. You can either create a backup file or use SQL Packager to create a .NET executable.

You can then send the data to the recipient with no personal information at risk of compromise.


Didn't find what you were looking for?