Redgate Test Data Manager

Subsetting

Subsetting is a capability to transform large databases into smaller ones (aka subset), part of Redgate Test Data Manager solution.

It's available in the GUI and via a x-database and x-OS Command Line Interface (CLI) client called subsetter.

We would love to hear feedback from you, so please get in touch with your thoughts and suggestions.

Contents


What is subsetting?

Subsetting extracts a portion of referentially intact data from one database (source) into another (target).

One practical business use-case for subsetting is extracting data from a large production database to a smaller non-production one in a test environment for the use of developers.

Please also check our subset glossary entry for examples on when this is useful.

About our tool

The subsetter takes the connection strings of the source and target databases to compute the relational hierarchy of the source database and migrate the necessary subset of data on the target maintaining referential integrity.

The subset can be specified by providing a target size: the desired size in terms of disk space. The subsetter will select data from the source database in order to make a target database of approximately the requested size using statistical methods. 

Alternatively the subset can be specified by providing a starting table. The subset will follow the referential tree from this database table and collecting all the data necessary from the related tables. This can be further fine-tuned by also providing a SQL filter clause, to control which specific rows are to be brought in from the starting table. 

The starting table and clause allow fine control over what data is included in the subset, but require knowledge of the database structure. By contrast the target size feature allows easy creation of a subset with control over the impact on infrastructure resources, but with less control over the exact makeup of the subset.

One can subset multiple times and use a combination of the two modes to refine their subset, e.g. use starting table/filter clause first to bring in the required data, then use target size to bring the subset size down to a more operable value.

The current version of the subsetter supports 4 database engines: MySql, Oracle, PostgreSql and SqlServer.

There are some known limitations listed in our troubleshooting guide as the tool is unable to migrate certain data types like HierarchyId and Spatial. Columns with such types are identified and skipped. If they're also primary keys the entire table will be skipped. The subsetter is also not yet able to append rows to existing target tables without clearing them first.

Where to next?

  1. Getting started:
  2. Running a subset:
  3. If something goes wrong:

Didn't find what you were looking for?