Redgate Test Data Manager

Subsetting requirements

This pages explains all the (installation) requirements that need to be in place to successfully perform a subsetting operation.

Contents

Distribution types

The subsetter is provided in a CLI binary supporting multiple OS architectures.

General requirements

To run the subsetter, regardless of what distribution you pick, you need at least:

  • Databases:
    • Accessible connections to source and target database of one of our supported database engines.
      • The source connection string will identify the specific database to be subsetted. The following rules apply to schemas:

        • For Azure Sql, PostgreSql and Sql Server, the subset may include all tables in all schemas in that database 
        • For MySql, "schema" is effectively equivalent to "database": the subset will not include any tables from other databases/schemas
        • For Oracle, "schema" is effectively equivalent to "user": the source connection string must specify the user and the subset will not include tables from other users/schemas
    • The source database must:
      • Not be empty.
      • Allow the subsetter to create tables for temporary data as part of subsetting.
      • Not be modified during subsetting by any outside process. For example it must not be a live production database.
    • The target database must:
      • Be distinct from the source database.
      • If the target database is on a different server to the source database, it is recommended that both servers have the same machine architecture and database engine version, as otherwise the subset operation can produce unexpected results.
      • Have the necessary tables to receive subset data, with the same structure i.e. exactly the same columns.
        • See the notes below under subsetting configuration about which tables will receive subset data. It depends on the configuration.
      • Have no valuable data in the tables that will receive subset data.
        • In strict mode, the subsetter will fail if there are any data in the tables at all. In overwrite mode, the subsetter will delete all existing rows of data instead.
        • In older versions of the subsetter (version 0.3.0.0 and earlier), all the non-system tables should be empty. 
      • See the documentation for hints on how to make a valid target database. If you have access to Flyway, you can create a target database using its baselining feature to create a copy of the source database without any data.
  • Subsetting configuration:
    • The user needs to provide all the mandatory CLI parameters. The subsetter also requires EITHER a desired target size, OR a starting table and (optionally) a filter clause, but NOT BOTH.
      • As alternatives to the CLI, a config file and/or environmental variables can be used for configuration. Please see Subsetting configuration file for a more detailed explanation.
    • The target size option can be used to create a subset without needing to specify details of the internals of the source database. The subsetter attempts to create a subset at approximately the specified size with data from all relevant tables in the source database.
      • The source connection string identifies the specific source database to be subsetted. The following rules apply to schemas:
        • For Azure Sql, PostgreSql and Sql Server, the subset will include all tables in all schemas in that database 
        • For MySql, "schema" is effectively equivalent to "database", and the subset will include all tables in the database/schema
        • For Oracle, "schema" is effectively equivalent to "user": the source connection string must specify the user and the subset will include every table in that user/schema, and no tables from other users.
    • The starting table is a table in the source database to define where to start the subset from.
      • The starting table must have at least one row in the source database.
      • The subsetter will follow database relationships from the starting table to determine which tables are potentially part of the subset. 
        • The subset will not follow relationships to different databases.
        • For Azure Sql, PostgreSql and Sql Server, the subset will include tables in different schemas if there are relationships across schemas. 
        • For Oracle, the subset will not follow relationships to different users i.e. to different schemas.
      • The row values in the starting table determine which row values from other tables are included in the subset. The subset includes data rows linked by primary-foreign key relationships to key values in the filter table. The subsetter then follows the relationships from these tables to include data in other tables that are linked to their data, and then includes data from further tables linked to these tables, and so on. 
    • The filter clause is an optional SQL filter to narrow down the whole subset by narrowing down the rows of the starting table.
      • If not provided, all rows in the starting table will be included in the subset.  
      • If provided, at least one row in the source database must match the filter clause - otherwise the subset would be empty. 
      • If provided, the subset will only include data linked by primary-foreign key relationships to the rows in the starting table that match the filter. The subsetter then follows the relationships from these tables to include data in other tables that are linked to their data, and then further tables linked to their data, and so on. In this way the filter clause narrows down the rows of the entire subset.

CLI specific requirements

Hardware requirements

The speed of the subsetting operation will typically be greatly improved if there is a fast network connection from the subsetter CLI to the source and target databases.

The server hardware requirements to host a subset database will entirely depend on the size of the subset. A very small subset database will have minimal requirements, a very large database subset will have the same requirements as the original source database.

Supported OS architectures

The subsetter CLI can run in Windows, Linux (and MacOS in the future).

Binaries for the subsetter CLI executables are provided for the following OS architectures:

  • Windows (64-bit)
  • Linux Desktop (64-bit) (e.g. CentOS, Debian, Fedora, Ubuntu, and derivatives)
  • Linux MSUL (64-bit) (e.g. Alpine)

Please check our Installation guide for more details on how to get these up-and-running.


Didn't find what you were looking for?