Redgate Test Data Manager

Subsetting configuration file

This page explains the various configuration file formats supported by the subsetter CLI and how you can construct them.

Contents

Subsetting configuration file formats

We support both JSON and YAML file formats as an alternative way to configure the subsetter, in addition to command line parameters and environment variables.

The tool is agnostic (e.g. the fields are exactly the same) in regards to which format is being used, but in terms of intent YAML is more geared towards human consumption (e.g. supports comments and other user friendly features) whereas JSON is mainly targeted for automated processing (e.g. in CI/CD pipelines).

Subsetting configuration file fields

The following configuration fields can be stored in our configuration files to control how the CLI does the subsetting from the source database to the target database. Some of the fields have equivalent CLI parameters.

  • DatabaseEngine - The database engine used by the source and target database. Currently we support four databases: MySQL, Oracle, PostgreSQL and SQL Server
    • Corresponds to --database-engine
  • TargetSize - The approximate size to try and make the target database, in the form x(B|KB|MB|GB|TB) e.g. 5GB. This cannot be used in conjunction with StartingTables. Note that this is an approximate size and the actual size of the target database may be very different, especially for small databases.
    • Corresponds to --target-size
  • StartingTables - It can be defined as an array of tables along with their associated filter clauses. When there are more than one starting table the subsetter brings in all data that matches any of the filter clauses and also all data that links to that.
    • Table - The initial database table for the subsetting operation. Must exist with content in the source database, i.e. must contain at least one row. 
    • FilterClause - SQL Filter to be applied to the filter table. This is not required. Example usages of filter clause can be found here.
  • ManualRelationships - Manually specify relationships between database tables. This is not required. If it is present, this field should contain a list of manual relationships. Use this to specify foreign key relationships that are not actually in the database schema, but that should be used during subsetting. The subsetter will maintain referential integrity for these relationships. An example file is shown below.
    • There is no corresponding CLI parameter: manual relationships can only be specified in the configuration file.
    • Manual relationships can only be used from subsetter version 1.0.0.0
    • Each manual relationship in the list must have the following fields:
      • ParentTable - The parent table for the relationship, i.e. the table with the primary key. This table must exist in the source database. 
      • ParentColumns - A list of one or more columns in the parent table. These are the primary key columns. The columns must exist in the parent table. 
      • ChildTable - The child table for the relationship, i.e. the table with the foreign key. This table must exist in the source database.  
      • ChildColumns - A list of one or more columns in the child table. These are the foreign key columns. The columns must exist in the child table in the source database. If there are multiple columns, there must be the same number of ChildColumns as ParentColumns, and the ChildColumns must be in the same order as the ParentColumns.  

Notes on the file format

Please note that other than the above, nothing else can be stored in the configuration file, or the subsetter will fail to parse it.

In particular, the source and target database connection strings cannot be stored in the configuration file for security and privacy reasons. Config files are often added to source control and we wouldn't want customers to accidently submit a user/password in their connection string. As a result, these have to be passed in via CLI parameters or environment variables.

It is recommended to put all field values in the config file in double quotes (e.g. FilterTable: "SomeTable" instead of FilterTable: SomeTable), otherwise special characters such as brackets [] may cause the subsetter to fail to parse the config file and crash.

Once you have the configuration files setup, you can pass them to the subsetter CLI via the --config-file parameter. Please check the CLI reference for more details, or see the example command lines below.

Examples

Here are a few examples. 

SQL Server YAML configuration file example

Here are steps to carry out a subset using fields in a configuration file to provide some of the information, rather than CLI parameters. 

First we will use a YAML file. Create a new file named config.yaml preferably under the same directory as the subsetter executable, and fill its content as follows:

DatabaseEngine: "SqlServer"
StartingTables:
- Table: "[dbo].[Users]"
  FilterClause: "[OrgId] = 1"

Note :  Multiple `Table` and multiple `FilterClause` can be specified

With the configuration file ready, you can run command lines as follows. Note that the config-file parameter specifies the path and filename to the configuration file. If the path is omitted, the config file must be in the current working directory. 

./subsetter.exe --database-engine sqlserver `
--source-connection-string "server=localhost;database=SourceDatabase;trusted_connection=yes;TrustServerCertificate=yes" `
--target-connection-string "server=localhost;database=TargetDatabase;trusted_connection=yes;TrustServerCertificate=yes" `
--config-file config.yaml `
--target-database-write-mode Overwrite
subsetter.exe --database-engine sqlserver ^
--source-connection-string "server=localhost;database=SourceDatabase;trusted_connection=yes;TrustServerCertificate=yes" ^
--target-connection-string "server=localhost;database=TargetDatabase;trusted_connection=yes;TrustServerCertificate=yes" ^
--config-file config.yaml ^
--target-database-write-mode Overwrite


SQL Server JSON configuration file example

The subsetter also supports JSON format for the configuration file. To try this out, you can follow the same steps as for the YAML configuration file example above, with these exceptions. 

Instead of making config.yaml, make a file config.json preferably under the same directory as the subsetter executable, and fill its content as follows:

{
   "DatabaseEngine": "SqlServer",
	"StartingTables" : 
	[
		{
		   "Table": "[dbo].[Users]",
		   "FilterClause": "[OrgId] = 1" 
		}
	]

}

The command line to run the subsetter is exactly the same as for a YAML configuration file (given above), although you must provide the filename for the JSON file instead of the YAML file in the config-file parameter.

Sql Server YAML configuration file example, with target size

Here is an example YAML file where the target size has been specified instead of the filter table and filter clause. 

DatabaseEngine: "SqlServer"
TargetSize: "5GB"

Manual relationships configuration file example 

Here is an example JSON file showing ManualRelationships. The file contains one manual relationship, with one single column. 

{
   "DatabaseEngine": "SqlServer",
   "StartingTables" : 
	[
		{
		   "Table": "[dbo].[Users]",
		   "FilterClause": "[OrgId] = 1" 
		}
	],
   "ManualRelationships": 
     [
       {
         "ParentTable": "[dbo].[Orgs]", 
         "ParentColumns": ["Id"], 
         "ChildTable": "[dbo].[Users]",
         "ChildColumns": ["OrgId"]
       }
     ]
}

Didn't find what you were looking for?