Redgate Test Data Manager

Subsetting command-line reference

This section provides a reference sheet for all the commands provided by the subsetter CLI tool.

Contents

Running the CLI

CLI binary

  1. Follow the subsetting installation guide.
  2. Call the subsetter CLI with either command line parameters and/or a YAML/JSON config file.

Docker image

  1. Follow the subsetting installation.
  2. Use docker to run the subsetting image: as you would run a CLI binary (i.e. with the same arguments). The docker image name varies depending on the version of the subsetter that you are using. Note that subsetter-0.1.0.0 below will need to be replaced with the subsetter filename from the version of the subsetter that you are using. For example version 0.2.0 has filename subsetter-0.2.0.0 and so on.
    • Command line parameters - If you want to use command line parameters to specify the configuration, then you call it as you would run a CLI binary (i.e. with the same arguments). Note that subsetter-0.1.0.0 below will need to be replaced with the subsetter filename from the version of the subsetter that you are using. For example version 0.2.0 has filename subsetter-0.2.0.0 and so on.

      # Running a subsetting operation against a locally hosted SQL server instance
      docker run -it --network=host subsetter-0.1.0.0
      --database-engine=sqlserver
      --source-connection-string="server=localhost;database=SomeSourceDatabase;trusted_connection=yes;TrustServerCertificate=yes"
      --target-connection-string="server=localhost;database=SomeTargetDatabase;trusted_connection=yes;TrustServerCertificate=yes"
      --starting-table="SomeSqlTable"
      --filter-clause="SomeFilter"
    • Configuration file -If you want to use a subsetting YAML/JSON configuration, you'll need to mount the configuration directory as in (Note: this is not needed if the config can be reached via network):

      # Use a host mounted configuration file with local host access:
      #   <PATH_TO_SUBSET_CONFIG_FOLDER> - A folder in your host machine
      #   <PATH_TO_CONFIG_FILE_RELATIVE_CONFIG_FOLDER> - A filename or subfolder within <PATH_TO_SUBSET_CONFIG_FOLDER>.
      docker run -it --network=host --mount type=bind,source="<PATH_TO_SUBSET_CONFIG_FOLDER>",target=/app/configs subsetter-0.1.0.0 --config-file "<PATH_TO_CONFIG_FILE_RELATIVE_CONFIG_FOLDER>"
      --source-connection-string="server=localhost;database=SomeSourceDatabase;trusted_connection=yes;TrustServerCertificate=yes"
      --target-connection-string="server=localhost;database=SomeTargetDatabase;trusted_connection=yes;TrustServerCertificate=yes"

Commands

Running a subset

You can provide the configuration file to use for subsetting either through the command line and/or a combination of a YAML/JSON config file and/or environment variables. If multiple sources are provided for the same parameter, values in the command line always take precedence over the ones stored in the configuration or environment.

Please check the installation requirements and known limitations listed in our troubleshooting guide.

Global flags and parameters

While every command has a set of dedicated key value parameters and flags that drive its behavior, subsetter also has a set of global flags and parameters that can be applied to all commands (or subset of commands).

Flag/ParameterDescriptionExample
--versionShow version informationsubsetter --version
-?, -h, --helpShow help and usage informationsubsetter --help
--console-log-level <Debug|Error|Fatal|Information|None|Verbose|Warning>

Minimum log level for the terminal's standard output and error. [default: Information]

When running on Docker it could be useful to increase the console log level so that debug messages go to the console.

subsetter --console-log-level=Warning
--file-log-level <Debug|Error|Fatal|Information|None|Verbose|Warning>Minimum log level for the subsetter's log file. [default: Debug]subsetter --file-log-level=Information
--log-folder <log-folder>Optional directory path to use when storing log files. If not specified, the default location varies per host Operating System (OS) running the tool.subsetter --log-folder=C:\Users\***\Documents

Via the command line parameters

The following parameters configure the subsetting:

Flag/ParameterDescriptionRequired
--config-file <config-file>The path and filename to the config file: a YAML or JSON file containing parameters that control the subsetting. If the path is omitted, the config file must be in the current working directory.  The config-file parameter must be provided if any required parameters are not provided as options to this command.

NO

--database-engine <MySql|Oracle|PostgreSql|SqlServer>The type of the database. The subsetter currently supports 4 database engines: MySql, Oracle, PostgreSql and SqlServer. Must be present if not in the config file (takes precedence).

YES

--source-connection-string <source-connection-string>

Connection string for the source database. Must be present if not in the environment variable REDGATE_SUBSET_SOURCE_CONNECTION_STRING (takes precedence).

See also the requirements for the source database on the installation requirements page.

YES

--target-connection-string <target-connection-string>

Connection string for the target database. Must be present if not in the environment variable REDGATE_SUBSET_TARGET_CONNECTION_STRING (takes precedence).

See also the requirements for the target database on the installation requirements page.

YES

--target-size <target-size>

The approximate size to try and make the target database, in the form x(B|KB|MB|GB|TB) e.g. 5GB. This cannot be used in conjunction with --filter-table and --filter-clause. Note that this is an approximate size and the actual size of the target database may be very different, especially for small databases.

NO

--starting-table <starting-table>

The database table to apply the filter clause to, for use in the subsetting operation (case-sensitive). Either this or the target size must be specified but not both. Can be specified here or in the config file.

NO

--filter-clause <filter-clause>

SQL query WHERE clause to be applied to the filter table. This is optional. Command line value takes precedence over config one. This option can only be used if --filter-table is provided. It can't be used in conjunction with --target-size.

Note that the filter clause should not contain the where keyword.

The filter supports complex queries as shown in the list of example filters.

NO

--target-database-write-mode <Overwrite|Strict>

Execution mode for the subsetter. Supported values are Strict and Overwrite.

  • In Strict mode the subsetter will return an error if tables that are part of the subset in the target database contain rows.
  • In Overwrite mode the subsetter will truncate the target tables that are part of the subset if they contain rows.

If omitted it defaults to Strict.

Overwrite mode should be used with care and only when running the tool manually, as it can lead to data loss if pointed at the wrong target database.

In either mode the subsetter only modifies tables that are part of the subset. This consideration can be important when using filter tables, as often these subsets don't include all tables in the database.

  • Therefore it is possible to run the subsetter against a target database where there is existing data in tables that are not part of the subset. These tables will not be affected.
  • It may be that the database relationship structure has disconnected groups of tables with no relationships at all between the groups. In this case it is possible to carry out multiple subsetter runs against the same target, with different filter tables so that each run affects entirely different tables. The the end result is that all the subsets are written to the same database. 

NO

--output-file <file-path>

Specify a file path to put the results of a subset in JSON format. Currently it outputs:

  • Size of the source database in bytes
  • Size of the target database in bytes
  • All tables that were part of the subset, with the number of rows in the source and how many rows were subset

The folder structure in the file path will be created if missing.

NO

For example:

# Using Windows authentication against local SQL Server database
.\subsetter
--database-engine=sqlserver
--source-connection-string="server=SomeSqlServer;database=SomeSourceDatabase;trusted_connection=yes;TrustServerCertificate=true"
--target-connection-string="server=SomeSqlServer;database=SomeTargetDatabase;trusted_connection=yes;TrustServerCertificate=true"
--starting-table="dbo.Users"
--filter-clause="FirstName LIKE 'Bob%'"

# Using Sql authentication against local SQL Server database
.\subsetter
--database-engine=sqlserver
--source-connection-string="server=SomeSqlServer;database=SomeSourceDatabase;Uid=sa;Pwd=123;TrustServerCertificate=true"
--target-connection-string="server=SomeSqlServer;database=SomeTargetDatabase;Uid=sa;Pwd=123;TrustServerCertificate=true"
--starting-table="dbo.Users"
--filter-clause="FirstName LIKE 'Bob%'"

 # Subsetting two PostgreSQL local databases using SQL authentication
.\subsetter
--database-engine=postgresql
--source-connection-string="Server=127.0.0.1;Port=5432;Database=SomeSourceDatabase;User Id=SomeUser;Password=SomePassword;"
--target-connection-string="Server=127.0.0.1;Port=5432;Database=SomeTargetDatabase;User Id=SomeUser;Password=SomePassword;"
--starting-table="dbo.Users"
--filter-clause="Division = 'Engineering'"

This is not an exhaustive list of the command line possibilities. Please run subsetter --help to discover some of our optional configuration parameters.

Via a YAML/JSON configuration file

Having a YAML/JSON configuration file already setup, makes the call to the CLI much simpler (but you still need to provide the connection strings or use the corresponding environment variable):

# Using a JSON configuration file...
.\subsetter --config-file="<pathToConfigFile>/config.json" 
--source-connection-string="SomeConnectionString"
--target-connection-string="SomeConnectionString"  

# ... or a YAML one
.\subsetter --config-file="<pathToConfigFile>/config.yaml"
--source-connection-string="SomeConnectionString"
--target-connection-string="SomeConnectionString" 

You still need the same mandatory parameters, i.e. source and target database connection strings, as when running via the command line parameters.


Exit codes

The subsetter CLI will return the following exit codes when running commands.

Anything other than SUCCESS (0) will indicate a negative outcome.

Exit codeDesignationOutcome statusWhen
0Success

OK

Used when a subsetter command completed successfully.
1GenericFailure

FAILURE

Catch all for general errors that don't have special meaning.
2
UnhandledException

FAILURE

Used when no other error code is suitable. It's the default error code.
3FailedInitialization

FAILURE

Used when failing initialization of services and tools.
4CliInvokedIncorrectly

FAILURE

Used when provided with missing or mismatched command line parameters or configuration files.
5InvalidConfiguration

FAILURE

Used when the provided configuration does not meet the subsetting requirements.
6FailedTableExtraction

FAILURE

Used when the SQL tables to subset could not be retrieved from the source database.
7FailedTablePreprocessing

FAILURE

Used when the data contained in the SQL tables to subset (in the source database) could not be prepared for subsetting usage.
8FailedUpdatingTargetDatabase

FAILURE

Used when the subsetter failed to migrate the data from the source to the target database.

Please check our Subsetting troubleshooting and known limitations section for some known error use cases and workarounds.


CLI logging

In terms of observability, the subsetter CLI has two main types of logging:

Logging typeLocationDefault Minimum Log LevelPurpose
FileDefault location varies per OS:
  • WindowsC:\ProgramData\Red Gate\Logs\TDM\Subsetter
  • Linux/Mac OSX (central location)/var/log/Red Gate/Logs/TDM/Subsetter
  • Linux/Mac OSX (local location)./Logs

The default log folder can be overridden in the command line by using the CLI parameter --log-folder to point to a locally accessible location.

DEBUG

Log client-side application-level information about subsetting execution and outcome. This will include, by default, additional contextual debug execution information.

Minimum log level when running commands can be changed with the global --file-log-level parameter (see global flags and parameters).
Console (standard out)stdout

INFORMATION

Displays non-error client-side application-level information about subsetting execution and outcome. This excludes Error and Fatal log events which will be added to the standard error stream instead.

Minimum log level when running commands can be changed with the global --console-log-level parameter (see global flags and parameters).
Console (standard error)stderror

ERROR (cannot be changed)

Displays all Error and above client-side application-level messages for issues during execution of the subsetter. This will lead to a non-successful exit code.

Log format

The filename of the log file will use following date sortable format: subsetter-yyyy_MM_dd-HH_mm_ss.json (e.g. Subsetter-2023_05_03-12_39_55.json).

The files are written in Compact Log Event Format, a JSON-based format. This means each log message has structured data associated with it. The open source Compact Log Viewer tool is a convenient way to view the logs.

(there are other tools for working with Compact Log Event Format files: see the list on the CLEF webpage).

There is no automatic purging of old log files built-in yet, so you may want to keep an eye on log files and delete older files from time to time.



Didn't find what you were looking for?