Subsetting configuration file
Published 16 October 2023
This page explains the various configuration file formats supported by the subsetter CLI and how you can construct them.
Contents
Subsetting configuration file formats
We support both JSON and YAML file formats as an alternative way to configure the subsetter, in addition to command line parameters and environment variables.
The tool is agnostic (e.g. the fields are exactly the same) in regards to which format is being used, but in terms of intent YAML is more geared towards human consumption (e.g. supports comments and other user friendly features) whereas JSON is mainly targeted for automated processing (e.g. in CI/CD pipelines).
Subsetting configuration file fields
The following configuration fields can be stored in our configuration files to control how the CLI does the subsetting from the source database to the target database. Some of the fields have equivalent CLI parameters.
- DatabaseEngine - The database engine used by the source and target database. Currently we support four databases: MySQL, Oracle, PostgreSQL and SQL Server
- Corresponds to
--database-engine
- Corresponds to
- TargetSize - The approximate size to try and make the target database, in the form x(B|KB|MB|GB|TB) e.g. 5GB. This cannot be used in conjunction with
StartingTables
. Note that this is an approximate size and the actual size of the target database may be very different, especially for small databases.- Corresponds to
--target-size
- Corresponds to
- StartingTables - It can be defined as an array of tables along with their associated filter clauses. When there are more than one starting table the subsetter brings in all data that matches any of the filter clauses and also all data that links to that.
- Table - The initial database table for the subsetting operation. Must exist with content in the source database, i.e. must contain at least one row.
- FilterClause - SQL Filter to be applied to the filter table. This is not required. Example usages of filter clause can be found here.
- ManualRelationships - Manually specify relationships between database tables. This is not required. If it is present, this field should contain a list of manual relationships. Use this to specify foreign key relationships that are not actually in the database schema, but that should be used during subsetting. The subsetter will maintain referential integrity for these relationships. An example file is shown below.
- There is no corresponding CLI parameter: manual relationships can only be specified in the configuration file.
- Manual relationships can only be used from subsetter version 1.0.0.0.
- Each manual relationship in the list must have the following fields:
- ParentTable - The parent table for the relationship, i.e. the table with the primary key. This table must exist in the source database.
- ParentColumns - A list of one or more columns in the parent table. These are the primary key columns. The columns must exist in the parent table.
- ChildTable - The child table for the relationship, i.e. the table with the foreign key. This table must exist in the source database.
- ChildColumns - A list of one or more columns in the child table. These are the foreign key columns. The columns must exist in the child table in the source database. If there are multiple columns, there must be the same number of ChildColumns as ParentColumns, and the ChildColumns must be in the same order as the ParentColumns.
Notes on the file format
Please note that other than the above, nothing else can be stored in the configuration file, or the subsetter will fail to parse it.
In particular, the source and target database connection strings cannot be stored in the configuration file for security and privacy reasons. Config files are often added to source control and we wouldn't want customers to accidently submit a user/password in their connection string. As a result, these have to be passed in via CLI parameters or environment variables.
It is recommended to put all field values in the config file in double quotes (e.g. FilterTable: "SomeTable"
instead of FilterTable: SomeTable
), otherwise special characters such as brackets [] may cause the subsetter to fail to parse the config file and crash.
Once you have the configuration files setup, you can pass them to the subsetter
CLI via the --config-file
parameter. Please check the CLI reference for more details, or see the example command lines below.
Examples
Here are a few examples.
SQL Server YAML configuration file example
Here are steps to carry out a subset using fields in a configuration file to provide some of the information, rather than CLI parameters.
First we will use a YAML file. Create a new file named config.yaml
preferably under the same directory as the subsetter executable, and fill its content as follows:
DatabaseEngine: "SqlServer" StartingTables: - Table: "[dbo].[Users]" FilterClause: "[OrgId] = 1"
Note : Multiple `Table`
and multiple `FilterClause` can be specified
With the configuration file ready, you can run command lines as follows. Note that the config-file
parameter specifies the path and filename to the configuration file. If the path is omitted, the config file must be in the current working directory.
SQL Server JSON configuration file example
The subsetter also supports JSON format for the configuration file. To try this out, you can follow the same steps as for the YAML configuration file example above, with these exceptions.
Instead of making config.yaml
, make a file config.json
preferably under the same directory as the subsetter executable, and fill its content as follows:
{ "DatabaseEngine": "SqlServer", "StartingTables" : [ { "Table": "[dbo].[Users]", "FilterClause": "[OrgId] = 1" } ] }
The command line to run the subsetter is exactly the same as for a YAML configuration file (given above), although you must provide the filename for the JSON file instead of the YAML file in the config-file
parameter.
Sql Server YAML configuration file example, with target size
Here is an example YAML file where the target size has been specified instead of the filter table and filter clause.
DatabaseEngine: "SqlServer" TargetSize: "5GB"
Manual relationships configuration file example
Here is an example JSON file showing ManualRelationships
. The file contains one manual relationship, with one single column.
{ "DatabaseEngine": "SqlServer", "StartingTables" : [ { "Table": "[dbo].[Users]", "FilterClause": "[OrgId] = 1" } ], "ManualRelationships": [ { "ParentTable": "[dbo].[Orgs]", "ParentColumns": ["Id"], "ChildTable": "[dbo].[Users]", "ChildColumns": ["OrgId"] } ] }