Subsetting troubleshooting and known limitations
Published 16 October 2023
This area gives troubleshooting advice, and lists known limitations in the latest version of the subsetter CLI tool. This will be updated regularly to reflect ongoing changes.
Contents
All database engines
Source and target database need to use the same database engine
The source and target database both need to use the same database engine. For example, both could be SQL Server, or both could be PostgreSql. The subsetter would fail if, for instance, the source is SQL Server and the target is PostgreSql.
Source and target database need to be identical
The source and target databases must have exactly the same schema to ensure data consistency and proper functioning of the subsetter product.
Oracle connection caveat
The schema that is going to be subset needs to have a create session privilege in oracle.
Unsupported data types
Some column data types are not supported (or only partially so). This could cause subsetting to fail or it could cause an incomplete subset to be created (referential integrity will still be preserved).
For example, the HierarchyId SQL Server data type is only partially supported and the output subset may not contain the parents for the hierarchyid
data.
Self-references may be set to NULL
There is limited support for self-references (i.e. columns which reference a key in the same table). A subset will be produced, but the self-reference foreign key values may be set to NULL if the corresponding primary key value has not been included in the subset (please note that self reference columns will not affect which rows are included in the subset).
Cyclic references are not supported
The subsetter does not support databases with cyclical or circular references between tables. If you run it against a database with a cyclical reference, it will fail. As described above, there is limited support for self-references (i.e. columns which reference a key in the same table) which are a special case of cyclical references.
Triggers may interfere with subsetting
You may want to disable triggers in the target database before running the subsetter. Otherwise the triggers may fire as the subsetter inserts data into the target database, and this may have unwanted effects such as potentially causing subsetter to fail.
Console log level
When running on Docker it could be useful to increase the console log level so that debug messages go to the console.
Target database size deviates from specified value when using --target-size
Target size subsetting uses statistical methods to extract data at approximately the specified size. It is known that this feature performs more poorly on small databases (on the scale of MBs) due to higher statistical variance.
Troubleshooting tips for Sql Server specifically
SQL Server connection string pitfalls
Connection string attribute keywords
Auto keyword alias replacements in Subsetter 0.3.0.0+
Subsetter v0.3.0.0
and newer has improved connection string validation that will attempt to auto-replace unsupported keywords listed below by their supported equivalent aliases.
Some common connection string attribute keywords are not supported in the subsetter. Be sure to use the supported aliases as listed below.
- Use
server
rather thandata source
- Use
database
rather thaninitial catalog
- Use
Uid
rather thanUser
orUser Id
- Use
pwd
rather thanpassword
- Use
TrustServerCertificate
rather thanTrust Server Certificate
- Use
Trusted_Connection
rather thanIntegrated Security
As a rule of thumb, avoid keywords with spaces with them as they can be problematic. Here's an example of the type of errors that can happen:
Microsoft.Data.SqlClient.SqlException (0x80131904): Statement(s) could not be prepared.
Invalid object name 'dbo.Users'.
OLE DB provider "MSOLEDBSQL" for linked server "(null)" returned message "Deferred prepare could not be completed.
'TrustServerCertificate
' and 'Trusted_Connection
' values must be 'yes' or 'no'
Auto keyword value replacements in Subsetter 0.3.0.0+
Subsetter v0.3.0.0
and newer has improved connection string validation that will attempt to auto-replace unsupported values for TrustServerCertificate
and Trusted_Connection
listed below by their supported equivalent values.
The TrustServerCertificate
and Trusted_Connection
parts of the SQL Server connection string must have the value 'yes' or 'no'. The values 'true' and 'false' are not supported in subsetting, although they are supported more generally in connection strings.
Here is an example of the error message that can be produced by using an unsupported value:
Microsoft.Data.SqlClient.SqlException (0x80131904): Invalid value specified for connection string attribute 'TrustServerCertificate'
Specify the SQL Server instance
If your SQL Server instance is not the default instance on the host, you may need to specify the instance name. For example the connection string might need to include Server=myServerName\myInstanceName rather than simply Server=myServerName.
Error message "Invalid source connection string...Timed out... while trying to connect"
Sometimes the Subsetter may fail to connect to a Sql Server database with an error that includes the words "Invalid source connection string...Timed out... while trying to connect". An example is given below.
This can be caused if there is an error in the connection string attribute keyword for the Server. You should check the following.
- Check that the name or IP address is correct for the host.
- If your SQL Server instance is not the default instance on the host, specify the instance name as shown just above.
Invalid source connection string: Invalid SQL Server source connection string provided: Timed out after 00h 00m 05s.000ms while trying to connect.
Source and target columns must have the same collation
The subsetter validates that the source and target columns for the tables about to be subset are the same. This also includes their collation. If you get a validation error stating there are differences but you have the same columns, check their collation and ensure it matches.
The subsetter doesn't copy history tables
If the source database uses Sql Server's system versioned temporal tables then the subsetter will copy across the current data in the table, but the history will be ignored.