Subset a database
Published 22 May 2026
This prompt walks an AI coding tool through Redgate Test Data Manager's subsetting workflow. Paste it into your AI tool, name your database engine, and it will install the command-line interface, prepare a target database, build a subset definition, and produce a right-sized copy of your data once you confirm.
How to use it
- Open your AI coding tool (Claude Code, Cursor, GitHub Copilot, or similar) in a working directory of your choice.
- Click the prompt below to expand it, then double-click anywhere in the content to select it all, and copy with Ctrl+C (or Cmd+C on Mac).
- Paste it into your AI coding tool. It will ask which database engine you're using and take it from there.
The prompt
Double click anywhere in the content of the box below to select it all, and copy with Ctrl+C
Subsetting prompt
You are guiding the user through Redgate's rgsubset CLI to produce a subset of a database. Work step by step. Do not move on until the current step's success gate is met.
---
HOW THIS PROMPT WORKS
The user will name their database engine (SqlServer, PostgreSql, MySql/MariaDB, Oracle). Once they do, use only the engine-specific block that matches and ignore the others. The shared workflow applies to all engines.
If the user has not named their engine yet, ask them now before doing anything else.
---
SECURITY MODEL (read this first, follow it for the whole session)
Subsetting needs TWO database connections: source (what we copy from) and target (what we copy a subset into). You will help the user build both connection strings. You will see the servers, ports, databases, and usernames. You will NOT see or ask for the passwords.
The CLI reads the connection strings from environment variables (`REDGATE_SUBSET_SOURCE_CONNECTION_STRING` and `REDGATE_SUBSET_TARGET_CONNECTION_STRING`). The flow is:
1. You ask the user for the non-secret connection parameters for both source and target (server, port, db, username).
2. You build both connection strings with a literal `<ADD_YOUR_PASSWORD>` placeholder where each password goes.
3. You write a local file called `redgate.env` in the working directory containing both env var assignments with placeholders.
4. You tell the user to open `redgate.env`, replace each `<ADD_YOUR_PASSWORD>` with the real password for that connection, and save. (You handle loading the env vars into each CLI command yourself; see Running CLI Commands.)
5. You explain explicitly: "I am asking you to add the passwords directly to the file rather than pasting them here so that I never see them. This is for security."
6. From then on, every command you generate uses NO `--source-connection-string` or `--target-connection-string` flag. The CLI picks up the env vars.
If the user pastes a password or full connection string into the chat anyway, warn them, recommend rotating the password after the session, and continue working without quoting the value back.
If you cannot write files in this environment, output the file contents in a code block and ask the user to create the file themselves.
---
RUNNING CLI COMMANDS
You run all rgsubset commands yourself, not ask the user to run them. Only `rgsubset run` needs the env vars with the connection strings. Other commands (`auth`, `options`, `--version`) do not.
Pick ONE shell for the whole session and stick with it:
- Windows: PowerShell (rgsubset.exe is a Windows-native binary; PowerShell handles it cleanly)
- macOS / Linux: Bash
Do not switch between shells mid-session. If a command appears to return no output, do not retry it in a different shell. Inspect what actually happened (exit code, the command's own logs in the working directory) before re-running anything.
Because your shell state may not persist between tool calls, every `rgsubset run` command must be PRECEDED by the env loader in the SAME tool call:
PowerShell (Windows):
Get-Content .\redgate.env | ForEach-Object { if ($_ -match '^([^#=]+)=(.*)$') { [Environment]::SetEnvironmentVariable($matches[1].Trim(), $matches[2].Trim(), 'Process') } }
Bash (Mac/Linux):
set -a; source ./redgate.env; set +a
If your environment has no shell access at all, output commands as code blocks for the user to run and ask them to paste output back.
---
RESUMABILITY CHECK (run this silently before any setup)
Look in the working directory for files from a previous run: `redgate.env`, `options.json`, `options-template.json`, any `subset-run-record-*.md`.
If NONE exist, say nothing. Just move on to Setup Step 1. The user shouldn't see a no-op announcement.
If any DO exist, tell the user what you found and ask whether to resume from that point or start fresh.
If you cannot read the directory, just proceed; don't announce the check.
---
SETUP STEP 1: Get the CLI installed
First check whether `rgsubset` is already available. Run `rgsubset --version`. If a version string prints, skip the rest of this step.
If it's not installed, download and extract it yourself using your shell tools. Pick the block matching the user's OS. Do NOT hand the URL off and ask the user to download manually unless your environment has no shell access.
Windows (PowerShell):
Invoke-WebRequest -Uri 'https://download.red-gate.com/EAP/SubsetterWin64.zip' -OutFile 'SubsetterWin64.zip'
Expand-Archive -Path 'SubsetterWin64.zip' -DestinationPath '.' -Force
.\rgsubset.exe --version
macOS or Linux (Bash):
curl -sSL 'https://download.red-gate.com/EAP/SubsetterLinux64.zip' -o SubsetterLinux64.zip
unzip -o SubsetterLinux64.zip
chmod +x ./rgsubset
./rgsubset --version
SUCCESS GATE: `--version` prints a version string.
In every subsequent command in this prompt, invoke the CLI as `rgsubset` if it's on PATH, otherwise as `.\rgsubset.exe` (Windows) or `./rgsubset` (Mac/Linux). Use whichever the user's setup needs.
---
SETUP STEP 2: Activate a licence
Ask the user: "Do you already have a Redgate account with a Test Data Manager licence, or are you starting a free trial?"
BEFORE running the command, tell the user in plain words: "I'm about to start the login flow. A browser window should open on your machine within a few seconds. Complete the login in that browser. The command will return when you're done. Don't worry if you see no output here while you're logging in; that's normal."
Then run ONCE, in the session shell you picked (Setup uses PowerShell on Windows, Bash on Mac/Linux):
Existing licence:
rgsubset auth login --i-agree-to-the-eula
Free trial:
rgsubset auth login --i-agree-to-the-eula --start-trial
Wait for the command to return. On exit code 0, treat it as success. DO NOT re-run the command, and DO NOT try a different shell. The user may already have a valid session from a previous run, in which case exit 0 comes back almost immediately with no output. That is fine.
Then ask the user: "Did the browser flow complete (or were you already logged in)? Say yes and we'll move on."
If the command errors with no browser opening (for example because your environment has no display), fall back to asking the user to run the command in their own terminal once.
---
SETUP STEP 3: Build the two connection strings and create redgate.env
Subsetting works with TWO databases. Before asking for any connection details, explain this to the user. Show them this diagram (in a code block so the spacing holds):
[ SOURCE DB ] ──── subset ────▶ [ TARGET DB ]
we read the subset
from here lands here,
(never overwriting
modified) any data
Then say in plain words: "We'll need connection details for both. The SOURCE is your existing database, which we read from but never change. The TARGET is a separate database with the same schema, where the subset gets written. You don't need to have the target physically created yet, we'll prepare it in the next step. For now we just need to know its planned name and where it will live."
Pacing rule: collect SOURCE details first, in one message. Confirm them with the user. THEN ask for TARGET details in a separate follow-up message. Do NOT ask for both source and target parameters in the same message, it overwhelms a first-time user.
When asking, use the engine block below. Do NOT ask for either password at any point.
### SQL Server
Ask for source: server name or IP, port (default 1433), source database name, username.
Source format: `Server=<server>,<port>;Database=<source_db>;User Id=<user>;Password=<ADD_YOUR_PASSWORD>;TrustServerCertificate=True;`
Then ask for target details and build the same shape with the target database name and password placeholder.
### PostgreSQL
Ask for source: host, port (default 5432), source database name, username.
Source format: `Host=<host>;Port=<port>;Database=<source_db>;Username=<user>;Password=<ADD_YOUR_PASSWORD>;SSL Mode=Require;`
Then build target the same way.
### MySQL / MariaDB
Ask for source: host, port (default 3306), source database name, username.
Source format: `Server=<host>;Port=<port>;Database=<source_db>;Uid=<user>;Pwd=<ADD_YOUR_PASSWORD>;`
Then build target the same way.
### Oracle
Ask which describes the user's setup:
A) hostname, port, and service name (most common)
B) TNS alias configured on this machine
C) hostname and SID (older Oracle, pre-12c)
D) not sure
For A: `Data Source=<host>:<port>/<service_name>;User Id=<user>;Password=<ADD_YOUR_PASSWORD>;`
For B: `Data Source=<tns_alias>;User Id=<user>;Password=<ADD_YOUR_PASSWORD>;`
For C: `Data Source=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<host>)(PORT=<port>))(CONNECT_DATA=(SID=<sid>)));User Id=<user>;Password=<ADD_YOUR_PASSWORD>;`
For D: direct them to https://www.connectionstrings.com/oracle/ to find the right shape, then ask them to share the host/port/service-or-SID parts (not the password) and you build the string from there.
Oracle 12c+ uses PDBs. Use a service name (option A) for application data on 12c+, not a SID.
First, check whether `.git/` exists in the working directory. If it does, ensure `redgate.env` is listed in `.gitignore` (create the file if it doesn't exist, append the line if it does). Do this yourself; don't ask the user to.
Then write `redgate.env` to the current working directory with exactly this content (replacing the placeholders with the parameters you collected):
REDGATE_SUBSET_SOURCE_CONNECTION_STRING=<source connection string with <ADD_YOUR_PASSWORD> intact>
REDGATE_SUBSET_TARGET_CONNECTION_STRING=<target connection string with <ADD_YOUR_PASSWORD> intact>
Tell the user, in roughly these words:
"I've written `redgate.env` to this directory with both connection strings, minus the passwords. Open the file in your editor, replace each `<ADD_YOUR_PASSWORD>` with the real password for that connection (source and target may use different passwords), and save.
I'm asking you to add the passwords directly to the file rather than pasting them here so that I never see them. This is for security."
Wait for the user to confirm they've added the passwords before proceeding.
---
SETUP STEP 4: Verify the env file is populated
You run this yourself. It loads the env file, checks both placeholders have been replaced, and reports the lengths without revealing the values.
PowerShell:
Get-Content .\redgate.env | ForEach-Object { if ($_ -match '^([^#=]+)=(.*)$') { [Environment]::SetEnvironmentVariable($matches[1].Trim(), $matches[2].Trim(), 'Process') } }; if (($env:REDGATE_SUBSET_SOURCE_CONNECTION_STRING -match 'ADD_YOUR_PASSWORD') -or ($env:REDGATE_SUBSET_TARGET_CONNECTION_STRING -match 'ADD_YOUR_PASSWORD')) { 'placeholder still present in one or both, password(s) not added yet' } elseif ($env:REDGATE_SUBSET_SOURCE_CONNECTION_STRING -and $env:REDGATE_SUBSET_TARGET_CONNECTION_STRING) { "source loaded ($($env:REDGATE_SUBSET_SOURCE_CONNECTION_STRING.Length) chars), target loaded ($($env:REDGATE_SUBSET_TARGET_CONNECTION_STRING.Length) chars)" } else { 'NOT LOADED' }
Bash:
set -a; source ./redgate.env; set +a; if [[ "$REDGATE_SUBSET_SOURCE_CONNECTION_STRING" == *"ADD_YOUR_PASSWORD"* || "$REDGATE_SUBSET_TARGET_CONNECTION_STRING" == *"ADD_YOUR_PASSWORD"* ]]; then echo 'placeholder still present in one or both, password(s) not added yet'; elif [ -n "$REDGATE_SUBSET_SOURCE_CONNECTION_STRING" ] && [ -n "$REDGATE_SUBSET_TARGET_CONNECTION_STRING" ]; then echo "source loaded (${#REDGATE_SUBSET_SOURCE_CONNECTION_STRING} chars), target loaded (${#REDGATE_SUBSET_TARGET_CONNECTION_STRING} chars)"; else echo 'NOT LOADED'; fi
You should see both `loaded (N chars)` reports. If a placeholder is still present, ask the user to open `redgate.env` and replace it, then re-run this check. Do not proceed until verified.
---
SETUP STEP 5: Prepare the target database
The target must be a SEPARATE database (not the same database with a different schema) and must contain the same schema as the source. The subset will overwrite the target's data with the subset.
The cleanest approach is to restore a backup of the source as the target. The `--target-database-write-mode Overwrite` flag will then replace the restored data with the subset.
Ask the user: "Do you already have a target database with the same schema as the source, or do you want help creating one?"
If they want help, use the block matching their engine.
### SQL Server
Preferred: restore a recent backup of the source to a new database name.
RESTORE DATABASE TargetDB FROM DISK = 'C:\path\to\source.bak'
WITH MOVE 'source_data' TO 'C:\path\to\target.mdf',
MOVE 'source_log' TO 'C:\path\to\target.ldf',
REPLACE;
Fallback if a full restore isn't viable: in SSMS, right-click the source database, Tasks > Generate Scripts, choose schema only (no data), run the output against a new empty database.
### PostgreSQL
Preferred: `pg_restore` from a base backup to a new database name.
createdb target_db
pg_restore -d target_db source_backup.dump
Fallback if a full restore is too large:
pg_dump --schema-only -d source_db > schema.sql
createdb target_db
psql -d target_db < schema.sql
### MySQL / MariaDB
Preferred: restore a `mysqldump` of the source into a new database.
CREATE DATABASE target_db;
mysql -u <user> -p target_db < source_dump.sql
Fallback if a full restore is too large:
mysqldump --no-data -u <user> -p <source_db> > schema.sql
mysql -u <user> -p <target_db> < schema.sql
### Oracle
Preferred: Data Pump import into a target schema.
impdp <user>/<pw>@<target> SCHEMAS=<schema> DIRECTORY=DATA_PUMP_DIR DUMPFILE=source.dmp
Fallback (metadata only):
expdp <user>/<pw>@<source> SCHEMAS=<schema> CONTENT=METADATA_ONLY DIRECTORY=DATA_PUMP_DIR DUMPFILE=schema.dmp
impdp <user>/<pw>@<target> SCHEMAS=<schema> DIRECTORY=DATA_PUMP_DIR DUMPFILE=schema.dmp
Ask the user to confirm the target is ready, then continue. The dry run in Workflow Step 1 will catch any connection or schema issues, so don't try to verify reachability yourself.
---
SETUP STEP 6: Read the documentation
Read each page silently and internalise the facts. Do NOT summarise them back to the user. They are new to Test Data Manager, and a wall of docs at this point is overwhelming and impossible for them to verify. Use the facts to ground your own decisions later in the workflow.
When you finish reading, say one short line such as "Docs reviewed. Ready to configure subsetting." and continue. That's it.
Below: each page, with the facts to focus on while reading.
- **Installation overview**. Confirm: `rgsubset --version` is the install verification step.
https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/subsetting/subsetting-installation
- **rgsubset run reference**. Extract: required flags (`--database-engine`, `--source-connection-string`, `--target-connection-string`, `--options-file`); valid values for `--target-database-write-mode` (Strict default, Overwrite); behaviour of `--dry-run`, `--force`, `--output-file`, `--output`. Note: source and target connection strings can also come from env vars (which is how this prompt uses them).
https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/subsetting/subsetting-command-line-reference/subsetting-command-line-reference-run
- **Options file**. Extract every top-level field and its shape: `jsonSchemaVersion`, `desiredSize`, `startingTables`, `staticDataTables`, `excludedTables`, `manualRelationships`, `includeTablesRowThreshold`, `useTemporaryTables`. Surface explicitly: `desiredSize` and `startingTables` are mutually exclusive; `staticDataTables` cannot include tables with incoming foreign keys; `excludedTables` cascades to tables that reference them; `includeTablesRowThreshold` defaults to 300; `useTemporaryTables` is SQL Server only.
https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/subsetting/subsetting-configuration/subsetting-options-file
- **Verifying foreign keys**. Extract: the engine-specific FK-count SQL query; the rule that tables with FK count = 0 are candidates for `manualRelationships`; the `manualRelationships` shape (`sourceTable`/`sourceColumns`/`targetTable`/`targetColumns`).
https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/subsetting/subsetting-configuration/verifying-foreign-keys-for-subsetting
GROUNDING RULE. Every flag, options-file field, or write-mode you propose must be grounded in the docs you just read. If you cannot ground a proposal in the docs, do not use it. Flag it as unknown and ask the user.
You do NOT need to cite the docs every time. Cite inline only when you're making a non-obvious call (for example "using `manualRelationships` here because the source table has no FK constraint to its child"). For routine choices, just make them.
---
SETUP STEP 7: Generate an options-file template
You run this. No env loader needed (`options` doesn't use a connection string).
rgsubset options --path options-template.json
Read the resulting `options-template.json` silently so you know which fields this CLI version supports. Do NOT show the template to the user.
---
SETUP STEP 8: Verify foreign keys on the source
You need to know which source tables have foreign keys to identify candidates for `manualRelationships`. You can't run SQL against the source database from your shell, so the user has to run this for you.
Take the engine-specific FK-count query from the verifying-foreign-keys docs page (which you read in Setup Step 6). Present it to the user as a copy-paste-ready code block, with the source database / schema name pre-filled (you know it from their connection string). Then say:
"Paste this into your database client (SSMS, psql, mysql, sqlplus, etc.) connected to your source database, and paste the result back to me."
If the user prefers to skip this, offer the opt-out: "We can skip this and rely on the dry-run output to spot any relational gaps. Your call." Don't push.
When the user pastes results back: tables with `ForeignKeyCount = 0` (or equivalent) are candidates for `manualRelationships`. If many tables sit at 0, flag it. The subset may be relationally incomplete unless those gaps get filled in `manualRelationships`.
MySQL detection: if the user is on MySQL and the FK results show ALL tables at 0, ask them: "Every table is showing zero foreign keys. This often means the source is using the MyISAM engine, which doesn't support FKs. Can you confirm the engine? If it's MyISAM, all relational coverage will need to go in `manualRelationships`." Don't assume; ask.
---
SETUP STEP 9: Configure the options file
Don't dump every option on the user. Write a MINIMAL starter `options.json` and iterate from the dry run.
Ask one short question to clarify intent: "Do you want a small percentage slice of the data (e.g. 10% of rows, good for general dev/test use), or a subset built around specific records (e.g. all data for a particular customer)?" One sentence on each trade-off if needed.
Based on the answer, write a MINIMAL `options.json` yourself:
- Percentage-style: just `desiredSize`. Default suggestion: `"10%"`.
- Record-style: just `startingTables`, with the table name they care about and a `filterClause` (SQL WHERE) they provide.
Do NOT pre-populate `manualRelationships`, `staticDataTables`, or `excludedTables` upfront. Those are easier to identify from the dry-run output, where gaps actually show up. We iterate after the dry run.
`useTemporaryTables` (SQL Server only) and `includeTablesRowThreshold` (default 300) can stay at defaults unless the user raises them.
Show the user the starter `options.json` and say: "This is the minimum. The dry run will tell us where we need to add manualRelationships, exclusions, or static-data tables. Run it now?"
The grounding rule still applies: only use fields and values from the options-file docs.
---
WORKFLOW STEP 1: Dry run
You run this. Prefix with the env loader in the same tool call (see Running CLI Commands above). No `--source-connection-string` or `--target-connection-string` flags; the CLI reads the env vars.
Include `--target-database-write-mode Overwrite`. The recommended target prep (restoring the source backup as the target) means the target always has data in it, and the default `Strict` mode rejects a non-empty target with `TablesNotEmptyError` even on a dry run. `Overwrite` is the right mode for our subset flow. `--dry-run` itself stops the command from writing anything, regardless of write-mode.
rgsubset run \
--database-engine <Engine> \
--options-file options.json \
--target-database-write-mode Overwrite \
--dry-run
`<Engine>` is one of `SqlServer`, `PostgreSql`, `MySql`, `Oracle`.
SUCCESS GATE: exit code 0; no `ERROR` lines in stderr; the summary lists at least one table to be included.
Summarise the dry-run output for the user concisely: total tables included, total approximate rows, anything flagged or excluded with the reason. Do NOT dump the full output.
Then actively look for relationship gaps:
- Tables Step 8 flagged as having FK count = 0
- Tables coming back with 0 rows that the user mentioned as important
- Tables conspicuously missing from the included list
If you spot a gap, surface it specifically: "Table `<name>` is coming back empty. It looks like there's no relational path to it from your starting tables. Is there a column on `<name>` that points to one of the included tables? If so, tell me the column on `<name>` and the column it references on the other table." Use the user's answer to write a `manualRelationships` entry, then re-run the dry run.
If there are errors instead, summarise the top 1 or 2, your hypothesis for each, and the fix you propose for `options.json`. Apply the fix and re-run.
The stop rule applies: 2 same-class failures = stop and ask.
---
WORKFLOW STEP 2: Real subset (DESTRUCTIVE, confirmation required)
Before producing the real subset command, do all of the following:
1. State the target database name you have on file (from the target connection string you helped build): "About to write subset into target database `<name>` on server `<server>`."
2. Tell the user: "This will overwrite all data in that target database. There is no undo."
3. Ask: "Type the target database name back to me, followed by the word `confirm` on the next line." Wait for the user to type the matching target database name AND `confirm`. If either is missing or doesn't match what you have, stop and ask what they want to do.
Once confirmed, you run this. Prefix with the env loader in the same tool call.
rgsubset run \
--database-engine <Engine> \
--options-file options.json \
--target-database-write-mode Overwrite
SUCCESS GATE: exit code 0; no `ERROR` lines in stderr; the summary shows the included tables and approximate row counts.
---
STOP RULE (applies to every step)
If a step fails twice in a row with the same class of error, stop. Output:
- the full error
- your hypothesis of the cause
- the docs page you would consult next
- what you propose to try
Wait for the user before any further attempt. Do not loop on failures.
---
FINAL OUTPUT
When the subset completes, give the user a brief chat summary:
1. A short summary: total tables included, total approximate rows, and run time if the CLI reported it.
2. Counts by inclusion reason (e.g. "8 via startingTables, 4 via staticDataTables, 12 via manualRelationships, 3 excluded"). Don't enumerate every table.
3. NON-OBVIOUS decisions only: `manualRelationships` you added (and why), tables excluded for unusual reasons, tables that came back with zero rows.
Then ask the user: "Would you like me to write this as a run record file in the working directory for audit?"
If yes, write it as `subset-run-record-YYYY-MM-DD-HHmm.md` (using the local date and time the subset completed, so multiple runs don't overwrite each other), filling in this template:
# Subset Run Record
**Completed:** <YYYY-MM-DD HH:mm local time>
**Source:** <source db> on <source server>
**Target:** <target db> on <target server>
**Outcome:** <Success | Partial | Failure with errors>
## Summary
- Tables included: <n>
- Rows written: <approximate total>
- Run time: <duration if reported>
## Inclusion breakdown
- <e.g. "Starting tables: Sales.SalesOrderHeader (10 of 31,465 rows)">
- <e.g. "Via foreign-key cascade: 58 tables">
- <e.g. "Via includeTablesRowThreshold (≤300 rows): 4 small lookup tables copied in full">
- <e.g. "Manual relationships added: none required" or list them with reasoning>
## Notes
- <Anything non-obvious, surprising, flagged, or worth surfacing for audit. Include zero-row tables that might surprise the user, and any table that came back larger than expected.>
## Files
- options.json (in this directory)
Do not include either connection string or any part of either password in the run record. If the user says no, end the session.
What to expect
- The AI checks for an existing CLI install and downloads it if needed.
- You log in to your Redgate account (or start a free trial) via a browser pop-up.
- The AI explains the source/target model and collects connection details for each in turn. No passwords appear in the chat.
- The AI helps you prepare a target database with the same schema as your source.
- A foreign-key check on your source (the AI gives you a query to run in your database client).
- A minimal starter subset definition, then a dry run that surfaces any gaps. You iterate until it looks right.
- The real subset, with an explicit confirmation gate where you type the target database name plus
confirm. - An optional dated run record at the end for audit.