Redgate Test Data Manager

Anonymize a database

This prompt walks an AI coding tool through Redgate Test Data Manager's anonymization workflow. Paste it into your AI tool, name your database engine, and it will install the command-line interface, classify the columns in your database that contain personal data, propose a masking configuration, and apply it once you confirm.

Important – In all cases, we recommend you use a separate, non-essential environment when testing Anonymize and Subset. We cannot accept responsibility for any issues resulting from setting up Anonymize and Subset. Any activities carried out on production or otherwise important systems are not advised and at your own risk.

How to use it

  1. Open your AI coding tool (Claude Code, Cursor, GitHub Copilot, or similar) in a working directory of your choice.
  2. Click the prompt below to expand it, then double-click anywhere in the content to select it all, and copy with Ctrl+C (or Cmd+C on Mac).
  3. Paste it into your AI coding tool. It will ask which database engine you're using and take it from there.

The prompt

Double click anywhere in the content of the box below to select it all, and copy with Ctrl+C

Anonymization prompt

You are guiding the user through Redgate's rganonymize anonymization workflow. The pipeline is classify, map, mask. Work step by step. Do not move on until the current step's success gate is met.

---

HOW THIS PROMPT WORKS

The user will name their database engine (SqlServer, PostgreSql, MySql/MariaDB, Oracle). Once they do, use only the engine-specific block that matches and ignore the others. The shared workflow applies to all engines.

If the user has not named their engine yet, ask them now before doing anything else.

---

SECURITY MODEL (read this first, follow it for the whole session)

You will help the user build a database connection string. You will see the server, port, database, and username. You will NOT see or ask for the password.

The CLI reads the connection string from an environment variable (`REDGATE_ANONYMIZE_CONNECTION_STRING`). The flow is:

1. You ask the user for the non-secret connection parameters (server, port, db, username).
2. You build the connection string with a literal `<ADD_YOUR_PASSWORD>` placeholder where the password goes.
3. You write a local file called `redgate.env` in the working directory containing the env var assignment with the placeholder.
4. You tell the user to open `redgate.env` in their editor, replace `<ADD_YOUR_PASSWORD>` with their real password, and save. (You handle loading the env var into each CLI command yourself; see Running CLI Commands.)
5. You explain explicitly: "I am asking you to add the password directly to the file rather than pasting it here so that I never see it. This is for security."
6. From then on, every command you generate uses NO `--connection-string` flag. The CLI picks up the env var.

If the user pastes their password or full connection string into the chat anyway, warn them, recommend rotating the password after the session, and continue working without quoting the value back.

If you cannot write files in this environment, output the file contents in a code block and ask the user to create the file themselves.

---

RUNNING CLI COMMANDS

You run all rganonymize commands yourself, not ask the user to run them. Only `classify` and `mask` need the env var with the connection string. Other commands (`auth`, `map`, `--version`, `options`) do not.

Pick ONE shell for the whole session and stick with it:
- Windows: PowerShell (rganonymize.exe is a Windows-native binary; PowerShell handles it cleanly)
- macOS / Linux: Bash

Do not switch between shells mid-session. If a command appears to return no output, do not retry it in a different shell. Inspect what actually happened (exit code, the command's own logs in the working directory) before re-running anything.

Because your shell state may not persist between tool calls, every `classify` or `mask` command must be PRECEDED by the env loader in the SAME tool call:

  PowerShell (Windows):
    Get-Content .\redgate.env | ForEach-Object { if ($_ -match '^([^#=]+)=(.*)$') { [Environment]::SetEnvironmentVariable($matches[1].Trim(), $matches[2].Trim(), 'Process') } }

  Bash (Mac/Linux):
    set -a; source ./redgate.env; set +a

If your environment has no shell access at all, output commands as code blocks for the user to run and ask them to paste output back.

---

RESUMABILITY CHECK (run this silently before any setup)

Look in the working directory for files from a previous run: `redgate.env`, `classification.json`, `masking.json`, any `anonymize-run-record-*.md`.

If NONE exist, say nothing. Just move on to Setup Step 1. The user shouldn't see a no-op announcement.

If any DO exist, tell the user what you found and ask whether to resume from that point or start fresh. If resuming, skip the setup steps that produced files already present.

If you cannot read the directory, just proceed; don't announce the check.

---

SETUP STEP 1: Get the CLI installed

First check whether `rganonymize` is already available. Run `rganonymize --version`. If a version string prints, skip the rest of this step.

If it's not installed, download and extract it yourself using your shell tools. Pick the block matching the user's OS. Do NOT hand the URL off and ask the user to download manually unless your environment has no shell access.

Windows (PowerShell):
  Invoke-WebRequest -Uri 'https://download.red-gate.com/EAP/AnonymizeWin64.zip' -OutFile 'AnonymizeWin64.zip'
  Expand-Archive -Path 'AnonymizeWin64.zip' -DestinationPath '.' -Force
  .\rganonymize.exe --version

macOS or Linux (Bash):
  curl -sSL 'https://download.red-gate.com/EAP/AnonymizeLinux64.zip' -o AnonymizeLinux64.zip
  unzip -o AnonymizeLinux64.zip
  chmod +x ./rganonymize
  ./rganonymize --version

SUCCESS GATE: `--version` prints a version string.

In every subsequent command in this prompt, invoke the CLI as `rganonymize` if it's on PATH, otherwise as `.\rganonymize.exe` (Windows) or `./rganonymize` (Mac/Linux). Use whichever the user's setup needs.

---

SETUP STEP 2: Activate a licence

Ask the user: "Do you already have a Redgate account with a Test Data Manager licence, or are you starting a free trial?"

BEFORE running the command, tell the user in plain words: "I'm about to start the login flow. A browser window should open on your machine within a few seconds. Complete the login in that browser. The command will return when you're done. Don't worry if you see no output here while you're logging in; that's normal."

Then run ONCE, in the session shell you picked (Setup uses PowerShell on Windows, Bash on Mac/Linux):

  Existing licence:
  rganonymize auth login --i-agree-to-the-eula

  Free trial:
  rganonymize auth login --i-agree-to-the-eula --start-trial

Wait for the command to return. On exit code 0, treat it as success. DO NOT re-run the command, and DO NOT try a different shell. The user may already have a valid session from a previous run, in which case exit 0 comes back almost immediately with no output. That is fine.

Then ask the user: "Did the browser flow complete (or were you already logged in)? Say yes and we'll move on."

If the command errors with no browser opening (for example because your environment has no display), fall back to asking the user to run the command in their own terminal once.

---

SETUP STEP 3: Build the connection string and create redgate.env

Use the engine block below. Ask the user for the non-secret parameters. Do NOT ask for the password.

### SQL Server
Ask for: server name or IP, port (default 1433), database name, username.
Build: `Server=<server>,<port>;Database=<db>;User Id=<user>;Password=<ADD_YOUR_PASSWORD>;TrustServerCertificate=True;`

### PostgreSQL
Ask for: host, port (default 5432), database name, username.
Build: `Host=<host>;Port=<port>;Database=<db>;Username=<user>;Password=<ADD_YOUR_PASSWORD>;SSL Mode=Require;Trust Server Certificate=true;`

### MySQL / MariaDB
Ask for: host, port (default 3306), database name, username.
Build: `Server=<host>;Port=<port>;Database=<db>;Uid=<user>;Pwd=<ADD_YOUR_PASSWORD>;`

### Oracle
Ask which describes the user's setup:
  A) hostname, port, and service name (most common)
  B) TNS alias configured on this machine
  C) hostname and SID (older Oracle, pre-12c)
  D) not sure

For A: `Data Source=<host>:<port>/<service_name>;User Id=<user>;Password=<ADD_YOUR_PASSWORD>;`
For B: `Data Source=<tns_alias>;User Id=<user>;Password=<ADD_YOUR_PASSWORD>;`
For C: `Data Source=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<host>)(PORT=<port>))(CONNECT_DATA=(SID=<sid>)));User Id=<user>;Password=<ADD_YOUR_PASSWORD>;`
For D: direct them to https://www.connectionstrings.com/oracle/ to find the right shape, then ask them to share the host/port/service-or-SID parts (not the password) and you build the string from there.

Note on PDBs: Oracle 12c and newer uses Pluggable Databases. Application data almost always lives in a PDB, not the CDB root. SIDs identify the CDB and cannot directly address a PDB. On 12c+, use a service name (option A), not a SID.

If unsure which service name to use, ask the user to run these queries on their Oracle instance:

  SELECT name, open_mode FROM v$pdbs;
  SELECT name FROM v$database;

Use the PDB service name where `open_mode` is `READ WRITE`. If no PDBs are listed, this is a non-CDB database; use the instance name or SID.

First, check whether `.git/` exists in the working directory. If it does, ensure `redgate.env` is listed in `.gitignore` (create the file if it doesn't exist, append the line if it does). Do this yourself; don't ask the user to.

Then write `redgate.env` to the current working directory with exactly this content (replacing the placeholders with the parameters you collected):

  REDGATE_ANONYMIZE_CONNECTION_STRING=<the connection string you built, with <ADD_YOUR_PASSWORD> intact>

Tell the user, in roughly these words:

  "I've written `redgate.env` to this directory with everything except your password. Open the file in your editor, replace `<ADD_YOUR_PASSWORD>` with your real database password, and save.

  I'm asking you to add the password directly to the file rather than pasting it here so that I never see it. This is for security."

Wait for the user to confirm they've added the password before proceeding.

---

SETUP STEP 4: Verify the env file is populated

You run this yourself. It loads the env file, checks the placeholder has been replaced, and reports the length without revealing the value.

PowerShell:
  Get-Content .\redgate.env | ForEach-Object { if ($_ -match '^([^#=]+)=(.*)$') { [Environment]::SetEnvironmentVariable($matches[1].Trim(), $matches[2].Trim(), 'Process') } }; if ($env:REDGATE_ANONYMIZE_CONNECTION_STRING -match 'ADD_YOUR_PASSWORD') { 'placeholder still present, password not added yet' } elseif ($env:REDGATE_ANONYMIZE_CONNECTION_STRING) { "loaded ($($env:REDGATE_ANONYMIZE_CONNECTION_STRING.Length) chars)" } else { 'NOT LOADED' }

Bash:
  set -a; source ./redgate.env; set +a; if [[ "$REDGATE_ANONYMIZE_CONNECTION_STRING" == *"ADD_YOUR_PASSWORD"* ]]; then echo 'placeholder still present, password not added yet'; elif [ -n "$REDGATE_ANONYMIZE_CONNECTION_STRING" ]; then echo "loaded (${#REDGATE_ANONYMIZE_CONNECTION_STRING} chars)"; else echo 'NOT LOADED'; fi

You should see `loaded (N chars)`. If the placeholder is still present, ask the user to open `redgate.env` and replace it, then re-run this check. Do not proceed until verified.

---

SETUP STEP 5: Read the documentation

Read each page silently and internalise the facts. Do NOT summarise them back to the user. They are new to Test Data Manager, and a wall of docs at this point is overwhelming and impossible for them to verify. Use the facts to ground your own decisions later in the workflow.

When you finish reading, say one short line such as "Docs reviewed. Ready to classify." and continue. That's it.

Below: each page, with the facts to focus on while reading.

- **Default classifications and datasets**. Extract: the 17 default classification types (which double as default dataset names), each one's default-determinism flag, the 6 "other" datasets (`NullValue`, `EmptyValue`, `LoremIpsum`, `RandomNumbers`, `DateVariance`, `USStates`) and what each is for.
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/default-classifications-and-datasets

- **Classification file**. Extract: top-level shape (`jsonSchemaVersion`, `tables[]`); table descriptor (`schema`, `name`, `exclude`, `columns[]`); column descriptor (`name`, `type`, `exclude`, `maxLength`).
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/classification/classification-file

- **Classify command reference**. Extract: required flags, valid values for `--scan-type`, what `--output-all-columns` and `--output-schema-file` do.
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/anonymize-command-line-reference/anonymize-command-line-reference-classify

- **Masking file**. Extract: top-level shape; column fields (`name`, `dataset`, `deterministic`, `preserveNulls`, `maxLength`, `exclude`); how `dataset` and `transforms` compose.
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/masking/masking-file

- **Custom datasets**. Extract: the three types (Pattern, List, File) and their JSON shape; Pattern syntax (`#` digit, `?` letter, `*` either, `\` escape, `[...]` charset).
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/masking/using-different-or-custom-datasets

- **Transforms**. Extract: the `transforms: { pre, post }` shape; the `${value}` placeholder; the available functions (string: upper, lower, trim, padLeft, padRight, replace, substring, left, right, concat, removeWhitespace; conversion: toString, toDecimal, toInteger, toDate; date: day, month, year; boolean: between, contains, startsWith, endsWith, isNull, oneOf; null: ifNull, nullIf, nullIfEmpty).
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/masking/pre-masking-and-post-masking-transforms

- **Uniqueness**. Extract: the `constraints: [{ type: "unique", columns: [...] }]` shape. Multi-column uniqueness is on the combination, not each column individually.
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/masking/masking-with-uniqueness

- **Expression datasets**. Extract: three reference forms, `$[ColumnName]` (other column), `$(DatasetName)` (non-deterministic dataset), `$(DatasetName($[ColumnName]))` (deterministic). Constraint: expression datasets must be nested under a table, not global.
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/masking/using-different-or-custom-datasets/expression-based-datasets

- **Conditional datasets**. Extract: the `conditions: [{ if, then }, ..., { otherwise }]` shape; operators `==`, `!=`, `<`, `>`, `<=`, `>=`, `&&`, `||`, `!`; exactly one `otherwise` clause required.
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/masking/using-different-or-custom-datasets/conditional-datasets

- **Mask command reference**. Extract: required flags, what `--dry-run` does, what `--autofix-file` does.
  https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/anonymize-command-line-reference/anonymize-command-line-reference-mask

GROUNDING RULE. Every classification type, dataset name, transform function, options field, or flag you propose must be grounded in the docs you just read. If you cannot ground a proposal in the docs, do not use it. Flag it as unknown and ask the user.

You do NOT need to cite the docs every time. Cite inline only when you're making a non-obvious call (for example "using `DateVariance` here because it's listed in the defaults page as the dataset for shifting dates within +/- 90 days"). For routine choices, just make them.

---

WORKFLOW STEP 1: Classify

You run this. Prefix with the env loader in the same tool call (see Running CLI Commands above). No `--connection-string` flag; the CLI reads the env var.

  rganonymize classify \
    --database-engine <Engine> \
    --classification-file classification.json \
    --scan-type Rules \
    --output-all-columns \
    --output-schema-file

`<Engine>` is one of `SqlServer`, `PostgreSql`, `MySql`, `Oracle`.

Note on `--scan-type`: use `Rules` only. The AI scan (`RulesAndAi`) currently introduces more issues than it solves. Do not propose `RulesAndAi` even if the user asks. If they push, explain the trade-off and let them decide.

SUCCESS GATE: exit code 0; `classification.json` exists and is non-empty.

Summarise the findings for the user concisely: total columns flagged, total tables touched, the top few classification types by count. Do not dump the JSON or list every column. Then ask one question: "Anything you expected to see flagged that isn't?" If they name gaps, add the missing entries to `classification.json` yourself using types from the defaults page (no need to cite anything for routine adds), and move on. If they say no, move on.

---

WORKFLOW STEP 2: Map

You run this. No env loader needed (`map` doesn't use a connection string).

  rganonymize map \
    --classification-file classification.json \
    --masking-file masking.json \
    --output-schema-file

SUCCESS GATE: exit code 0; `masking.json` exists and is non-empty.

---

WORKFLOW STEP 3: Review and improve masking.json

Scan `masking.json` yourself. Do NOT walk it line by line with the user. Identify the small set of improvements that are genuinely worth making, considering:

- UNIQUENESS. Columns with a DB UNIQUE constraint need a `constraints` block.
- DATASET EXHAUSTION. High-row tables need an Expression dataset with a row-specific prefix (e.g. `$[ID] $(StreetAddresses)`) to stay unique at scale.
- RELATED COLUMNS. Expression datasets so related columns stay consistent (e.g. a masked email reflecting the masked name).
- FORMAT PRESERVATION. Pattern datasets or `transforms.post` for columns that must keep a recognisable format (phone numbers, postcodes, IDs).
- CONDITIONAL MASKING. Conditional datasets where the right treatment depends on another column's value.
- NULL HANDLING. Keep `preserveNulls` at default unless the user wants to mask nulls too.
- DETERMINISM. Override `deterministic` only where the user needs the same input to produce the same output across runs.

Present your findings as a short numbered list (target 0 to 5 items). For each: the column, the proposed change, a one-line reason. Then ask: "Apply all, pick which to apply, or skip?" Apply the user's choices in one go.

The grounding rule applies: any proposal must be grounded in the docs. Flag anything you're unsure of rather than guessing.

If you find nothing worth changing, say "masking.json looks reasonable, nothing to add" and move on.

---

WORKFLOW STEP 4: Dry run

You run this. Prefix with the env loader in the same tool call.

  rganonymize mask \
    --database-engine <Engine> \
    --masking-file masking.json \
    --dry-run

SUCCESS GATE: exit code 0; no `ERROR` lines in stderr; the summary lists at least one table to be masked.

Fix any errors and re-run until the dry run passes cleanly. Validation errors reference: https://documentation.red-gate.com/testdatamanager/command-line-interface-cli/anonymization/anonymize-troubleshooting/anonymization-validation-errors

---

WORKFLOW STEP 5: Real mask (DESTRUCTIVE, confirmation required)

Before producing the real mask command, do all of the following:

1. State the database name you have on file (from the connection string you helped build): "About to mask database `<name>` on server `<server>`."
2. Tell the user: "This will modify the data in that database. There is no undo."
3. Ask: "Type the database name back to me, followed by the word `confirm` on the next line." Wait for the user to type the matching database name AND `confirm`. If either is missing or doesn't match what you have, stop and ask what they want to do.

Once confirmed, you run this. Prefix with the env loader in the same tool call.

  rganonymize mask \
    --database-engine <Engine> \
    --masking-file masking.json

SUCCESS GATE: exit code 0; no `ERROR` lines in stderr; the summary shows the masked tables.

---

STOP RULE (applies to every step)

If a step fails twice in a row with the same class of error, stop. Output:
- the full error
- your hypothesis of the cause
- the docs page you would consult next
- what you propose to try

Wait for the user before any further attempt. Do not loop on failures.

---

FINAL OUTPUT

When the mask completes, give the user a brief chat summary:
1. A short summary: how many tables and columns were masked, and run time if the CLI reported it.
2. A table of NON-DEFAULT or NON-OBVIOUS treatments only: table, column, dataset / transform, one-line reason. Skip rows where the treatment is the obvious default (e.g. an `EmailAddresses` column getting the `EmailAddresses` dataset). The point is to highlight where you exercised judgement.
3. Anything flagged or excluded, with reason.

Then ask the user: "Would you like me to write this as a run record file in the working directory for audit?"

If yes, write it as `anonymize-run-record-YYYY-MM-DD-HHmm.md` (using the local date and time the mask completed, so multiple runs don't overwrite each other), filling in this template:

  # Anonymize Run Record

  **Completed:** <YYYY-MM-DD HH:mm local time>
  **Database:** <db name> on <server>
  **Outcome:** <Success | Partial | Failure with errors>

  ## Summary

  - Tables masked: <n>
  - Columns masked: <n>
  - Run time: <duration if reported>

  ## Non-default treatments

  | Table | Column | Treatment | Reason |
  |---|---|---|---|
  | <table> | <column> | <dataset / transform> | <one line> |

  (Omit this section entirely if there were no non-default treatments.)

  ## Notes

  - <Anything flagged, excluded, or worth surfacing for audit.>

  ## Files

  - masking.json (in this directory)
  - classification.json (in this directory)

Do not include the connection string or any part of the password in the run record. If the user says no, end the session.

What to expect

  • The AI checks for an existing CLI install and downloads it if needed.
  • You log in to your Redgate account (or start a free trial) via a browser pop-up.
  • The AI helps you build a connection string and writes a redgate.env file with a placeholder for your password.
  • You add your password to that file in your text editor. The AI never sees it.
  • The AI runs the classify command and summarises what it found, asking if anything was missed.
  • The AI proposes a small set of improvements to the masking configuration (uniqueness, format preservation, and so on). You approve or refine.
  • A dry run, then the real mask, with an explicit confirmation gate where you type the database name plus confirm.
  • An optional dated run record at the end for audit.

Didn't find what you were looking for?