About Datasets
Published 06 January 2020
Datasets provide data values for Substitution, Insertion, Search and Replace, XML Masker, JSON Masker and Row-Internal Synchronization rules. The datasets associated with the columns defined in the rule indicate which type of data will be entered into the specified target table and column. A wide variety of datasets (see below) is available to provide a range of realistic-looking data.
For example, a column containing customer last names could be masked by implementing a Substitution rule on it using the Names, Surnames, Random dataset. When the Substitution rule is executed as part of a run of the masking set, random last names are generated and substituted for each real customer last name. The true last name of the customer is thus hidden (preserving privacy and security), but the remaining data is still referentially relevant and usable as a test system.
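The effect of such a Substitution rule can be illustrated with a minimal Python sketch. This is not how Data Masker is implemented or configured; the `SURNAMES` list, the `substitute_column` function and the sample rows are all hypothetical stand-ins used only to show the idea of replacing one column while leaving the rest of the row intact.

```python
import random

# Hypothetical stand-in for the "Names, Surnames, Random" dataset.
SURNAMES = ["Archer", "Bennett", "Castillo", "Dufresne", "Eriksen"]


def substitute_column(rows, column, dataset, seed=None):
    """Return copies of the rows with `column` replaced by random dataset values."""
    rng = random.Random(seed)
    # Every other column is copied unchanged, so the row stays usable for testing.
    return [{**row, column: rng.choice(dataset)} for row in rows]


customers = [
    {"id": 1, "last_name": "Smith", "city": "Leeds"},
    {"id": 2, "last_name": "Jones", "city": "York"},
]

masked = substitute_column(customers, "last_name", SURNAMES, seed=42)
for row in masked:
    print(row)
```

Only the `last_name` values change; the `id` and `city` columns keep their original values, which is what keeps the masked table usable as test data.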
A dataset is associated with the target table's column when the rule is created and can be changed at any time simply by editing the rule.
Datasets have options which provide further configuration information. Each dataset offers configuration options specific to its requirements. For example, the Dates, Random dataset offers the ability to set the starting and ending points of the date range.
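As a rough illustration of what a date-range option means, the following Python sketch picks random dates between a starting and an ending point. The function name `random_date` and the chosen range are assumptions made for this example; they do not reflect the actual option names or behaviour of the Dates, Random dataset.

```python
import random
from datetime import date, timedelta


def random_date(start, end, rng):
    """Pick a date uniformly from the inclusive range [start, end]."""
    span_days = (end - start).days
    return start + timedelta(days=rng.randint(0, span_days))


# Hypothetical configuration: the start and end points of the date range.
start, end = date(1950, 1, 1), date(2000, 12, 31)
rng = random.Random(7)

samples = [random_date(start, end, rng) for _ in range(5)]
for d in samples:
    print(d.isoformat())
```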
The datasets are installed when the Data Masker software is installed. By default the datasets are stored in a directory named DataSets located below the Data Masker installation directory. The location of this directory can be changed through the use of the configuration options on the Misc. Setup Tab.
Overview of Datasets at Redgate University
See the What are DataSets, and can I make my own? module for a quick demo of where to find and use the built-in datasets, and how to create your own.
User Defined Datasets
If you need a specific set of replacement data that is not available in the standard datasets, you can construct your own dataset. All that is required is to place a simple text file (with a special naming convention) in the datasets directory. Please see the User Defined Datasets help page for more information on how to build your own datasets.
Correlated Datasets
Most datasets generate only one replacement value at a time. However, some datasets can generate a group of correlated replacement values. When such a dataset is applied to multiple columns in a table, each group of replacement values is applied together to a single row, so that the relationship between the columns within that row is maintained.
A typical example of a correlated dataset is a correlated address, such as 'US Zipcodes with State, County and Town'. In this dataset, each group of correlated values represents a geographically valid address. You can use the dataset on multiple address columns in a table and select the corresponding field for each column, so that the data in each row remains a valid address.
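The difference from a single-value dataset can be sketched in Python as follows. This is an illustration under stated assumptions, not Data Masker's implementation: the `CORRELATED_ADDRESSES` records, the `mask_correlated` function and the column-to-field mapping are hypothetical. The key point is that one group is chosen per row and all mapped columns are filled from that same group.

```python
import random

# Hypothetical sample of a correlated dataset: each record is one group
# of values that belong together.
CORRELATED_ADDRESSES = [
    {"zip": "90210", "state": "CA", "county": "Los Angeles", "town": "Beverly Hills"},
    {"zip": "10001", "state": "NY", "county": "New York", "town": "New York"},
    {"zip": "60601", "state": "IL", "county": "Cook", "town": "Chicago"},
]


def mask_correlated(rows, column_to_field, dataset, seed=None):
    """Fill the mapped columns of each row from a single correlated group."""
    rng = random.Random(seed)
    masked = []
    for row in rows:
        group = rng.choice(dataset)  # one group per row, shared by all columns
        new_row = dict(row)
        for column, field in column_to_field.items():
            new_row[column] = group[field]
        masked.append(new_row)
    return masked


rows = [
    {"id": 1, "postal_code": "73301", "st": "TX", "city": "Austin"},
    {"id": 2, "postal_code": "98101", "st": "WA", "city": "Seattle"},
]
# Map each table column to the dataset field it should receive.
mapping = {"postal_code": "zip", "st": "state", "city": "town"}

masked_rows = mask_correlated(rows, mapping, CORRELATED_ADDRESSES, seed=1)
for row in masked_rows:
    print(row)
```

Choosing a value for each column independently would instead risk pairing, for example, a New York zip code with an Illinois town.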
The Datasets
Listed below are the datasets currently available with the Data Masker software. More datasets are added regularly, and we are always interested in hearing new ideas. If you have a requirement which cannot be fulfilled by the datasets below, please let us know by emailing us at support@red-gate.com.
ABN, Australian Business Number (AUS)
ACN, Australian Company Number (AUS)
Medicare Numbers (AUS)
States, Australian
TFN, Tax File Numbers (AUS)
Postcodes + State + Town, AUS
Bank Account Numbers (NL)
Bank Account Numbers (NZ)
Burgerservicenummer (NL)
Credit Card Numbers, AMEX
Credit Card Numbers, Diners
Credit Card Numbers, Discover
Credit Card Expiration Dates
Credit Card Numbers, MasterCard
Credit Card Numbers, VISA
Postcodes + Prov. + Town, CDN
Postcodes, Canadian
Provinces, Canadian
SIN Numbers (Canadian)
Counties (IE)
Colours (Random)
Company Names
CPF Numbers (BR)
CPNJ Numbers (BR)
CPR Numbers (DK)
Country Names
Departments (FR)
E-Mail Addresses (Random)
NIF Numbers, (ES)
File, Upload From Disk
Numbers, Floating Point (Random, as Text)
Numbers, Integer (formatted)
Names, First Names, Female (DE)
Names, First Names, Female
Names, First Names, Female (FR)
Names, First Names, Female (NL)
Names, First Names, Female (PT)
Names, First Names, Female (ES)
NIR Numbers, (FR)
Postcodes, FR
Telephone Numbers, (FR)
Vehicle Registrations (FR)
HKID Numbers (HK)
Numbers, Integer, Sequential (as Text)
TCP/IP Addresses
Names, Surnames, Random (DE)
Names, Surnames, Random (Large List)
Names, Surnames, Random (Short List)
Names, Surnames, Random (FR)
Names, Surnames, Random (NL)
Names, Surnames, Random (PT)
Names, Surnames, Random (ES)
Numbers, Luhn, Mod(10) (as text)
MAC/EUI-48 Addresses
Names, First Names, Male+Female
Names, First Names, Male+Female (DE)
Names, First Names, Male+Female (FR)
Names, First Names, Male+Female (NL)
Names, First Names, Male+Female (PT)
Names, First Names, Male+Female (ES)
Names, First Names, Male (DE)
Names, First Names, Male
Names, First Names, Male (FR)
Names, First Names, Male (NL)
Names, First Names, Male (PT)
Names, First Names, Male (ES)
Month Names
Names, First+Last, Female
Names, First+Last, Male
Names, First+Last, Male+Female
Nationalities (Random)
NHS Numbers
NIF Numbers (PT)
Postcodes, NL
Names, Last+First, Female
Names, Last+First, Male+Female
Names, Last+First, Male
NRIC Numbers (SG)
NULL Values
Occupations
OHIP Numbers (Ontario)
Text, Paragraphs of Gibberish
PPS Numbers (IE)
Telephone Numbers, (North America)
Numbers, Integer (Random, as Text)
Text, Alpha-Numeric (Formatted)
Bit (Random)
Text, Dictionary Words
Names, First Names, Female (FR shortlist)
Names, Surnames, Random (FR shortlist)
Names, First Names, Male+Female (FR shortlist)
Names, First Names, Male (FR shortlist)
Street Addresses
Street Addresses (FR)
Street Addresses (IE)
Street Addresses (NL)
Names, Surname Suffixes
Names, Surname Titles (Short List)
Names, Surname Titles
SWIFT-BIC Codes
Telephone Numbers, (AUS)
Telephone Numbers, (CDN)
Town/City Names
Town/City Names (AU)
Town/City Names (FR)
Town/City Names (IE)
Town/City Names (NL)
Bank Sort Codes (UK)
Counties (UK)
NI Numbers (UK)
Postcodes, UK (Invalid)
Postcodes + Town + County, UK
Telephone Numbers, (UK)
Vehicle Registrations (UK)
URL's (Uniform Resource Locator)
SSN Numbers, (USA)
Counties (USA)
User Defined Correlated Dataset File
User Defined Dataset File
Text, User Specified
State Names (US)
Zip Codes, US (Invalid)
Zip Codes + State + County + Town, USA
DateOffset Variance
Date Variance (Random), Text
Date Variance (Hash Key), Text
Date Variance (Hash Key)
Date Variance (Correlated)
Date Variance (Constant)
DateOffsets, Sequential
DateOffset, User Specified
DateOffsets, Random
Date Variance (Random)
Dates, Sequential
Dates, Sequential (as Text)
Date, User Specified
Numbers, User Specified
Numbers, Floating Point (Random)
Numbers, Integer, Sequential
Numbers, Luhn, Mod(10)
Number Variance
Numbers, Integer (Random)
Dates, Random
Dates, Random (as Text)
Random Dates