About User Defined Datasets
Published 19 March 2018
If you need a specific set of replacement data that is not available in the standard datasets, you can construct your own dataset. Building your own dataset is simple. Here is how it is done:
- Create the data as a standard text file with one item per line.
- Give the text file any name you wish (see the notes on the file extension and on underscore characters below) but use the extension .udef
- Place the file in the DataSets directory along with the standard datasets.
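The steps above can be sketched as a small script. This is only an illustration and not part of the Data Masker software: the function name write_udef, its parameters and the UTF-8 encoding are assumptions made for the example, and the file is deliberately written without a final line break so that no empty last line can be picked up as a dataset value.

```python
from pathlib import Path


def write_udef(items, dataset_name, datasets_dir):
    """Write items to <datasets_dir>/<dataset_name>.udef, one item per line.

    Hypothetical helper: the name, parameters and UTF-8 encoding are
    assumptions for this sketch, not something Data Masker prescribes.
    """
    # Trim leading/trailing whitespace, then drop items that end up empty,
    # because the data is used "as-is" and blank lines count as values.
    cleaned = [item.strip() for item in items]
    cleaned = [item for item in cleaned if item]

    # Spaces become underscores in the file name; the dataset list shows
    # underscores as spaces again.
    file_name = dataset_name.replace(" ", "_") + ".udef"
    target = Path(datasets_dir) / file_name

    # Join with line breaks but add none after the last item.
    target.write_text("\n".join(cleaned), encoding="utf-8")
    return target
```

For example, write_udef([" Blue Whale ", "", "Snow Leopard"], "Endangered Mammals", datasets_dir) would create Endangered_Mammals.udef containing two trimmed lines, where datasets_dir is the path of your DataSets directory.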
Some Notes on User Defined Datasets
- Numerous datasets suitable for a wide variety of purposes ship with the Data Masker software and are installed along with it. By default the datasets are stored in a directory named DataSets located below the Data Masker installation directory. The location of this directory can be changed using the configuration options on the Misc. Setup Tab. Any user defined datasets which are created must be placed in the same directory as the Data Masker supplied datasets. If this is not done, the Data Masker software will not be able to configure any rules to use them.
- Be careful not to allow any blank lines (particularly at the very end of the file) in the text file. The Data Masker will happily accept an empty line as a valid dataset value and will insert or substitute it just like any other data.
- Be careful to trim off any leading and trailing space characters from each line. The Data Masker will use the data "as-is".
- The text file containing the data must have an extension of .udef
- Any underscore "_" characters in the file name will be replaced by spaces when the name appears in the dataset list inside the Data Masker software. Thus a filename of Endangered_Mammals.udef will appear on the display as Endangered Mammals.
- As with all datasets, you do not need to worry about field length considerations. For example, a rule applied to a varchar(20) field will not generate errors if there are some lines in the user defined dataset that are longer than 20 characters. The Data Masker will ignore the lines in the user defined dataset which are too long for the field into which they are being written.
- It is possible to use all of the sampling and Where Clause options with user defined datasets - just as with the supplied datasets.
- It is possible to enable Randomize and Unique Values Only options on user defined datasets.
- The sample user defined dataset entitled Endangered_Mammals.udef is installed (by default) along with the standard datasets in the DataSets directory.
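Several of the notes above can be checked before a file is copied into the DataSets directory. The following is a minimal sketch, not a Data Masker feature: the function names display_name, check_lines and check_file, the max_length parameter and the UTF-8 encoding are all assumptions made for illustration. It reports blank lines, lines with leading or trailing spaces, and lines longer than a chosen field length (which, per the note above, would be ignored for that field rather than cause an error).

```python
from pathlib import Path


def display_name(file_name):
    """Return the name as it would appear in the dataset list."""
    # Drop the .udef extension and show underscores as spaces.
    return Path(file_name).stem.replace("_", " ")


def check_lines(lines, max_length=None):
    """Return (line_number, problem) tuples for the given dataset lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line == "":
            problems.append((number, "blank line"))
        elif line != line.strip(" "):
            problems.append((number, "leading or trailing space"))
        elif max_length is not None and len(line) > max_length:
            problems.append((number, "longer than field; would be ignored"))
    return problems


def check_file(path, max_length=None):
    """Check a .udef file; the UTF-8 encoding is an assumption."""
    text = Path(path).read_text(encoding="utf-8")
    # Splitting on "\n" keeps an empty final element when the file ends
    # with a line break, so that case is reported as a blank line too.
    return check_lines(text.split("\n"), max_length)
```

For example, check_file("Endangered_Mammals.udef", max_length=20) would list any lines that are blank, padded with spaces, or too long for a varchar(20) field.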