Redgate Test Data Manager

Using the worked examples to understand data generation

This section uses the output from the worked data generation examples to explain how the data generator produces referentially intact data. You may find it helpful to carry out the worked example steps before you read this page.

Understanding the data generator

Currently, the only parameter that affects what data is generated is --rows-to-generate, which sets the number of rows generated in each table. The worked examples all specify 1000 rows, so if you check the resulting tables you should find that each one has exactly 1000 rows.

Referential Integrity

The worked examples all use a schema made up of four tables: Orgs, Users, Posts and Comments.

Every row of Users must reference a valid row of Orgs, every Post must reference a valid User, and every Comment must reference both a Post and a User (the post that it is a comment on and the user who made the comment).
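As a rough illustration of these relationships, the sketch below recreates the four tables in an in-memory SQLite database using Python. Only the table names and the foreign key relationships come from the description above. The column names (id, name, body, org_id, user_id, post_id) and the choice of SQLite are assumptions made for the example, and the worked examples themselves may use a different database engine.

```python
import sqlite3

# Hypothetical column names: only the table names and the foreign key
# relationships are taken from the description of the worked examples.
SCHEMA = """
CREATE TABLE Orgs (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Users (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    org_id INTEGER NOT NULL REFERENCES Orgs(id)
);
CREATE TABLE Posts (
    id      INTEGER PRIMARY KEY,
    body    TEXT NOT NULL,
    user_id INTEGER NOT NULL REFERENCES Users(id)
);
CREATE TABLE Comments (
    id      INTEGER PRIMARY KEY,
    body    TEXT NOT NULL,
    post_id INTEGER NOT NULL REFERENCES Posts(id),
    user_id INTEGER NOT NULL REFERENCES Users(id)
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the four tables in a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def can_insert_orphan_user(conn: sqlite3.Connection) -> bool:
    """Try to insert a user whose org does not exist; report whether it worked."""
    try:
        conn.execute("INSERT INTO Users (id, name, org_id) VALUES (1, 'a', 999)")
        return True
    except sqlite3.IntegrityError:
        return False
```

With foreign keys enforced, a Users row that points at a non-existent Orgs row is rejected, which is the kind of constraint the generated data has to satisfy.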

The data generator handles these constraints by starting at the top of the tree and generating Orgs first. It then generates Users and, for each user, selects a random Org for them to belong to. The same Org may be referenced by multiple users, and some Orgs may have no users at all.

The generator then generates all of the Posts, choosing a random User for each row to reference in just the same way.

Finally, it generates Comments by choosing a random Post and a random User for each generated comment. The fact that each Post already points to a User has no bearing on this process: the User referenced by a Comment is no more or less likely to be the one referenced by its Post than any other User.
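The generation order described above can be sketched in a few lines of Python. This is a minimal illustration of the idea, not the data generator's actual implementation: the function name generate, the dictionary layout, the sequential ids and the seed parameter are all assumptions made for the example.

```python
import random


def generate(rows: int, seed: int = 0) -> dict:
    """Generate parent tables first, then point each child row at a random parent.

    Hypothetical sketch: key names and the use of sequential ids are assumptions.
    """
    rng = random.Random(seed)
    ids = range(1, rows + 1)

    # Orgs sit at the top of the tree, so they are generated first.
    orgs = [{"id": i} for i in ids]

    # Each user picks a random org; an org may be picked many times or never.
    users = [{"id": i, "org_id": rng.choice(orgs)["id"]} for i in ids]

    # Each post picks a random user in the same way.
    posts = [{"id": i, "user_id": rng.choice(users)["id"]} for i in ids]

    # Each comment picks a post and a user independently of each other,
    # so the comment's user is unrelated to the user who wrote the post.
    comments = [
        {
            "id": i,
            "post_id": rng.choice(posts)["id"],
            "user_id": rng.choice(users)["id"],
        }
        for i in ids
    ]

    return {"Orgs": orgs, "Users": users, "Posts": posts, "Comments": comments}
```

Calling generate(1000) mirrors the worked examples: every table ends up with 1000 rows, and every reference points at a row that already exists because parents are always created before their children.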

Generated Values

The data generator decides what value to generate for each column based only on its data type. For integer types it generates a random integer within the type's range, for string types it generates a random string with a length in the allowed range, and so on.
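To make the idea of type-driven generation concrete, here is a small Python sketch. The type names, the integer ranges, the choice of ASCII letters and the function name random_value are assumptions for illustration only. The character set and distribution used by the real generator are not described on this page.

```python
import random
import string

# Assumed integer type names and ranges, chosen to match common SQL types.
INT_RANGES = {
    "smallint": (-2**15, 2**15 - 1),
    "int": (-2**31, 2**31 - 1),
    "bigint": (-2**63, 2**63 - 1),
}


def random_value(data_type: str, max_length: int = 0, rng: random.Random = None):
    """Pick a random value using only the column's data type (hypothetical helper)."""
    rng = rng or random.Random()
    if data_type in INT_RANGES:
        low, high = INT_RANGES[data_type]
        # randint is inclusive at both ends, so the full range is covered.
        return rng.randint(low, high)
    if data_type == "varchar":
        # Any length from empty up to the column's declared maximum.
        length = rng.randint(0, max_length)
        return "".join(rng.choice(string.ascii_letters) for _ in range(length))
    raise ValueError(f"unsupported data type: {data_type}")
```

For example, random_value("smallint") returns an integer that fits in a 16-bit signed column, and random_value("varchar", max_length=20) returns a string of at most 20 characters.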

None of the columns in the worked examples have any check constraints on them. If they did, the data generator would attempt to generate valid data that satisfies them, but depending on how strict the constraints are, it might fail to generate the requested number of rows.
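One way to picture why a strict check constraint can reduce the row count is a generate-and-retry loop with a limit on attempts. This page does not say how the data generator actually handles check constraints, so the sketch below is purely an assumption: the retry strategy, the max_attempts_per_row limit and the function name generate_with_check are all hypothetical.

```python
import random
from typing import Callable, List


def generate_with_check(
    rows: int,
    check: Callable[[int], bool],
    max_attempts_per_row: int = 20,
    seed: int = 0,
) -> List[int]:
    """Generate up to `rows` random ints that satisfy `check`.

    Hypothetical strategy: retry each row a limited number of times and
    skip the row if no valid value turns up.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(rows):
        for _ in range(max_attempts_per_row):
            candidate = rng.randint(-2**31, 2**31 - 1)
            if check(candidate):
                values.append(candidate)
                break
        # If every attempt failed, this row is skipped, so the result
        # can end up shorter than the requested number of rows.
    return values
```

Under this assumed strategy, a loose constraint such as lambda v: v >= 0 is satisfied by about half of all candidates and rarely costs any rows, whereas a narrow one such as lambda v: 0 <= v <= 10 is almost never satisfied by a random 32-bit integer and would leave most rows ungenerated.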


