This section uses the output from the worked data generation examples to explain how the data generator produces referentially intact data. You may find it helpful to actually carry out the worked example steps before you read this page.
Understanding the data generator
Right now the only parameter that affects what data is generated is --rows-to-generate, which determines the number of rows that get generated in each table. The worked examples all specify 1000 rows, so if you check the resulting tables you should find that each of them contains exactly 1000 rows.
Referential Integrity
The worked examples all use a schema that looks like this:

[Schema diagram: Users reference Orgs, Posts reference Users, and Comments reference both Posts and Users.]
So every row of Users must reference a valid row of Orgs, every Post must reference a valid User, and every Comment must reference both a Post and a User (the post that it's a comment on and the user who made the comment).
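To make the walkthrough below concrete, here is a minimal sketch of those relationships as a Python mapping from each table to its foreign keys. The column names (org_id, user_id, post_id) are assumptions for illustration, not the worked examples' exact schema:

```python
# Hypothetical reconstruction of the worked examples' schema:
# each table maps to {foreign_key_column: referenced_table}.
SCHEMA = {
    "Orgs":     {},                                    # root of the tree
    "Users":    {"org_id": "Orgs"},
    "Posts":    {"user_id": "Users"},
    "Comments": {"post_id": "Posts", "user_id": "Users"},
}
```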
The data generator handles these constraints by starting at the top of the tree and generating Orgs first. It then generates Users, selecting a random Org for each user to belong to. The same Org may be referenced by multiple users, and some Orgs may have no users at all.
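A minimal sketch of that first step, assuming the hypothetical column names above (the real generator's internals will differ). The only thing that matters is that every user's org_id is drawn from org ids that already exist:

```python
import random

ROWS_TO_GENERATE = 1000  # the value the worked examples pass to --rows-to-generate

# The root table is generated first: it has no foreign keys to satisfy.
orgs = [{"id": i} for i in range(ROWS_TO_GENERATE)]

# Users are generated next, each referencing a randomly chosen Org.
# Some Orgs will be chosen many times, others not at all.
users = [
    {"id": i, "org_id": random.choice(orgs)["id"]}
    for i in range(ROWS_TO_GENERATE)
]
```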
The generator then generates all of the Posts by choosing a random User for each row to reference, in just the same way.
Finally it generates Comments by choosing a random Post and a random User for each generated comment. The fact that each Post already points to a User has no bearing on this process: the User referenced by a Comment is no more or less likely to be the one referenced by its Post than any other User.
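Continuing the sketch, Posts and Comments follow the same pattern. The comment's user_id is drawn independently of the chosen post's user_id, which is exactly the independence described above:

```python
# Posts each reference a randomly chosen existing User.
posts = [
    {"id": i, "user_id": random.choice(users)["id"]}
    for i in range(ROWS_TO_GENERATE)
]

# Comments reference a random Post and, independently, a random User.
# No attempt is made to pick the user who authored the chosen post.
comments = [
    {
        "id": i,
        "post_id": random.choice(posts)["id"],
        "user_id": random.choice(users)["id"],
    }
    for i in range(ROWS_TO_GENERATE)
]
```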
Generated Values
The data generator decides what value to generate for each column based only on its data type. For integer types it generates a random integer within the type's range, for string types it generates a random string with a length in the allowed range, and so on.
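As a rough sketch of what type-driven generation means, assuming a hypothetical column description with range and length metadata (the real generator's column model is not shown in the worked examples):

```python
import random
import string

def generate_value(col):
    """Pick a value for a column based only on its declared type."""
    if col["type"] == "integer":
        # Any integer within the column's allowed range.
        return random.randint(col["min"], col["max"])
    if col["type"] == "string":
        # A random string whose length falls within the allowed range.
        length = random.randint(col["min_length"], col["max_length"])
        return "".join(random.choices(string.ascii_letters, k=length))
    raise NotImplementedError(f"unhandled type: {col['type']}")

# Example: an integer column constrained to 0..100.
print(generate_value({"type": "integer", "min": 0, "max": 100}))
```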
None of the columns in the worked examples have any check constraints on them. If they had, the data generator would have attempted to generate data that satisfies them, but depending on how strict the constraints are it might have failed to generate the requested number of rows.
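One plausible way to honour a check constraint is rejection sampling: generate candidate values, discard any that fail the check, and give up after a bounded number of attempts. This is an illustration of why strict constraints can cause the generator to fall short of the requested row count, not a description of its actual strategy:

```python
import random

def generate_checked_value(generate, check, max_attempts=1000):
    """Rejection sampling: retry until the check passes or we give up."""
    for _ in range(max_attempts):
        value = generate()
        if check(value):
            return value
    raise RuntimeError("could not satisfy the check constraint")

# A lenient check (even numbers) almost always succeeds quickly...
even = generate_checked_value(
    lambda: random.randint(0, 1_000_000),
    lambda v: v % 2 == 0,
)

# ...while a very strict one (a single valid value in a huge range)
# will almost certainly exhaust its attempts and raise.
```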