Data Population

The topic of creating environments populated with real-world data is complex and goes beyond the scope of a change management tool like SQL Change Automation. Every organization is different, so the approach you take to establishing test environments that are faithful representations of your production environment must be tailored to the individual needs of both your team and application system.

However, given that SQL Change Automation works best when each member of your team has a private copy of the database to work with (either on their own machine or on a remote server), it is worth enumerating some of the different methods of populating your environments that are available. SQL Change Automation ships with a number of features that can help support your efforts to orchestrate the creation of your database environments.

Shared vs Dedicated environments?

Having dedicated developer “sandboxes” provides the isolation needed to ensure that he/she can design and test their schema changes without affecting other developers. Providing everyone in your team with their own copy of the database also gives developers the ability to work on separate branches of code at the same time, facilitating concurrent streams of development work.

Which strategy for creating environments with real-world data you choose largely depends on the time and resources available to you, as some methods are easier to implement than others.

Strategy A: Create and populate your databases from scratch using project sources

This approach requires that all of the T-SQL logic needed to reproduce your database be stored within your SQL Change Automation project. This includes all of the code needed to create the database, logins, users, schema objects, programmable objects, as well as logic to seed your reference tables with static data. SQL Change Automation can help with scripting all of these types of objects, as described in our article on Working with an existing database.

If having transactional data within your test databases is also desired (such as the population of sample customers, orders, etc) then additional work to incorporate this aspect of your "sandbox" environment deployment will need to be performed. These custom scripts can be included right alongside the scripts that SQL Change Automation generates for you: since everything in SQL Change Automation is deployed via a migration script, orchestrating both schema and data deployment is as simple as editing a SQL file.

The advantage of this approach is that, after the initial ground work within your project is done, every developer will be able to create their entire database environment by simply deploying the Visual Studio solution. The draw-back is that it may take some time to script out and maintain the sample data within the project scripts.

Seeding (smaller) tables with INSERT statements

The SQL Change Automation tool-window in Visual Studio is able to script out INSERTS for your tables, in addition to schema changes. This can be done as a one-off operation by switching on data tracking for a given table temporarily, importing the data into the project, and then switching off the table-level tracking. The last step is needed because it may not be desirable for future UPDATEs to the data to be automatically scripted into the project. Note that, as this will result in potentially thousands of insert statements being added to the project, which could hamper deployment performance, this feature is only suitable for smaller datasets. See Static Data.

Seeding (larger) tables with BULK INSERT

A common approach to populating transactional tables is with the SQL Server BULK INSERT statement. SQL Change Automation can assist with including the input file for the statement (e.g. a CSV file containing sample customer data) within your migration scripts. For more information on how to include an external seed file as part of your deployment, see Static Data.

Controlling data deployment with SQLCMD variables

If you would like some tables to be populated in some environments and not others, you can define SQLCMD variables within the project and consume them in your migrations (e.g. $(MyVariable)) to control data deployment. For example, if you want your BULK INSERT to only run in test environments, you could do so by adding and using a variable called $(SeedWithCustomerData) within the project. This variable could be used to wrap the statement with guard clauses to ensure that the table is only populated when SeedWithCustomerData=True:

IF ‘$(SeedWithCustomerData)’ = ‘True’ THEN
BEGIN
   BULK INSERT [dbo].[Customers]
   FROM '$(DeployPath)CustomerData.csv'
   WITH (FIELDTERMINATOR = ',', FIRSTROW = 2)
END

Setting the value of the variable

The default value of the $(SeedWithCustomerData) variable is defined within your project file, so everyone who deploys the project on their machine will use that value (unless they override this with a different value in the "local" column). However when you deploy outside of Visual Studio, to control whether the table is populated, you can set the variable in your deployment tool: if deploying directly from PowerShell, simply set the $SeedWithCustomerData variable prior to executing the deployment.

Strategy B: Create your databases from scratch using project sources, then use a custom tool to populate the database

This approach is similar to Strategy A in that you would use the SQL Change Automation project to deploy your schema, code and static data. However instead of scripting the transactional data, you might elect to use a third party tool to accomplish this, such as SQL Data Generator. Some of our customers have even written their own ETL process to import subsets of transactional data into their sandbox environments.

The advantage of this approach is that it simplifies your project by keeping the maintenance of your transactional test data separate to that of your change authoring process. Drawbacks include having to establish and maintain the data population process, as well as the complexity of having to run an additional step after performing the initial schema deployment.

Strategy C: Restore a database backup onto your test servers

Instead of trying to reproduce your database from source control, setting up your development environment by restoring from backup files is another option. The difference with this approach to strategy A & B is that, instead of creating your database objects from your projects in source control, they will be created from your backup and the scripts that create these objects will be skipped; only the scripts that have yet to be deployed to the database will be executed (if any). The advantage of this approach is that your development environments will be created with realistic test data, however if the database contains sensitive data, this may complicate the provisioning of dedicated environments as a data masking/sanitisation step would need to be performed before the backups could be used by developers.

The other drawback of this approach is that the cumulative size of all these dedicated database environments may be prohibitive for your local or network storage infrastructure.

A risk with this approach to be aware of: it is possible that unintended changes may be imported into migrations (and potentially deployed to other environments) if the restored database does not match the schema contained within the SQL Change Automation project. To mitigate this, we recommend that databases backups taken from canonical sources only (e.g. a Production environment) and that any drift is removed from the database prior to authoring new changes (see Revert pending changes to re-sync your schema).

Strategy D: Provision your database environment using SQL Clone


Instead of using backups, another option for provisioning dedicated development environments is to use SQL Clone, which lets you quickly create full copies of SQL Server databases and backups. When you subsequently deploy your project, SQL Change Automation will figure out which changes have yet to be applied, and execute only those migration scripts which are pending.

The advantage of this approach is that your environment will come pre-loaded with real world test cases. Additionally, having the ability to quickly create copies of the database simplifies the process of rolling-back unwanted changes, as you can always create a new copy and apply any pending migrations.


Didn't find what you were looking for?