Published 15 September 2016
The topic of creating environments populated with real-world data is complex and goes beyond the scope of a change management tool like ReadyRoll. Every organization is different, so the approach you take to establishing test environments that are faithful representations of your production environment must be tailored to the individual needs of both your team and application system.
However, given that ReadyRoll works best when each member of your team has a private copy of the database to work with (either on their own machine or on a remote server), it is worth enumerating some of the different methods of populating your environments that are available. ReadyRoll ships with a number of features that can help support your efforts to orchestrate the creation of your database environments; these are mentioned alongside each of the different data population strategies outlined in this article.
Shared vs Dedicated environments?
Having dedicated developer “sandboxes” provides the isolation needed to ensure that he/she can design and test their schema changes without affecting other developers. Providing everyone in your team with their own copy of the database also gives developers the ability to work on separate branches of code at the same time, facilitating concurrent streams of development work.
Which strategy for creating environments with real-world data you choose largely depends on the time and resources available to you, as some methods are easier to implement than others.
Strategy A: Create and populate your databases from scratch using project sources
This approach requires that all of the T-SQL logic needed to reproduce your database be stored within your ReadyRoll project. This includes all of the code needed to create the database, logins, users, schema objects, programmable objects, as well as logic to seed your reference tables with static data. ReadyRoll can help with scripting all of these types of objects, as described in our article on Deploying to an existing database.
If having transactional data within your test databases is also desired (such as the population of sample customers, orders, etc) then additional work to incorporate this aspect of your "sandbox" environment deployment will need to be performed. These custom scripts can be included right alongside the scripts that ReadyRoll generates for you: since everything in ReadyRoll is deployed via a migration script, orchestrating both schema and data deployment is as simple as editing a SQL file.
The advantage of this approach is that, after the initial ground work within your project is done, every developer will be able to create their entire database environment by simply deploying the Visual Studio solution. The draw-back is that it may take some time to script out and maintain the sample data within the project scripts.
Seeding (smaller) tables with INSERT statements
The ReadyRoll tool-window in Visual Studio is able to script out INSERTS for your tables, in addition to schema. This can be done as a one-off operation by switching on data tracking for a given table temporarily, importing the data into the project, and then switching off the table-level tracking. The last step is needed because it may not be desirable for future UPDATEs to the data to be automatically scripted into the project. Note that, as this will result in potentially thousands of insert statements being added to the project, which could hamper deployment performance, this feature is only suitable for smaller datasets. See this article for more information on the data tracking feature.
Seeding (larger) tables with BULK INSERT
A common approach to populating transactional tables is with the SQL Server BULK INSERT statement. ReadyRoll can assist with including the input file for the statement (e.g. a CSV file containing sample customer data) within your migration scripts. For more information on how to include an external seed file as part of your deployment, see this article on using BULK INSERT with ReadyRoll.
Controlling data deployment with SQLCMD variables
If you would like some tables to be populated in some environments and not others, you can define SQLCMD variables within the project and consume them in your migrations (e.g. $(MyVariable)) to control data deployment. For example, if you want your BULK INSERT to only run in test environments, you could do so by adding and using a variable called $(SeedWithCustomerData) within the project. This variable could be used to wrap the statement with guard clauses to ensure that the table is only populated when SeedWithCustomerData=True:
IF ‘$(SeedWithCustomerData)’ = ‘True’ THEN BEGIN BULK INSERT [dbo].[Customers] FROM '$(DeployPath)CustomerData.csv' WITH (FIELDTERMINATOR = ',', FIRSTROW = 2) END
Setting the value of the variable
The default value of the $(SeedWithCustomerData) variable is defined within your project file, so everyone who deploys the database project on their machine will use that value (unless they override this with a different value in the "local" column). However when you deploy outside of Visual Studio, to control whether the table is populated, you can set the variable in your deployment tool: if deploying using Octopus Deploy, simply add a variable to your Octopus project called "SeedWithCustomerData" and specify True or False. If deploying directly from PowerShell, simply set the $SeedWithCustomerData variable prior to executing the deployment.
Strategy B: Create your databases from scratch using project sources, then use a custom tool to populate the database
This approach is similar to Strategy A in that you would use the ReadyRoll project to deploy your schema, code and static data. However instead of scripting the transactional data, you might elect to use a third party tool to accomplish this, such as SQL Data Generator. Some of our customers have even written their own ETL process to import subsets of transactional data into their sandbox environments.
The advantage of this approach is that it simplifies your database project by keeping the maintenance of your transactional test data separate to that of your change authoring process. Drawbacks include having to establish and maintain the data population process, as well as the complexity of having to run an additional step after performing the initial schema deployment.
Strategy C: Restore a database backup onto your test servers
Instead of trying to reproduce your database from source control, setting up your development environment by restoring from backup files is another option. The difference with this approach to strategy A & B is that, instead of creating your database objects from your database projects in source control, they will be created from your backup and the scripts that create these objects will be skipped; only the scripts that have yet to be deployed to the database will be executed (if any). The advantage of this approach is that your development environments will be created with realistic test data, however if the database contains sensitive data, this may complicate the provisioning of dedicated environments as a data masking/sanitisation step would need to be performed before the backups could be used by developers.
The other drawback of this approach is that the cumulative size of all these dedicated database environments may be prohibitive for your local or network storage infrastructure.
Provision your database environment using SQL Clone
Instead of using backups, another option for provisioning dedicated development environments is to use SQL Clone, which lets you quickly create full copies of SQL Server databases and backups. When you subsequently deploy your database project, ReadyRoll will figure out which changes have yet to be applied, and execute only those migration scripts which are pending.
The advantage of this approach is that your environment will come pre-loaded with real world test cases. Additionally, having the ability to quickly create copies of the database simplifies the process of rolling-back unwanted changes, as you can always create a new copy and apply any pending migrations.
A risk with this approach to be aware of: it is possible that unintended changes may be imported into migrations (and potentially deployed to other environments) if the restored database does not match the schema contained within the ReadyRoll project. To mitigate this, we recommend that databases backups taken from canonical sources only (e.g. a Production environment) and that any drift is removed from the database prior to authoring new changes (see ReSync your schema within this article).