Use a SQL Server backup to baseline a database
Published 19 December 2024
Introduction
Backups as baselines offers an alternative way provision a database environment. The standard Flyway workflow is to create a baseline script from a production environment, which can then be used to provision development or shadow environments. However, baseline scripts for large databases can be millions of lines long and difficult to debug if something goes wrong. Using a database backup avoids the need to generate a baseline script and hands the complexity over to the database engine to manage.
In this document we will outline the steps to create a schema-only backup and use it to provision a shadow environment.
Note: An enterprise license is required to use backups as baselines.
Taking a backup
The first step is to take a schema-only backup of the production database, which in this walkthrough will be called AdventureWorks2022
. In order to achieve this, we must first create a schema-only copy of our database, which can be done using the DBCC CLONEDATABASE command:
DBCC CLONEDATABASE(AdventureWorks2022, AdventureWorks2022_Copy) with VERIFY_CLONEDB;
After running the statement above, we will have another database alongside AdventureWorks2022
called AdventureWorks2022_Copy
. The new AdventureWorks2022_Copy
database will have the same objects (e.g. schemas, tables, procedures, ...) as the AdventureWorks2022
database, but without any of the data. The schema-only backup can now be created using the AdventureWorks2022_Copy
database with the BACKUP command:
BACKUP DATABASE [AdventureWorks2022_Copy] TO DISK = N'AdventureWorks2022_Copy.bak';
A AdventureWorks2022_Copy.bak
file will be created in SQL Server's default backup directory. You can optionally specify the full file path for the backup file, please see Backup Devices for the full list of options.
With the backup taken, the AdventureWorks2022_Copy
database can now be deleted, as it's no longer needed. This can be done with the standard DROP DATABASE command, as shown below:
DROP DATABASE [AdventureWorks2022_Copy];
Configuring Flyway's Backup provisioner
The next step is to configure the Backup Provisioner to use the AdventureWorks2022_Copy.bak
backup file to provision the shadow database. By provision, we mean that Flyway will restore the backup to the shadow database when necessary.
One important aspect to consider is the location of the backup file. If the backup will be restored onto the same SQL Server instance from where it was taken, then the backup file doesn't need to be moved, as it will be accessible to the SQL Server process. If the shadow environment will be on a different SQL Server instance then we must ensure the backup file is accessible to that instance. For example, it may be necessary to:
- Copy the backup file to the machine running the SQL Server instance which will host the shadow database.
- Expose the backup file on a network share accessible to the machine running the SQL Server instance which will host the shadow database.
In this example, we will assume the backup file will be restored onto the same SQL Server instance and so won't need to be moved. We can then define our shadow environment in Flyway's TOML file, and specify that it should use the backup provisioner:
[environments.shadow] url = "jdbc:sqlserver://localhost:1433;databaseName=AdventureWorks2022_Shadow;trustServerCertificate=true" user = "sa" password = "..." provisioner = "backup" [environments.shadow.resolvers.backup] backupFilePath = "AdventureWorks2022_Copy.bak" backupVersion = "001" sqlserver.generateWithMove = true
The [environments.shadow]
section defines an environment, which represents a database and the details needed to connect to that database. See Environments settings for more details on how to setup an environment. The provisioner = "backup"
parameter specifies that the shadow environment should use the backup provisioner.
The [environments.shadow.resolvers.backup]
section configures the backup provisioner for the shadow environment. Below we take a look at each of the fields defined above:
backupFilePath
- The file path to the backup file relative to the SQL Server instance where the backup will be restored. In our example, theAdventureWorks2022_Copy.bak
backup file is stored in SQL Server's default backup location, and as we're restoring the backup to the same SQL Server instance we can simply specify the backup filename. We could alternatively have specified the full file path:C:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\Backup\AdventureWorks2022_Copy.bak
.backupVersion
- The version of the database that the backup represents. When creating a baseline for a new Flyway project it's standard to use version 001, as the backup represents the initial state of the database before any subsequent migrations have been applied.sqlserver.generateWithMove
- Backup files have data and log file paths embedded within them and the restore process will attempt to re-use those file paths. However, this can lead to issues when the file path formats contained within the backup don't match the SQL Server instance where the backup will be restored. For example, a backup file could contain Unix file paths and attempting to restore this on a SQL Server instance running on Windows will fail by default, or the embedded backup file paths may reference a non-existent drive on a different machine running SQL Server. Therefore, whensqlserver.generateWithMove=true
is set, the backup provisioner will generate unique file paths for the database being restored, by using the default locations for data and log files on the SQL Server instance restoring the backup. This ensures the restore won't fail due to invalid embedded file paths in the backup file.
Note as well that databaseName=AdventureWorks2022_Shadow
in the URL for the shadow environment. AdventureWorks2022_Shadow
is the name of the shadow database, and the database we wish to restore the backup to. This database doesn't have to exist at this point, as the backup provisioner will create it if necessary (assuming the user configured has permission to do so).
With the configuration above we can now trigger the creation of the AdventureWorks2022_Shadow
database and the backup to be restored to that database. Any interaction with the database through flyway should trigger the backup to be restored (if necessary). For example, the following info command triggers the backup to be restored and we see that our shadow environment is now baselined to version 001:
$ flyway info -environment=shadow ... Creating Schema History table [AdventureWorks2022_Shadow].[dbo].[flyway_schema_history] with baseline ... Successfully baselined schema with version: 001 ... Schema version: 001 +----------+---------+----------------------+----------+---------------------+----------+----------+ | Category | Version | Description | Type | Installed On | State | Undoable | +----------+---------+----------------------+----------+---------------------+----------+----------+ | | 001 | Baseline from backup | BASELINE | 2024-12-19 16:00:55 | Baseline | No | +----------+---------+----------------------+----------+---------------------+----------+----------+
Running the info command again would not trigger the backup to be restored a second time, as Flyway would identify that it wasn't necessary. However, if the shadow database was Cleaned, then the backup would be restored when Flyway next interacted with the shadow database.
Using Flyway Desktop
It's also possible to configure the backup provisioner through Flyway Desktop. This feature is currently in preview and requires the following environment variable to be set in order to expose the backup provisioner UI: FLYWAY_DESKTOP_CONNECTION_DIALOG_PROVISIONERS=true
.
The screenshot below shows how the TOML configuration of the previous section is displayed in the UI: