How SQL Clone Works
Published 22 November 2019
SQL Clone makes it possible for many SQL Server instances to share read access to copy of a database's files, held in an image (a complete, point-in-time copy of those files). In effect, it "virtualizes" the data, from this point onward. It does this using only of standard disk virtualization technologies built into the Windows Operating System. It creates a Virtual Hard Disk (VHD) containing the image, and another one, called a differencing VHD, or just 'diff file', for each clone, which has a connection, via a VHD mount point, to the remote image.
SQL Clone makes no changes to the way this virtualization technology works; it merely 'orchestrates' it. Once it has assembled the various 'players' , SQL Clone steps back and lets them play, although it will try to step in and try to help fix things up, if they go awry.
In effect, SQL Clone exploits a similar idea to that used by long-established technologies like RAID, SAN and NAS, in that the simple "logical volume" presented to SQL Server hides from it an underlying disk architecture that is different, and often more complex. SQL Server 'sees' only files.
SQL Clone can use the VHDX format (.vhdx), the Hyper-V virtual hard disk format used in Windows 8 and in Windows Server 2012, and upwards, which support virtual hard disk sizes up to 64TB. If one of these newer Operating Systems is installed on the host machine, SQL Clone will detect it automatically, and it will enable users to create clones of databases up to 64TB. If not, it will default to the older VHD virtual hard disk format (.vhd), which has a 2TB limit.
Using SQL Clone is a two-step process. The first step is creating the image. This is a relatively slow process, roughly equivalent to restoring a database, but this step will be performed relatively infrequently and can be automated, for example so that it runs overnight. The second step is creating one or more clones, from an image. Creating each clone takes only seconds and can also be automated.
All this work is carried out by SQL Clone Agents. SQL Clone Server coordinates the Agents and does bookkeeping in its config database.
How Images Work
An image is a point-in-time 'snapshot' of your source SQL Server database, which can be created from database backup files (a single full database backup file, or to a set of files, for differential or striped backups), or from a live SQL Server database.
A SQL Clone Agent creates the image VHD, and then copies onto it the database's data files and transaction log. An image is immutable and should not be altered in any way, after it's created.
When creating the image, a SQL Clone Agent, performs the following tasks:
- Creates the image VHD, at the image location
- Creates a mount point to this VHD on the local operating system of the machine hosting the source database
- Captures a point-in-time copy of the data and log files into the image VHD from either:
- a live database connection – an Agent installed on the instance hosting the source database use the Windows Volume Shadow Copy Service to copy the files.
- database backup files (SQL Server backup or Redgate SQL Backup) – we must supply a 'temporary' SQL Server instance for the restore operation. SQL Clone restores directly into the mounted image VHD, so this temporary instance doesn't need any storage for the database.
- Un-mount the image VHD, isolating the image from the source database
SQL Clone will, optionally, run image modification scripts as part of the image creation process, such as to apply data masking sets to protect personal or sensitive data.
An image will always take the version of the SQL Server instance that is used to create it, and can only be used to create clones at that SQL Server version or higher. This is true even when creating an image from a backup of an earlier version of SQL Server. See the FAQs about Image and Clone SQL Server version compatibility for more detail.
How Clones Work
A clone is an lightweight copy of a database, created from an image. We can create many clones from a single image. We can also apply templates and run scripts to 'customize' each clone, as part of the the clone creation process.
For each clone, a SQL Clone Agent, installed on the server that will host the clone, creates a differencing VHD on the local server with a connection back to the image VHD from which it was created. This differencing VHD stores only the changes made to each clone.
When a SQL Server instance hosts a clone, the SQL Server engine sees a normal database, with local database files that it can read, and to which it can write. The underlying difference is that the clone's data has been virtualized. In other words, without SQL Server being aware of it, a clone's database files are not local; the database and its data (i.e. its data and log files) are held in the image VHD, in a shared folder on the network, and only the changes made to each clone are held locally, in a diff file, on the SQL Server instance hosting the clone. The size of this Diff file will be small, initially, but will grow in size with subsequent use, according to the number modifications made to the clone.
When you query a clone and SQL Server finds that the pages aren't in cache, it requests them from disk, just as normal. The VHD driver first checks to see if the requested pages are in the clone's (diff) VHD, and requests from the image VHD any that aren't there. In effect, the database is, invisibly to SQL Server, 'split' between the image and the clone files, and so there must be a constant connection between these files.
If the clone-image connection is broken the clone will stop working and SQL Server will report the operating system error 21 (the device is not ready), which of course it treats as a severe system-level error condition that threatens database integrity:
If the network connection has insufficient speed or bandwidth, a clone will simply be less responsive. For example, if you are working on a local clone that must connect to its image over a VPN, then queries will be slow:
You'd experience similar problems with any database forced to connect to its storage over a slow or unreliable network. If you can avoid these situations then, in all respects, a clone is exactly like a real database and will behave and perform like one.
How SQL Clone Agents Work
SQL Clone Agents are the workhorses of SQL Clone. Each SQL Server instance that will host one or more clones, or from which you wish to create images, needs to have a SQL Clone Agent service installed. On each of these machines, the Agent service will perform all the required SQL Clone tasks relating to images and clones, such as creating an image from a database or backup, creating all the required VHDs, writing the image file to the network share, creating and deleting clones, and so on.
As such, the service account designated to run a SQL Clone Agent requires "superuser" permissions. Specifically, it will need to be:
- A Windows local administrator on the machine – to perform OS-level tasks such as making all the required Virtual Disk Service calls, attaching virtual disks, accessing SQL Server database files, and so on. See Why does SQL Clone Agent need local Administrators privilege? for more details.
- A member of the sysadmin fixed server role on the local SQL Server instance – the agents need to perform administrative tasks on the instance, such as creating, restoring, attaching, modifying, detaching and deleting databases. This requires db_owner permissions on each database, and only the sysadmin role is automatically mapped to dbo user in every database. Other users, even with CONTROL SERVER, do not automatically get this access to databases they didn't create.