Cloning a VM containing a SQL Clone agent causes strange behaviour
Published 14 September 2018
SQL Clone distinguishes between its agents using a client certificate installed into the machine by the agent service. The agent service generates this certificate when it is installed.
As a result, duplicating a virtual machine that already has a SQL Clone agent installed (and therefore a certificate too) can cause issues:
- Commands intended for one machine may be dispatched to another, where they will likely fail.
- The duplicated agents will repeatedly reestablish their connection with the server, each attempting to reclaim 'its spot', which ties up network resources.
- There may be strange UI issues, such as metadata associated with an agent (like its service username) flickering between different values.
We expect to improve behaviour around this problem, but currently there is little in-application handling for this situation. As such, we recommend installing the agent only after cloning the VM.
If you're experiencing this issue, this page will walk you through resolving it. It's a matter of identifying the affected machines, removing Clone agents from them, and doing some manual cleanup. If there are still issues afterwards, please contact support.
Identify machines sharing a SQL Clone Agent certificate
On each agent machine:
- Run mmc certlm.msc to open the local computer certificate manager.
- Under Personal/Certificates, find the SQL Clone Agent certificate.
- Double-click it, navigate to Details, and record its Thumbprint.
- Make a note of any agent machines that share a thumbprint.
If you don't find any machines sharing a thumbprint, then you are not experiencing the technical issue resolved by this page, and should contact support instead of proceeding.
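Comparing thumbprints by eye across many machines is error-prone, so the comparison can be sketched in a few lines of Python. This is only an illustration, not part of SQL Clone: the function names and the machine names in the usage note are hypothetical, and it assumes you have already recorded each thumbprint by hand as described above. Because values copied from the certificate dialog may contain spaces or invisible characters, the sketch keeps only the hexadecimal digits before comparing.

```python
from collections import defaultdict


def normalize_thumbprint(raw: str) -> str:
    """Keep only hexadecimal digits and upper-case them.

    This drops spaces, colons and any invisible characters that may be
    picked up when copying from the certificate dialog.
    """
    return "".join(ch for ch in raw if ch in "0123456789abcdefABCDEF").upper()


def find_shared_thumbprints(recorded: dict[str, str]) -> dict[str, list[str]]:
    """Return {thumbprint: [machines]} for thumbprints seen on 2+ machines.

    `recorded` maps a machine name to the thumbprint noted on that machine.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for machine, raw in recorded.items():
        groups[normalize_thumbprint(raw)].append(machine)
    # Only thumbprints shared by more than one machine indicate the problem.
    return {
        thumbprint: sorted(machines)
        for thumbprint, machines in groups.items()
        if len(machines) > 1
    }
```

For example, calling find_shared_thumbprints with a dictionary such as {"VM-A": "...", "VM-B": "...", "VM-C": "..."} returns only the thumbprints that appear on more than one machine, together with the machines that share them. An empty result means no duplicates were found among the values you recorded.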
Fixing the issue
On each machine sharing a thumbprint, shut down the agent service via services.msc.
Decide which machine is the original. This one machine will be able to keep its clones. If you don't know which is the original, assume none of them are, and proceed.
You can either destroy any non-original machines if they're not a pain to reprovision, or do the following to remove their agents and clean them up (this process doesn't need to be done for the original machine):
- Uninstall the agent service.
- Drop any clone databases via SSMS or equivalent.
- Use Disk Management to detach the clone VHDs (the partition will be named 'Database VHD', and the filename in the detachment dialog will end in 'disk.vhd'). Make note of each detached disk.vhd file.
- Delete each noted disk.vhd file from your file system.
- Delete the SQL Clone Agent certificate found earlier.
Delete errant source locations from the SQL Clone config DB:
- Find all source locations for the agent (including legitimate ones) with the following query, replacing the example thumbprint with the duplicated thumbprint you recorded earlier:
- SELECT * FROM dbo.SourceLocations WHERE DeletedOn IS NULL AND EXISTS (SELECT * FROM dbo.Agents WHERE Thumbprint LIKE N'DBA7C824A07D990CCB703E47A5C2AF4DA19B7D3A' AND DeletedOn IS NULL);
- Delete any rows that don't match an instance on the original machine - i.e. they were entirely unique to one of the non-original machines.
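Substituting the thumbprint into the query by hand is a common place for stray spaces or hidden characters to creep in. The following Python sketch shows one way to fill in the query text safely. It is a hypothetical helper, not something shipped with SQL Clone: the function name is an assumption, the query text is copied from the step above, and the 40-character check assumes the thumbprint is a SHA-1 hash like the example shown. The helper only builds the text of the SELECT; running it and deciding which rows to delete remain manual steps.

```python
import string

# Query text taken from the step above, with a placeholder for the thumbprint.
QUERY_TEMPLATE = (
    "SELECT * FROM dbo.SourceLocations WHERE DeletedOn IS NULL "
    "AND EXISTS (SELECT * FROM dbo.Agents "
    "WHERE Thumbprint LIKE N'{thumbprint}' AND DeletedOn IS NULL);"
)


def build_source_locations_query(raw_thumbprint: str) -> str:
    """Return the lookup query with a cleaned thumbprint filled in.

    Non-hexadecimal characters are stripped first, so the value placed
    inside the quotes can only ever contain 0-9 and A-F.
    """
    cleaned = "".join(
        ch for ch in raw_thumbprint if ch in string.hexdigits
    ).upper()
    # Assumption: thumbprints are 40 hex digits, like the example above.
    if len(cleaned) != 40:
        raise ValueError(
            f"expected 40 hexadecimal digits, got {len(cleaned)}"
        )
    return QUERY_TEMPLATE.format(thumbprint=cleaned)
```

Paste the returned string into SSMS or an equivalent tool connected to the SQL Clone config DB, and review the results before deleting anything.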
Start up the original agent service. It will handle deleting references to deleted clones.
Reinstall agents on other machines using a fresh installer for each.