Redgate Clone

Data storage issues

This page lists known problems, possible workarounds, and debugging techniques affecting our data storage.

MODIFY FILE encountered operating system error 31

The most common reason data image creation fails with the above error message is a lack of free space on the attached storage disk.

To understand the state and usage of the attached disk, we recommend checking the Disk Usage (storage) dashboard.
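As a complementary check, you can also SSH into the host machine and inspect the disks with standard Linux tools. The commands below are a sketch only; device names and sizes will differ on your machine.

# List the block devices attached to the host, excluding loop devices
# (major number 7), together with their sizes and mount points:
lsblk -e7 -o NAME,SIZE,TYPE,MOUNTPOINT

# Show free space on the mounted filesystems:
df -h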

Unable to recover after host machine was rebooted/restarted

In some scenarios, a planned or unattended reboot of the primary host machine can lead to issues.

Unable to create new data containers, or use existing ones, due to deployment/rook-ceph-osd-0 being unavailable

Problem

The most common issue with reboots relates to the Linux kernel itself and the way it can reassign attached disks to different device names.

In short, the Linux kernel does not guarantee that, after a reboot, disks will be attached under exactly the same device name (e.g. sdb) and number as before. For a more in-depth explanation, see Troubleshoot Linux VM device name changes.
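For illustration, you can see this on the host itself: the kernel-assigned names are only one of several views of the same disks. Stable identifiers do exist on most Linux systems, but configuring storage by UUID is not yet supported in Redgate Clone.

# Kernel-assigned device names (sdb, sdc, ...) are allocated at boot and
# can change after a reboot. Stable identifiers also exist as symlinks,
# e.g. under /dev/disk/by-id and /dev/disk/by-uuid, which point at the
# current sdX names:
ls -l /dev/disk/by-id/
ls -l /dev/disk/by-uuid/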

Example

Let's say you have an Ubuntu virtual machine with 6 disks (1 OS disk and 5 x 256GB attached disks).

You can SSH into the host machine and find the names of the disks by executing the lsblk -e7 command (the -e7 flag excludes loop devices), which outputs something similar to the following (sda is the system disk):
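(Illustrative output only; sizes, partition layout, and mount points will differ on your machine.)

NAME     MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda        8:0    0  128G  0 disk
└─sda1     8:1    0  128G  0 part /
sdb        8:16   0  256G  0 disk
sdc        8:32   0  256G  0 disk
sdd        8:48   0  256G  0 disk
sde        8:64   0  256G  0 disk
sdf        8:80   0  256G  0 disk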

Then, in the Storage Settings (Embedded), under Name pattern, you would type in sd[b-z], which picks up all disks with device names from sdb to sdz inclusive (see the example below).

That means the OS disk (sda) would be ignored and not used by the Kubernetes cluster.

[Screenshot: Storage Settings (Embedded) with the Name pattern set to sd[b-z]]

However, after you reboot the system, the disks are not guaranteed to keep exactly the same device names; only disk UUIDs (not yet supported in Redgate Clone) are guaranteed to remain the same after a restart.

Before Reboot                  After Reboot

sda → 128GB OS Disk            sda → 256GB attached disk
sdb → 256GB attached disk      sdb → 256GB attached disk
sdc → 256GB attached disk      sdc → 256GB attached disk
sdd → 256GB attached disk      sdd → 256GB attached disk
sde → 256GB attached disk      sde → 256GB attached disk
sdf → 256GB attached disk      sdf → 128GB OS Disk

As a result, the Storage Settings (Embedded) configuration you used before (sd[b-z]) is no longer valid: it would pick up the sdf disk, which is now the OS disk, and would not pick up sda, which is now one of the extra attached disks (and may hold existing data image and data container data).

The end result is that our storage solution gets confused, leaving the product in an invalid state that then requires a workaround such as manually tweaking the Storage Settings (Embedded) to pick up the new mappings (see the workarounds below for more details).

Workarounds

Linux Virtual Machines in Azure

If you are using Linux virtual machines in Azure, we highly recommend using LUNs instead of disk names. Please follow the official Azure guide on how to identify the LUN-based symbolic links and then apply them in the Storage Settings (Embedded).
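As an illustration (assuming the standard Azure Linux agent and udev rules are present on the VM, which is the case for stock Azure images), the LUN-based symbolic links can be listed on the host like this:

# List the stable, LUN-based symlinks that Azure's udev rules create for
# attached data disks; these survive reboots even when the sdX names move.
# Illustrative output only; the sdX targets will differ on your VM:
ls -l /dev/disk/azure/scsi1/
# lun0 -> ../../../sdb
# lun1 -> ../../../sdc
# ...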

Otherwise

For any other case, either reboot your cluster until the disks come back in the right (initial) state, or update the Storage Settings (Embedded) to match the new mappings (and don't forget to deploy the new configuration afterwards in the dashboard's UI).
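To see which device names the disks ended up with after the reboot (and therefore what the Name pattern now needs to match), you can re-run the same check on the host; the sizes mentioned below are from the example above and will differ on your machine.

# Re-check the device names after the reboot; whichever disk carries the
# OS (128GB in the example above) must be excluded from the Name pattern:
lsblk -e7 -o NAME,SIZE,TYPE,MOUNTPOINT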

