FAQ
Published 31 March 2022
This page answers common questions about deploying Redgate Clone, grouped by category.
Data images and data containers
A lot of the information presented here is heavily dependent on the concepts of data image and data container, so please read What is a data image? and What is a data container? beforehand.
Performance expectations
How long does it take to create a data image?
Because a data image is a full copy of a database instance, its creation takes time and is directly proportional to the size of the database instance involved (e.g. hours for backup sizes on the order of TBs).
In other words, the bigger the source backup, the longer it will take to create.
Duration is also heavily affected by the database engine (e.g. SQL Server is typically slower than PostgreSQL for the same backup size) and, on the infrastructure side, mainly by the I/O speed of your disks (i.e. the faster your drives, the quicker it will be).
Taking SQL Server as an example (and with very rough estimates), you should expect a few minutes for database instances of a few GBs, increasing to a couple of hours for hundreds of GBs, a dozen or so hours when entering TB territory, and up to a day when dealing with hundreds of TBs. Obviously, as mentioned above, this is very much dependent on your infrastructure and the specs of your host Kubernetes cluster.
Backup compression is also likely to adversely affect performance of data image creation.
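For a quick sense of scale, the sketch below (purely illustrative) maps uncompressed backup size onto the rough SQL Server brackets above; the thresholds are assumptions drawn from this FAQ rather than measured values.

```python
# A purely illustrative mapping from backup size to the rough SQL Server
# timings quoted above; the brackets are assumptions from this FAQ, not
# measurements, and real durations depend on engine, disk I/O and cluster specs.

def rough_image_creation_time(backup_size_gb: float) -> str:
    """Ballpark data image creation time for an uncompressed backup."""
    if backup_size_gb < 10:
        return "a few minutes"
    if backup_size_gb < 1_000:        # tens to hundreds of GBs
        return "up to a couple of hours"
    if backup_size_gb < 100_000:      # low TBs
        return "a dozen or so hours"
    return "up to a day"              # hundreds of TBs

print(rough_image_creation_time(400))  # a 400GB backup -> "up to a couple of hours"
```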
How long does it take to create a data container?
As shown in the description of a data container, creating a data container is a fixed-duration operation that takes no more than a few seconds and is (almost) totally independent of the size of the original database (i.e. creating a data container from a 5TB data image takes roughly as long as creating one from a 128MB image).
Memory, storage and CPU considerations
Is Redgate Clone memory, CPU or I/O bound?
When setting up the infrastructure to hold a cluster hosting Redgate Clone as per our deployment instructions (see Installation steps), memory and data storage will be the main limiting factors for the overall stability and performance of the application.
Database engines are rarely CPU bound, as they are almost always disk I/O bound. Although the CPU can spike to 100% for short periods, sustained high CPU is almost always a symptom of another issue, such as heavy page swapping or I/O problems.
One other thing to note is that RAM is pre-allocated to our data containers on creation and most of the RAM usage in database engines is buffer cache.
You can get more details by visiting:
- Memory information - Please have a look at the rest of this FAQ and our database engine system requirements for an overview of the RAM requirements for data containers and data images.
- Storage and I/O information - Please have a look at the rest of this FAQ, the data storage page and the adding disks page for horizontal storage scaling (scale out).
How much RAM does a data image consume and for how long?
Data image creation involves a temporary database instance that can have slightly higher resource usage (mostly RAM) than the one used for data containers, because of the resource-intensive backup restore process. This instance is cleaned up at the end of the data image creation operation, so, unlike for data containers, memory usage is not persistent.
Memory usage is also dependent on the database engine being used.
In terms of values, these are slightly higher than for data containers (see How much RAM does a data container consume and for how long? below), but remember that this is burst usage during data image creation only, and the memory is freed immediately afterwards.
| RAM usage | MSSQL Request | MSSQL Limit | PostgreSQL Request | PostgreSQL Limit | MySQL Request | MySQL Limit | Oracle Request | Oracle Limit |
|---|---|---|---|---|---|---|---|---|
| In-Progress Data Image (freed after creation) | 512Mi | 3Gi | 256Mi | 1Gi | 256Mi | 1Gi | 512Mi | 3Gi |
How much RAM does a data container consume and for how long?
As explained in the system requirements for our supported database engines, data containers have different memory requirements based on the database engine being used (see table below).
Please note that these are fixed memory allocations regardless of whether the data container is being used or not. Memory usage is also constant throughout the whole lifetime of a data container, as the memory request and limit in the table below are fixed at creation and never rescaled during execution (dynamic rescaling could impact performance).
In other words, as soon as a data container is created, it will consume its entire memory allocation and this will persist until the data container is deleted. This differs from data images (see How much RAM does a data image consume and for how long?).
| RAM usage | MSSQL Request | MSSQL Limit | PostgreSQL Request | PostgreSQL Limit | MySQL Request | MySQL Limit | Oracle Request | Oracle Limit |
|---|---|---|---|---|---|---|---|---|
| Data Container (entire lifetime) | 512Mi | 3Gi | 256Mi | 1Gi | 256Mi | 1Gi | 512Mi | 3Gi |
How much space does a data image take up on disk, given the size of the database backup?
A data image is a complete, point-in-time copy of the database files in a given database backup. Because of this, for uncompressed backups, the disk usage relationship is roughly 1:1. This means that the space taken by a new data image is approximately the same as the size of the original database backup used as the data image source, plus a small overhead of a few megabytes.
This outcome is independent of the database engine being used.
In other words, if you have a 400GB database backup, you should be looking at less than 401GB of hard drive usage for a data image. Data image space usage is spread across all existing disks, but it's only evenly distributed if you attach ALL disks to the primary node, as shown on the adding disks page.
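A minimal sketch of that rule of thumb (the overhead constant below is an assumed placeholder for the "few megabytes" mentioned above, not a documented figure):

```python
# Illustrative only: for an uncompressed backup, data image disk usage is
# roughly the backup size plus a small fixed overhead.

ASSUMED_OVERHEAD_GB = 0.01  # ~10MB, hypothetical placeholder value

def estimate_image_disk_usage_gb(uncompressed_backup_gb: float) -> float:
    """Approximate disk usage of a data image created from an uncompressed backup."""
    return uncompressed_backup_gb + ASSUMED_OVERHEAD_GB

print(f"{estimate_image_disk_usage_gb(400):.2f} GB")  # ~400.01 GB for a 400GB backup
```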
We haven't measured the impact of compressed database backups yet, but you should expect the size to grow substantially, as the backup is expanded during the restore process and the data image stores the uncompressed contents. The final storage outcome is therefore heavily dependent on the compression ratio of the original backup.
Unlike data containers, and regardless of whether it comes from a compressed or uncompressed backup, a data image's disk usage is a fixed storage allocation: it will not grow past its original size on disk, regardless of its age or the number of data containers you create from it.
All used space will be freed once you delete the data image, but allow a few minutes for this to happen as storage cleanup is not guaranteed to be instantaneous.
It's recommended you only keep data images around for as long as you need them as they tend to consume quite a lot of space otherwise.
How much space does a data container take on disk?
The initial size of a data container is mostly independent of the size of the parent data image, although there is a slight non-linear impact at play.
The database engine used can also marginally affect the space usage of a data container.
For example, creating a SQL Server data container typically takes around 230-250MB on disk, though this can fluctuate anywhere between 130MB and 300MB. Expect a bit less usage for PostgreSQL and MySQL data containers.
As a rule of thumb, for a given database engine, you should be looking at a few hundred megabytes at most for data container storage consumption when initially created. Data container space usage is spread across all existing disks, but it's only evenly distributed if you attach ALL disks to the primary node, as shown on the adding disks page.
What about after creation? That really depends on usage, as mentioned in the description of a data container. The more you modify a data container's database instance contents compared with the parent data image, the more space it will end up using.
It's recommended that data containers are short lived - ideally with a lifetime of less than 1 week - if only because they will keep on growing. Furthermore, it's very easy and quick to create new ones when needed (see How long does it take to create a data container?).
All used space will be freed once you delete the data container, but allow a few minutes for this to happen as storage cleanup is not guaranteed to be instantaneous.
Scaling (horizontal and vertical)
How many data containers can I have running at the same time?
As shown in How much space does a data container take on disk? and How much RAM does a data container consume and for how long?, memory and data storage are the main limiting factors when it comes to the max number of concurrent data containers (CPU less so).
When it comes to memory, you also have to take into account the type of database engine being used, as the allocations differ.
As for storage, while the hard drive space used on creation by the data containers themselves is small (a few hundred MBs), remember that the parent data image could use substantial disk space on its own and that data containers will grow depending on how they are used (see the links above for more details on these topics).
Because of this, it's hard to make a completely accurate prediction as there are many variables at play.
If you take our minimum specifications (e.g. Step 1 - Installation requirements) for a single node cluster with 32GB RAM and 1TB disk and assuming that you are using relatively small data images of less than 100GB, then you should be looking at being able to create 13-15 concurrent data containers, but more realistically around 10 if you also plan for continuous usage of those database instances by your team or CI/CD pipelines.
In this scenario, disk usage is less relevant than memory (due to the relatively small size of the backups), so bumping the RAM to our recommended 64GB roughly doubles those figures.
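As a back-of-the-envelope sketch of this kind of estimate: it assumes the per-container request/limit values from the RAM table above and a hypothetical amount of RAM reserved for the OS, Kubernetes and the storage layer, so treat the output as a rough bracket rather than sizing guidance.

```python
# Rough capacity bracket, using the data container request/limit values from
# the RAM table above. The reserved_gi overhead is an illustrative assumption,
# not an official figure.

RAM_GI = {  # data container (request, limit) in GiB
    "mssql": (0.5, 3.0),
    "postgresql": (0.25, 1.0),
    "mysql": (0.25, 1.0),
    "oracle": (0.5, 3.0),
}

def container_capacity_range(total_ram_gi: float, engine: str,
                             reserved_gi: float = 8.0) -> tuple[int, int]:
    """Return (worst case, best case) concurrent data containers that fit in RAM."""
    usable = max(total_ram_gi - reserved_gi, 0.0)
    request, limit = RAM_GI[engine]
    worst_case = int(usable // limit)    # every container runs at its full limit
    best_case = int(usable // request)   # containers only ever use their request
    return worst_case, best_case

print(container_capacity_range(32, "mssql"))  # a 32GB single-node cluster
print(container_capacity_range(64, "mssql"))  # roughly double with 64GB
```

The 13-15 concurrent data containers quoted above for a 32GB node sits comfortably inside the bracket this produces.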
See How can I increase the max number of concurrent running data containers? for details on how you can increase these numbers depending on your use case.
How can I increase the max number of concurrent running data containers?
As shown in How much space does a data container take on disk? and How much RAM does a data container consume and for how long?, memory and data storage are the main limiting factors when it comes to the max number of concurrent data containers (CPU less so).
You have a few ways to update each of these to increase the capacity of concurrent data containers:
- Memory - You can increase the RAM (vertical scaling)
- Storage - You can increase the size of your existing disks (vertical scaling) or add more disks (horizontal scaling).
Which storage scaling is best – vertical or horizontal?
To achieve the best performance, each disk is set to use 4GB of RAM, regardless of its capacity.
For example, attaching 2x1TB + 2x500GB disks will consume 16GB of RAM (4x4GB) just for storage. Exactly the same amount of RAM would be consumed by attaching 4x4TB disks.
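As a quick sketch of that calculation (only the number of attached disks matters, not their capacity):

```python
# Since the 4GB-per-disk RAM cost is fixed regardless of capacity, only the
# disk count enters this overhead calculation. Purely illustrative.

RAM_PER_DISK_GB = 4

def storage_ram_overhead_gb(disk_count: int) -> int:
    """RAM consumed by the storage layer for a given number of attached disks."""
    return disk_count * RAM_PER_DISK_GB

print(storage_ram_overhead_gb(4))  # 2x1TB + 2x500GB -> 16GB
print(storage_ram_overhead_gb(4))  # 4x4TB           -> also 16GB
```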
Thus, we suggest considering which scaling type fits your situation the best. Here are the pros and cons of each:
| Scaling Type | Advantages | Disadvantages |
|---|---|---|
| Vertical (increase the size of existing disks) | No additional RAM is consumed, as the 4GB allocation is per disk rather than per GB of capacity | |
| Horizontal (add more disks) | | Each additional disk consumes a further 4GB of RAM |