This document explains how RDM (Rok Disk Manager) manages and reserves storage on the nodes, and how Rok uses the underlying disks for its internal processing.
Disk space reservation by Rok Disk Manager (RDM)
- RDM is deployed with a pre-configured script which describes how it uses the underlying disks.
- Following this script, it gathers all available disks (both persistent but slow disks, e.g., EBS, and ephemeral but fast disks, e.g., local NVMe) attached at /dev/sd[f-p], excluding the root disk used by the instance.
- RDM then assembles a RAID0 device from all the disks discovered in the previous step.
- RDM creates a VG (named rokvg) on top of this RAID0 device.
- RDM creates an LV that Rok uses as ephemeral (scratch) space when taking snapshots for this node. Rok uses this LV to temporarily hold copies of the changed blocks while it hashes and de-duplicates them before uploading them to S3. By default, the ephemeral space that Rok reserves for snapshots is min(200 * GiB, 0.3 * rokvg.size) per node.
- Rok uses the rest of the free space to create the volumes for the users.
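
The default sizing rule above can be worked through with a short sketch. This is only an illustration of the arithmetic, not RDM's actual code: the function names are hypothetical, and it assumes that the space left for user volumes is simply the VG size minus the scratch LV.

```python
GiB = 1024 ** 3


def snapshot_scratch_bytes(vg_size_bytes: int) -> int:
    """Default scratch LV size per node: min(200 GiB, 0.3 * rokvg.size).

    Hypothetical helper; integer arithmetic is used so the result is exact.
    """
    return min(200 * GiB, vg_size_bytes * 3 // 10)


def user_volume_bytes(vg_size_bytes: int) -> int:
    """Space left in the VG for user volumes (assumption: VG size minus scratch LV)."""
    return vg_size_bytes - snapshot_scratch_bytes(vg_size_bytes)


if __name__ == "__main__":
    # Print the split for a small and a large VG.
    for size_gib in (500, 1000):
        vg = size_gib * GiB
        scratch = snapshot_scratch_bytes(vg) // GiB
        rest = user_volume_bytes(vg) // GiB
        print(f"rokvg={size_gib} GiB scratch={scratch} GiB user volumes={rest} GiB")
```

For a 500 GiB VG the 30% term is the smaller one, so 150 GiB is reserved. For a 1000 GiB VG the 200 GiB cap applies instead.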
Disk space reservation by Rok
- Every time Rok takes a snapshot, it takes some time to process and upload the newly changed data to S3.
- To avoid freezing the volume and rejecting new writes during the snapshot process, Rok creates and uses an ephemeral LV of size 10GiB for each volume being snapshotted. This LV receives the live data that the user continues to write to their volume while Rok processes the snapshot of that volume.
- NOTE: Currently, if a volume takes up all of the available space, the snapshot process cannot reserve space and fails. We have an issue open to fix this by reserving extra space for at least 1 snapshot, so that a snapshot is guaranteed to always succeed.
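
The failure mode in the NOTE can be illustrated with a minimal check. This is a sketch under stated assumptions, not Rok's implementation: it assumes the 10GiB ephemeral LV is carved out of the free space left in rokvg, and the names `PER_VOLUME_SNAPSHOT_BYTES` and `can_snapshot` are hypothetical.

```python
GiB = 1024 ** 3

# Size of the ephemeral LV created for each volume being snapshotted.
PER_VOLUME_SNAPSHOT_BYTES = 10 * GiB


def can_snapshot(free_vg_bytes: int, volumes_being_snapshotted: int = 1) -> bool:
    """Return True if the free space covers one 10 GiB LV per snapshotted volume.

    Assumption: the ephemeral LVs are allocated from the VG's free space.
    """
    needed = PER_VOLUME_SNAPSHOT_BYTES * volumes_being_snapshotted
    return free_vg_bytes >= needed


if __name__ == "__main__":
    # A user volume has consumed all free space, so the snapshot cannot reserve its LV.
    print(can_snapshot(free_vg_bytes=0))
    # Enough room for one snapshot, but not for two concurrent ones.
    print(can_snapshot(free_vg_bytes=15 * GiB, volumes_being_snapshotted=1))
    print(can_snapshot(free_vg_bytes=15 * GiB, volumes_being_snapshotted=2))
```

Under this model, keeping at least 10GiB unallocated per node would be enough for one snapshot to succeed, which matches the fix described in the NOTE.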