Rok relies on several Kubernetes and Linux kernel components for stable operation. This KB article describes each component's role and responsibilities within a Rok cluster, along with how each component is deployed. Manifests are often deployed via GitOps and are stored across the Arrikto public GitHub repository, an internal corporate repository, and the local repository in the Arrikto management environment(s). We also use rok-tools to deploy manifests. For a more detailed description of the Enterprise Kubeflow (EKF) installation requirements, see the Rok requirements article.
If you are looking for a more in-depth component guide, please visit our component overview page.
Lastly, if you are looking for a deployment guide, visit the install guide.
rok-disk-manager
Purpose: To prepare and manage disks for Rok.
Deployment Method: rok-deploy (DaemonSet)
Requirements: rok-tools command-line tools associated with Rok-ready environments
Description: Created by rok-deploy from within a rok-tools management environment, the rok-disk-manager (RDM) DaemonSet discovers and formats the local NVMe disks on each node and carves out volume groups over the physical disks so that they are Rok ready. Users cannot modify the disks themselves, but if an administrator adds a new disk to a node, the RDM manages it so that Rok can use the additional capacity.
rok-kmod (think kernel module)
Purpose: To ensure that the kernel modules required by Rok are present, loaded, and enabled.
Deployment Method: rok-deploy (DaemonSet)
Requirements: rok-deploy, from within a rok-tools management environment
Description: Created via the install command rok-tools apply, the rok-kmod DaemonSet runs in watch mode to continuously confirm that the essential kernel modules required by Rok are loaded, enabled, and healthy. The rok-kmod DaemonSet patches kernels shipped by our cloud vendor partners if they do not have the appropriate kernel modules enabled. The rok-kmod image is precompiled to support 10-20 kernels with the EXACT same settings expected by the cloud services, with the essential Arrikto modules enabled. We do not touch the host file system. We inspect the kernel version to identify what is missing and make only in-memory modifications.
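The "confirm that the required modules are loaded" step can be pictured with a small sketch. This is not rok-kmod's actual implementation; it only assumes the standard /proc/modules layout (module name in the first column), and the module names below are hypothetical placeholders, since this article does not list the modules Rok needs.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            # The first column of /proc/modules is the module name.
            names.add(fields[0])
    return names


def missing_modules(required, proc_modules_text):
    """Return the required modules that are not currently loaded, sorted."""
    return sorted(set(required) - loaded_modules(proc_modules_text))


# Hypothetical sample data; real input would come from reading /proc/modules.
SAMPLE = """\
mod_a 81920 1 - Live 0x0000000000000000
mod_b 53248 0 - Live 0x0000000000000000
"""
REQUIRED = ["mod_a", "mod_b", "mod_c"]  # placeholder names, not Rok's real list

if __name__ == "__main__":
    print(missing_modules(REQUIRED, SAMPLE))
```

Note that modules compiled directly into the kernel do not appear in /proc/modules, so a real check would need to account for built-in modules as well.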
rok-operator
Purpose: Manages the (primary) RokCluster custom resource and secondary resources.
Deployment Method: rok-deploy (StatefulSet)
Requirements: The rok-disk-manager, rok-kmod, and management environment
Description: A single StatefulSet operator that acts as the primary vehicle for the rest of the Rok installation. It is responsible for deploying the whole Rok stack.
It manages the (primary) RokCluster custom resource and the following (secondary) Rok resources:
- rok-csi-controller
- rok-csi-node
- rok-*
- rok-csi-guard-*
The rok-operator watches and manages the lifecycle of rok-cluster instances. The rok-operator is in charge of making sure all the Kubernetes controllers have what they need to deploy the Rok components. Operators are complementary to the GitOps declarative mindset and core to many popular cloud native projects. For more information, please refer to the Cloud Native Computing Foundation (CNCF) white paper, Arrikto GitOps, and why the Kubeflow community is pivoting from Helm.
rok-csi-controller
Purpose: To be the interface between K8s and the Rok ecosystem pods.
Deployment Method: The rok-operator
Requirements: rok-operator
Description: Watches for new volume requests made through the Container Storage Interface (CSI) via a persistent volume claim (PVC). This is typical CSI functionality. Once a volume is requested, the rok-csi-controller instructs a rok-csi-node to prepare a persistent volume (PV) based on the volume group managed by Rok. The rok-csi-controller also handles all CSI requests, including those for snapshots and persistent volumes. This functionality is core to stateful deployments via StatefulSets, since each StatefulSet replica has its own PVC and therefore requests a unique PV with specific access modes. It is also worth noting that, according to the official documentation, "API Objects VolumeSnapshot, VolumeSnapshotContent, and VolumeSnapshotClass are CRDs, not part of the core API. The CRDs and snapshot controller installations are the responsibility of the Kubernetes distribution." In other words, the functionality that the rok-csi-controller enables goes well beyond what is expected of a standard CSI driver.
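To make the PVC request concrete, here is a minimal sketch that builds the kind of PersistentVolumeClaim manifest a CSI controller watches for. The manifest fields follow the standard Kubernetes v1 PVC schema; the storage class name "rok", the namespace, and the claim name are assumptions made for illustration and should be checked against your own cluster.

```python
import json


def build_pvc(name, namespace, storage_class, size,
              access_modes=("ReadWriteOnce",)):
    """Build a PersistentVolumeClaim manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": list(access_modes),
            # Assumed storage class name; verify the real one in your cluster.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical claim name and namespace.
    pvc = build_pvc("example-data", "example-namespace", "rok", "5Gi")
    print(json.dumps(pvc, indent=2))
```

The printed JSON can be applied like any other manifest; the CSI driver behind the named storage class is then asked to provision a matching PV.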
rok-csi-node (K8s focused)
Purpose: To field all rok-csi-controller related requests and manage the rok-cluster behavior across nodes that are hosting Rok.
Deployment Method: rok-operator (DaemonSet)
Requirements: The rok-operator, rok-etcd, rok-csi-controller, object store target.
Description: Deployed by the rok-operator, the rok-csi-node DaemonSet listens for the Kubernetes requests delegated by the rok-csi-controller and acts on them for the hosting node, which includes creating PVCs, PVs, snapshots, etc.
rok-* (Rok focused)
Purpose: Responsible for all the snapshot logic of the Rok ecosystem.
Deployment Method: rok-operator (DaemonSet)
Requirements: The rok-operator, rok-etcd, rok-csi-controller, object store target.
Description: Deployed by the rok-operator, the rok-* DaemonSet manages the rok-cluster behavior across nodes that are hosting Rok. The rok-* is responsible for all the snapshot logic of the Rok ecosystem. The rok-* snapshots the management code and is responsible for change block tracking (CBT), hashing, de-duplication, and writing of all snapshot data. Snapshot data is written to a valid cloud object store by using a Kubernetes service account associated with an IAM role to access the cloud storage. Rok further extends object stores with the ability to share missing blocks across Rok registries upon request. The rok-* node keeps the state of each snapshot, including the versioning and indexing of the data, in rok-etcd.
Additional Functionality: The rok-* composer is part of the rok-* pod. It interprets everything in cloud storage, writes it back to disk, and (in collaboration with rok-etcd) manages the distribution of the data across the Rok cluster. Systemd spawns the Rok composer as a service inside the rok-* DaemonSet Pod.
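The hashing and de-duplication ideas described above can be illustrated with a generic content-addressed block store. This is only a sketch of the general technique, not Rok's implementation: the block size, the use of SHA-256, the in-memory dictionary standing in for an object store, and the manifest comparison standing in for change block tracking are all assumptions.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose, so the example is easy to follow


def split_blocks(data, block_size=BLOCK_SIZE):
    """Split bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def snapshot(data, store):
    """Write blocks to `store` keyed by hash; return the ordered hash list."""
    manifest = []
    for block in split_blocks(data):
        digest = hashlib.sha256(block).hexdigest()
        # De-duplication: a block that is already stored is not written again.
        store.setdefault(digest, block)
        manifest.append(digest)
    return manifest


def changed_blocks(old_manifest, new_manifest):
    """Return indexes of blocks that differ between two snapshots."""
    return [
        i for i, digest in enumerate(new_manifest)
        if i >= len(old_manifest) or old_manifest[i] != digest
    ]


def restore(manifest, store):
    """Rebuild the original bytes from a manifest and the block store."""
    return b"".join(store[digest] for digest in manifest)


if __name__ == "__main__":
    store = {}
    first = snapshot(b"AAAABBBBCCCC", store)
    second = snapshot(b"AAAAXXXXCCCC", store)
    print(len(store), changed_blocks(first, second))
```

Two snapshots of three blocks each end up storing four unique blocks rather than six, because the unchanged blocks are shared between them.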
rok-csi-guard-<NODE_NAME>
Purpose: To prevent node auto-scaling/deletion when active data is present on the node.
Deployment Method: rok-operator (Deployment on nodes with Rok data)
Requirements: The rok-operator
Description: The rok-csi-guard uses a PodDisruptionBudget (PDB) to prevent the auto-scaler from removing the node while unsaved data (data not yet saved to an object store) is present. The rok-operator removes the PDB once it validates that the data is backed up to the object store. Once the PDB is removed, the auto-scaler can continue normal operations and remove the node based on demand. We also highly recommend enabling scale-in protection. Please refer to our docs to confirm you are following best practices.
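As an illustration of the guard mechanism, the sketch below builds a PodDisruptionBudget manifest that blocks voluntary eviction of a guard pod. The policy/v1 fields are standard Kubernetes; the namespace, the label key, and the choice of maxUnavailable: 0 are assumptions for this example and may differ from what the rok-operator actually creates.

```python
def build_guard_pdb(node_name, namespace="rok"):
    """Build a PDB manifest protecting the guard pod of a single node."""
    name = f"rok-csi-guard-{node_name}"
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # With zero allowed disruptions the guard pod cannot be evicted,
            # so the node hosting it cannot be drained for scale-in.
            "maxUnavailable": 0,
            # Hypothetical label; the real selector is set by the operator.
            "selector": {"matchLabels": {"app": name}},
        },
    }


if __name__ == "__main__":
    # Hypothetical node name.
    print(build_guard_pdb("example-node-1"))
```

Deleting such a PDB lifts the eviction block, which matches the behavior described above once the data is safely in the object store.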
rok-etcd
Purpose: To host the metadata for pod snapshots, data storage, and S3 indexing.
Deployment Method: Created by rok-deploy, from within a rok-tools management environment (StatefulSet)
Requirements: A highly available volume for the etcd StatefulSet.
Description: Created via rok-deploy when Rok is installed, rok-etcd hosts the metadata for pod snapshots, data storage and indexing in S3, how to read the S3 data, snapshot versioning, volume association, and logical volumes and their physical location. Association in Kubernetes is done through a UUID based on the Kubernetes namespace. This WILL FAIL if the namespace is deleted. For high availability, rok-etcd must be backed by a cloud volume (or similar technology). It is VERY important to make sure cloud backup policies are set for the cloud volumes that back etcd. Just like the primary Kubernetes etcd, we do not want to lose desired state or important cluster information.
WARNING! If the cloud volumes that Rok uses are lost, data on Rok volumes or Rok snapshots will be permanently lost and you will not be able to recover them. You should never delete detached cloud volumes that belong to an active Kubernetes cluster. PLEASE read the official documentation on how to protect your environment.
rok-redis
Purpose: To garbage collect within the Rok Cluster
Deployment Method: Created by rok-deploy, from within a rok-tools management environment
Requirements: rok-tools command line tools associated with Rok ready environments.
Description: Created by the rok-deploy command, the rok-redis deployment is used for garbage collection within the Rok cluster. It maintains references until they are no longer needed. rok-redis is backed by a cloud volume. Rok uses Redis as an in-memory data structure store to cache metadata.
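The idea of "maintaining references until they are no longer needed" can be sketched as simple reference counting. This is a generic illustration in plain Python, not how rok-redis is implemented: the class, its method names, and the block names are hypothetical, and a dictionary stands in for Redis.

```python
class ReferenceTracker:
    """Track how many references point at each stored object."""

    def __init__(self):
        self._refs = {}

    def add_ref(self, key):
        """Record one more reference to `key`."""
        self._refs[key] = self._refs.get(key, 0) + 1

    def drop_ref(self, key):
        """Release one reference to `key`; fail if none are held."""
        if self._refs.get(key, 0) == 0:
            raise KeyError(f"no live references to {key!r}")
        self._refs[key] -= 1

    def collect(self):
        """Remove and return the keys that nothing references anymore."""
        dead = sorted(k for k, count in self._refs.items() if count == 0)
        for key in dead:
            del self._refs[key]
        return dead


if __name__ == "__main__":
    tracker = ReferenceTracker()
    tracker.add_ref("block-a")  # referenced by two snapshots
    tracker.add_ref("block-a")
    tracker.add_ref("block-b")  # referenced by one snapshot
    tracker.drop_ref("block-b")
    print(tracker.collect())
```

An object becomes eligible for collection only after every reference to it has been dropped, so data shared by several snapshots survives until the last one is gone.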