Implementing a Disaster Recovery strategy for IBM Cloud Private: an Introduction
IBM Cloud Private (ICP) is a private cloud platform for developing and running workloads locally. It is an integrated environment that enables you to design, develop, deploy, and manage on-premises, containerized cloud applications behind your firewall. It includes the container orchestrator Kubernetes, a private image repository, a management console and monitoring frameworks.
In this article, I will describe some initial thoughts on how to implement a Disaster Recovery (DR) strategy for ICP.
DR is not HA
There is a lot of confusion about the difference between High Availability (HA) and Disaster Recovery (DR).
I am not trying to be pedantic here, but it is worth spending some time differentiating HA from DR.
HA is “a characteristic of a system, which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period.” (https://en.wikipedia.org/wiki/High_availability)
Many people associate HA with a single site and DR with multiple sites.
This is not true: HA, for example, can span multiple sites.
DR “involves a set of policies, tools and procedures to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.” (https://en.wikipedia.org/wiki/Disaster_recovery)
Do we need DR for ICP?
Kubernetes provides many capabilities to enable HA: with ReplicaSets, for example, Kubernetes ensures that the desired number of application instances (Pods) is running in the environment.
ICP, by extension, inherits many of these HA capabilities. Furthermore, the ICP components (master, management, Vulnerability Advisor (VA), worker, and proxy nodes) can be deployed in HA mode.
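The idea behind a ReplicaSet is a reconcile loop: compare the desired number of instances with what is actually running, then create or delete instances to close the gap. The toy model below illustrates only that idea; it is not the real Kubernetes controller, and the class and method names (ToyReplicaSet, reconcile, kill) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyReplicaSet:
    """Hypothetical, simplified model of a ReplicaSet (not the real controller)."""
    desired: int
    pods: list = field(default_factory=list)
    _next_id: int = 0

    def reconcile(self):
        """Create or delete pods until the running count equals `desired`."""
        while len(self.pods) < self.desired:
            self._next_id += 1
            self.pods.append(f"pod-{self._next_id}")
        while len(self.pods) > self.desired:
            self.pods.pop()

    def kill(self, name):
        """Simulate a pod failure, e.g. the node hosting it goes down."""
        self.pods.remove(name)
```

For example, after `rs = ToyReplicaSet(desired=3)` and `rs.reconcile()`, killing one pod and reconciling again brings the count back to three. Note that this protects the application inside one cluster; it does nothing if the whole cluster is lost, which is where DR comes in.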
High Availability can be extended to multiple sites by deploying multiple ICP environments and placing a Load Balancer in front of them that distributes requests to the applications deployed to these environments.
This raises the following question: what needs to be highly available, the Kubernetes cluster or the applications deployed on it?
In my opinion, it is the applications that need to be highly available.
Certainly, it is vital to have a robust Kubernetes cluster that enables HA applications. However, the focus should be on making the applications deployed to one or more ICP environments highly available.
So, why implement a DR strategy for an ICP environment?
Well, although it is possible to deploy applications across multiple ICP environments, and it is simple to re-create an ICP environment, such an environment holds information that might need to be preserved:
- Application log files
- Metering information
- Audit information
- Vulnerability reports
How to implement DR for ICP
There are different ways to use multiple ICP environments. In this article, I will focus on deploying separate ICP environments to achieve HA and DR.
The first option is an Active / Active scenario, where applications are deployed to multiple ICP environments and a Load Balancer distributes the traffic across them. Note that this is not DR but an HA deployment.
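To make the Active / Active behaviour concrete, here is a minimal sketch of a Load Balancer that sends requests round-robin to every healthy ICP environment and skips one that is marked down. It assumes a simple round-robin policy and hypothetical environment names; real load balancers offer many other policies and health-check mechanisms.

```python
from itertools import cycle


class ActiveActiveBalancer:
    """Round-robin over ICP environments, skipping unhealthy ones (sketch)."""

    def __init__(self, environments):
        self.environments = list(environments)
        self.healthy = set(self.environments)
        self._ring = cycle(self.environments)

    def mark_down(self, env):
        # e.g. a health check against this environment failed
        self.healthy.discard(env)

    def mark_up(self, env):
        if env in self.environments:
            self.healthy.add(env)

    def route(self):
        """Return the environment that should receive the next request."""
        if not self.healthy:
            raise RuntimeError("no healthy ICP environment available")
        # One full pass over the ring is enough to reach a healthy entry.
        for _ in range(len(self.environments)):
            env = next(self._ring)
            if env in self.healthy:
                return env
        raise RuntimeError("unreachable: healthy set is a subset of the ring")
```

With two environments, requests alternate between them; if one is marked down, every request goes to the survivor. Both sites serve traffic all the time, which is why this is HA rather than DR.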
The second option is an Active / Passive configuration, where one ICP environment is dormant and the Load Balancer directs all the traffic to the Active one. There are two variants of the Active / Passive model:
- Hot stand-by
The applications are deployed to both the Active and Passive sites at the same time. During a disaster, the Load Balancer is reconfigured to send requests to the Passive site. Because the applications are already running there, this configuration allows for a shorter Recovery Time Objective (RTO).
- Cold stand-by
In this situation, the applications are deployed to the Passive site as part of the DR procedure. Depending on how long the applications take to deploy and start, this might mean many minutes of unavailability until they are ready.
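The difference between hot and cold stand-by can be sketched as a failover routine plus a rough RTO estimate. This is a simplified model under stated assumptions: `deploy_apps` is a hypothetical hook standing in for whatever deployment tooling you use, and the RTO terms (detection, Load Balancer reconfiguration, deployment, start-up) are illustrative components, not measured values.

```python
from dataclasses import dataclass


@dataclass
class ActivePassiveBalancer:
    """Sketch of Active / Passive routing with hot or cold stand-by."""
    active: str
    passive: str
    hot_standby: bool  # True: applications already run on the passive site

    def route(self):
        # All traffic goes to the active site.
        return self.active

    def fail_over(self, deploy_apps):
        """Reconfigure routing to the passive site after a disaster.

        For cold stand-by, the applications must first be deployed there;
        deploy_apps(site) is a hypothetical hook for your deployment tooling.
        """
        if not self.hot_standby:
            deploy_apps(self.passive)
        # The failed site becomes the passive one (e.g. for a later failback).
        self.active, self.passive = self.passive, self.active


def estimated_rto(detect, reconfigure_lb, deploy=0.0, start=0.0, hot=True):
    """Rough RTO model: cold stand-by adds deployment and start-up time."""
    rto = detect + reconfigure_lb
    if not hot:
        rto += deploy + start
    return rto
```

In this model the hot stand-by RTO is only detection plus reconfiguration, whereas cold stand-by also pays for deploying and starting the applications, which matches the trade-off described above.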
Implementing DR for the Persistent Volumes
What about the storage being used by these applications?
Storage is presented to Kubernetes applications as Persistent Volumes (https://kubernetes.io/docs/concepts/storage/persistent-volumes/).
Like the ICP nodes, we need to consider how to implement a DR strategy for the Persistent Volumes (PVs).
A single storage site hosting the PVs for multiple ICP environments would represent a single point of failure, so we need multiple storage solutions, ideally located close to the worker nodes of each environment.
To implement an Active / Passive solution, as described above, we need to replicate the data from the Active site to the Passive one. The good news is that most storage vendors provide a solution to replicate data across volumes.
The caveat is that the applications deployed to the Passive environment need to be aware that the data on its PVs is constantly being updated. Some components need to be “refreshed” to re-fetch the data from the PV.
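The refresh problem can be illustrated with a small model: data written on the Active site is replicated to the Passive volume, but a component on the Passive site that cached the volume's contents keeps serving the old values until it re-fetches them. Everything here is hypothetical: `Volume` stands in for a PV as a plain dictionary, and `replicate` stands in for the storage vendor's replication, which in practice is performed by the storage layer and is often asynchronous.

```python
class Volume:
    """Hypothetical stand-in for a Persistent Volume's contents."""

    def __init__(self):
        self.data = {}


def replicate(source, target):
    """Copy the Active volume's contents to the Passive volume.

    Stands in for vendor replication; real replication is done by the
    storage layer, not by application code.
    """
    target.data = dict(source.data)


class CachingComponent:
    """A component on the Passive site that reads the PV once and caches it."""

    def __init__(self, volume):
        self.volume = volume
        self.cache = dict(volume.data)

    def get(self, key):
        return self.cache.get(key)

    def refresh(self):
        """Re-fetch data from the PV so replicated updates become visible."""
        self.cache = dict(self.volume.data)
```

After a new write is replicated, `get` still returns the cached value until `refresh` is called; that is the kind of component that must be refreshed before, or as part of, a failover.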
ICP provides a robust HA environment. Applications can use Kubernetes concepts to provide HA, and the ICP components can be deployed in HA mode.
In this article, I described some initial approaches to implementing DR for ICP. In the next article, I will describe how to use IaaS capabilities to implement DR.