Validating resilience of a simple Kubernetes application

Kubernetes provides a great level of resilience for container-based applications.

If a Pod dies and is managed by a ReplicaSet, DaemonSet, or StatefulSet, Kubernetes will replace it without user interaction.

But the question is: “Is this feature good enough to make an application highly available?”

In this blog, I will explore a very simple application (the guestbook) and see whether it provides application resilience.

Guestbook architecture

The guestbook has a very simple architecture, with the following components:

Figure: Guestbook architecture

The frontend, written in PHP and JavaScript, uses both the Redis master (for writes) and the Redis slaves (for reads).

At first glance the architecture looks solid, as there can be many instances of the frontend and of the Redis slave. The only concern is the single instance of the Redis master.

But before we look at the single point of failure for the Redis master, let’s look at a design problem first.

Concurrency problem

The guestbook application has one flaw: the JavaScript code in the browser keeps its own copy of the messages and simply appends each new message to the list it fetched earlier.

So if another user inserts a message after the current user has loaded the page, and the current user then inserts a new message, the other user’s message will be lost:

  • Let’s call the users John and Mary and assume the application has no message.
  • John opens his browser and points to the guestbook application.
  • Mary does the same.
  • John inserts the message “Hello from John”.
  • Mary inserts the message “Hello from Mary”.
  • Mary’s message will overwrite John’s.
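The John and Mary scenario can be reproduced in a few lines. This is a minimal sketch, not the guestbook code itself: `server`, `fixedServer`, `makeStaleClient`, and `postAppendOnly` are hypothetical names, and the "server" is a plain in-memory object. The second half shows one possible fix (an assumption, not necessarily what the new version does): the client sends only the new message and the server appends it.

```javascript
// Hypothetical in-memory stand-in for the server-side message store.
const server = { messages: "" };

// Flawed client: caches the list at page load and sends the whole list back.
function makeStaleClient(store) {
  let cached = store.messages; // fetched once, never refreshed
  return {
    post(text) {
      cached = cached ? `${cached},${text}` : text;
      store.messages = cached; // overwrites whatever is currently stored
    },
  };
}

const john = makeStaleClient(server);
const mary = makeStaleClient(server);
john.post("Hello from John");
mary.post("Hello from Mary");
console.log(server.messages); // Hello from Mary

// One possible fix (assumption): send only the new message, append on the server.
const fixedServer = { messages: "" };
function postAppendOnly(store, text) {
  store.messages = store.messages ? `${store.messages},${text}` : text;
}

postAppendOnly(fixedServer, "Hello from John");
postAppendOnly(fixedServer, "Hello from Mary");
console.log(fixedServer.messages); // Hello from John,Hello from Mary
```

Moving the append to the server removes the stale browser copy from the picture, but it does not by itself protect against concurrent requests on the server side.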

A new version of the application is available at

Testing the application resilience

Now that the application seems stable, let’s test its resilience.

So I created a Node.js application that does the following:

  • Clears all the messages from Redis
  • Sends 100 messages, one at a time
  • Verifies that there are 100 messages at the end
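The three steps above can be sketched as follows. To keep the example runnable without a cluster, it talks to a hypothetical in-memory stand-in (`createFakeGuestbook`) rather than to the real application; the method names `clear`, `add`, and `list` are assumptions, and a real test client would issue HTTP requests instead.

```javascript
// Hypothetical in-memory stand-in for the guestbook backend.
// Each call yields to the event loop, as a network round trip would.
function createFakeGuestbook() {
  let messages = [];
  const tick = () => new Promise((resolve) => setImmediate(resolve));
  return {
    async clear() { await tick(); messages = []; },
    async add(text) { await tick(); messages.push(text); },
    async list() { await tick(); return [...messages]; },
  };
}

async function runSequentialTest(guestbook, count) {
  await guestbook.clear();              // 1. start from an empty store
  for (let i = 0; i < count; i++) {
    await guestbook.add(String(i));     // 2. wait for each confirmation
  }
  const stored = await guestbook.list(); // 3. verify the final count
  return stored.length === count;
}

const sequentialResult = runSequentialTest(createFakeGuestbook(), 100);
sequentialResult.then((ok) =>
  console.log(ok ? "all messages persisted" : "messages missing")
);
```

Because each `add` is awaited before the next one starts, there is never more than one request in flight, which is why this kind of test cannot expose concurrency problems.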

The code can be retrieved from

The results were as expected: all 100 messages were correctly persisted in the database.

Now, what will happen if we decide to send these messages at the same time, without waiting for the confirmation?

Testing many requests in parallel

So let’s try to send the requests in parallel and see if the application can successfully process them. The result is the following:

Data:  ,1,0,2,3,6,7,9,4,8,11,10,12,13,15,14,16,21,17,19,18,23,20,22,25,24,29,27,26,35,39,36,38,41,42,40,43,44,47,45,46,51,49,48,5,57,59,58,60,61,62,65,63,64,67,66,68,69,70,71,73,50,75,72,76,37,34,79,80,77,78,74,84,85,83,82,81,86,87,31,32,91,89,88,90,95,94,30,28,33,52,53,56,54,55,93,92,96,98,99,97

Even though the messages were not recorded in the order in which the test application sent them, all 100 messages were successfully persisted.

So it seems the application is “ready for production,” and nothing else needs to be done for its resilience.

Right? Wrong.

When I test with 200 messages, however, the number of messages persisted is consistently short of 200.

Refactor the application

The frontend application is written in JavaScript (running on the client/browser side) and PHP. So far, the component has been simple, but to support the enhancements we need to make, I decided to break it into two microservices: the JavaScript code and a separate backend component.

Here is the new architecture:

Figure: Splitting the frontend component

Even though PHP is tremendously popular as a language for writing web pages, I prefer to use Node.js to write REST-based components.

You can see the implementation at

Examining the application

Back to the scenario with 200 messages: as we examine the application more closely, we see the following snippet of code:

messages = await retrieveMessages();
// [...]
const result = await setAsync(MESSAGES_KEY, messages);

The problem is that between retrieving the messages and writing the updated list back, another request may append its own message, which is then overwritten and lost.

Here is a sample output:

Add message result: "31"
Add message result: "31,21"
Add message result: "31"
Add message result: "5"
Add message result: "31,21,8"
Add message result: "31,21,26"
Add message result: "31,13"
Add message result: "31,21,26"
Add message result: "31,21,2"
Add message result: "31,24"
Add message result: "31,21,8"
Add message result: "31,21,8"
Add message result: "31,21,2,1"
Add message result: "31,21,26,65,7"
Add message result: "31,21,26"
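The same lost-update pattern can be reproduced without Redis. The sketch below reuses the names `retrieveMessages`, `setAsync`, and `MESSAGES_KEY` from the snippet above, but their bodies are hypothetical in-memory stand-ins, and storing the messages as a comma-separated string is an assumption. Each call yields to the event loop to imitate a network round trip.

```javascript
// Hypothetical in-memory stand-in for Redis.
const store = new Map();
const tick = () => new Promise((resolve) => setImmediate(resolve));
const MESSAGES_KEY = "messages";

async function retrieveMessages() {
  await tick(); // imitates the round trip to Redis
  return store.get(MESSAGES_KEY) ?? "";
}

async function setAsync(key, value) {
  await tick();
  store.set(key, value);
  return value;
}

// Read, modify in memory, write back: nothing stops another request
// from reading the same old value in between.
async function addMessage(text) {
  let messages = await retrieveMessages();
  messages = messages ? `${messages},${text}` : text;
  return setAsync(MESSAGES_KEY, messages);
}

async function main() {
  const requests = [];
  for (let i = 0; i < 200; i++) requests.push(addMessage(String(i)));
  await Promise.all(requests); // all 200 requests are in flight together
  return store.get(MESSAGES_KEY).split(",").length;
}

const racyResult = main();
racyResult.then((n) => console.log(`sent 200, persisted ${n}`));
```

Every request that reads the list before an earlier request has written its update works from a stale copy, so the persisted count ends up below 200.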

To ensure that nobody else is updating the messages after we read them, we need to implement a transaction lock mechanism.

Photo by Patryk Buchcik from FreeImages

Implementing a locking mechanism

There are a few ways to resolve the issue described above.

In this article, I will discuss a simple one: wrapping the code above in a lock, so that only one backend will fetch and update the messages at a time.

This solution will certainly cause a delay in the response, as just one request can talk to the Redis database at a time.
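As an illustration of the idea (not the linked solution itself), here is a minimal sketch of a lock kept in the same store as the data, so that it would be shared by every backend instance. Everything here is an assumption: `fakeRedis` is an in-memory stand-in, `LOCK_KEY` and `withLock` are hypothetical names, and `setIfAbsent` only mirrors the semantics of the Redis `SET key value NX` command.

```javascript
// Hypothetical in-memory stand-in for Redis; each call yields to the event loop.
const store = new Map();
const tick = () => new Promise((resolve) => setImmediate(resolve));
const fakeRedis = {
  async get(key) { await tick(); return store.get(key) ?? ""; },
  async set(key, value) { await tick(); store.set(key, value); },
  // Mirrors Redis `SET key value NX`: write only if the key is absent.
  async setIfAbsent(key, value) {
    await tick();
    if (store.has(key)) return false;
    store.set(key, value);
    return true;
  },
  async del(key) { await tick(); store.delete(key); },
};

const MESSAGES_KEY = "messages";
const LOCK_KEY = "messages:lock"; // hypothetical key name

async function withLock(fn) {
  // Retry until the lock key can be created. A production lock would also
  // need an expiry and an owner token so a crashed holder cannot block others.
  while (!(await fakeRedis.setIfAbsent(LOCK_KEY, "locked"))) {
    await tick();
  }
  try {
    return await fn();
  } finally {
    await fakeRedis.del(LOCK_KEY); // always release, even if fn throws
  }
}

async function addMessage(text) {
  return withLock(async () => {
    const current = await fakeRedis.get(MESSAGES_KEY);
    const messages = current ? `${current},${text}` : text;
    await fakeRedis.set(MESSAGES_KEY, messages);
    return messages;
  });
}

async function main() {
  const requests = [];
  for (let i = 0; i < 200; i++) requests.push(addMessage(String(i)));
  await Promise.all(requests);
  return store.get(MESSAGES_KEY).split(",").length;
}

const lockedResult = main();
lockedResult.then((n) => console.log(`sent 200, persisted ${n}`));
```

With the read and the write inside the lock, all 200 messages are persisted, at the cost of every waiting request retrying until the current holder releases the lock.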

The solution is available at

In my next article, I will discuss a different solution: decoupling the frontend and backend by using a queue.


In this article, I described the evolution of a simple Kubernetes application to achieve resilience.

First, I had to fix a bug to allow multiple users to add messages at the same time. Then, I needed to wrap some code in a lock, so that simultaneous requests would not step on each other.



