Stateful Containers: Paradox or Paradise?

Allon Cohen, January 19, 2017

Achilles and the Tortoise have just completed the cooperative mode of Portal 2 on the PlayStation 3 in the company’s recreation room (yes, PlayStation 3; this is an ancient parable, after all). Achilles, the fleetest of all mortal DevOps engineers, is wearing his “Kubernetes or Die” T-shirt. The Tortoise is holding the VMworld notebook he received at last summer’s conference.

Mr. T (as the Tortoise likes to be called): This was a triumph. I’m making a note here…huge success.

Achilles: Yes, which is much more than I can say for Zeno’s new containers project.

Mr. T: Who’s Zeno?

Achilles: The new director of infrastructure. I’m heading to his weekly meeting in the Elea conference room.

Mr. T: Ah, the same guy who invited all employees to the charity jogging event and wanted to turn it into a race.

Achilles: Yep, that’s the one. He just doesn’t understand containers.

Mr. T: How so?

Achilles: Well, we showed him the website containerization project. We demonstrated how containers cut our time-to-production tenfold, how we improved quality by automating testing, and how we could dynamically scale to thousands of containers when we got millions of views during the Olympics…but he still didn’t get it.

Mr. T: What? After seeing all these advantages, he still didn’t like containers?

Achilles: No. He loved them. He asked for screenshots and slides, showed them to management, and convinced them to give all of DevOps a trip to an offsite in Maui. As an extra bonus, he let me tag along to the sales team kickoff…those guys can party…

Mr. T: So, what are you complaining about?

Achilles: Well, he liked the advantages of containers so much that he wants to expand their use so that more applications can benefit. He even asked us to containerize the company’s enterprise applications, starting with the ordering systems.

Mr. T: So?

Achilles: You see, for the website project, data didn’t change all that often. We kept the data in an object repository that was handled by the storage team. It was not part of the project; in fact, it was an “SEP field”. In contrast, one of the crazy customer requirements for the ordering system is that order data needs to be actively persisted. There is no doubt about it; we had the product management team research this. They interviewed the end users and showed a nice pie chart indicating that 95.2% of the operations guys said they need to see the customer orders to fulfill them. Not to mention the 42% of the accounting team who also said they needed to see real data to produce accurate reports.

But that requirement is a showstopper. You see, containers have traditionally shone in stateless microservices applications. We want the freedom to deploy the containers anywhere in the environment or even in the cloud, we want to assure high availability of service, and we want to scale applications on demand. We even want to dynamically load balance the front ends so that users can hit any of the containers and still see their latest data. Storage systems just can’t keep up with us.

Mr. T: That’s a long list of demands. Let me write them down. (Pauses.) You know, we had precisely the same problem in the second millennium AD.

Achilles: This is not going to be one of your stories about joining the company in 1999 and being the first to implement VMs, is it?

Mr. T: Yes, it is.  You should learn from the mistakes of your elders.

Achilles: Mistakes? You made mistakes? This is going to be good. Let me record this. (Takes out his iPhone, points, and clicks.)

Mr. T (prepares two double chocolaty lattes in the office coffee machine and passes one to Achilles): You see, in the olden days, we sent out teams to conferences. Not the virtual ones you attend through your browser, mind you. These were physical events you had to travel far and wide to get to. Our teams went there digging for solutions, when, quite accidentally, they uncovered monoliths. Back then we did not know how large they would grow. However, as it turned out, these monolithic applications had an insatiable appetite for data. Soon the capacity of the drives on one server was not enough. The original server monoliths soon begat another monolith: standalone storage array SANs. Another monster to house in our data centers. What’s more, the two types of monoliths needed their own language to speak to each other: block-based Fibre Channel. Just to confuse us, they didn’t even spell fiber correctly.

Achilles: A true horror story…is this going to take much longer?

Mr. T: Not much. You see, as demand grew for these enterprise applications, we needed a way to scale them and assure their high availability. Virtualization provided a good solution on the compute side, but storage just didn’t keep up. In an attempt to circumvent the data sharing limitations of block volumes, SCSI reservations were invented. These reservations allowed one application to both read and write to a block volume, while other applications were limited to read privileges. You could then dynamically reassign the right to write between the application instances. This was a system so cumbersome and complex that, to this day, more people successfully pass training exams for quantum physics than for SCSI reservations.

Achilles: Eureka! That is what I’ll tell Zeno. When I claimed that containers don’t persist data, he pointed me at Kubernetes persistent volumes and asked why I don’t use them. So, I researched it a bit and found that there are two groups of storage providers for Kubernetes: block storage providers and file storage providers. The application guys wanted to use block storage for the database, but block storage providers in Kubernetes only allow read access, not read/write, when mounting to multiple pods. So, the SCSI reservations don’t work for containers. We are at an impasse, a paradox of sorts. A) Containers are great, so I should expand their benefits to enterprise applications… but B) to expand container usage to enterprise applications, I need multiple applications to have read/write access… and C) block storage providers for containers can’t support multiple read/write accessors. There is no way to reconcile requirements A, B, and C. The project is doomed to fail. Containers will remain in the small realm of stateless applications.

Mr. T: Ahh, logical conundrums, I always enjoyed those. Usually it turns out that they are not really paradoxes at all, but rather that something in your assumptions was wrong. Let’s walk back a bit. You said that there are two types of storage providers: block-based and file-based. Do file-based providers allow multiple read/write access?

Achilles: Well, yes.  But the application guys wanted block.  They had tried file ten years ago, and felt it was too slow.
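
For readers who prefer their paradoxes executable, the access-mode question Achilles and Mr. T just worked through can be sketched in a few lines of Python. The three mode names are the ones Kubernetes defines for persistent volumes; the two provider names and their capability sets are hypothetical placeholders chosen to mirror the dialogue, not a statement about any particular storage product.

```python
from enum import Enum


class AccessMode(Enum):
    """Persistent volume access modes as named by Kubernetes."""
    READ_WRITE_ONCE = "ReadWriteOnce"   # read/write, mounted by a single node
    READ_ONLY_MANY = "ReadOnlyMany"     # read-only, mounted by many nodes
    READ_WRITE_MANY = "ReadWriteMany"   # read/write, mounted by many nodes


# Hypothetical providers, for illustration only: a block-style provider that
# cannot share writers, and a file-style provider that can.
PROVIDERS = {
    "example-block": {AccessMode.READ_WRITE_ONCE, AccessMode.READ_ONLY_MANY},
    "example-file": {
        AccessMode.READ_WRITE_ONCE,
        AccessMode.READ_ONLY_MANY,
        AccessMode.READ_WRITE_MANY,
    },
}


def supports_shared_writes(provider: str) -> bool:
    """True if many pods can mount the provider's volume read/write."""
    return AccessMode.READ_WRITE_MANY in PROVIDERS[provider]


def providers_for(required: AccessMode) -> list[str]:
    """Names of the providers that offer the required access mode."""
    return sorted(name for name, modes in PROVIDERS.items() if required in modes)
```

Under these assumptions, asking for `AccessMode.READ_WRITE_MANY` leaves only the file-style provider standing, which is requirement C of Achilles’ paradox in miniature.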

Mr. T:  Ten years ago is quite some time back (I won’t bore you with another story), but that was a time when files could only be served using monolithic filers based on HDDs.  Yep, those were slow…but things have changed. Today you can easily find flash-native, distributed file systems that can outperform even the fastest block arrays on transactional workloads.  They are relatively new, so they are not that well known yet.  But if our readers got this far, they are already aware of at least one.

Achilles: What readers? I thought you were over your Deadpool phase. But, in any case, even if your flash-native, distributed file system was fast enough to handle transactional workloads, and even if it could handle multiple read/write access for thousands of pods in parallel, there are still additional requirements for it to fit my stateful containers project.

Mr. T:  Yes, I remember, I wrote those down:

  1. Access from anywhere in the environment or in the cloud: Check! A flash-native, distributed file system would be great for this.  Much like containers, NFS mounts were developed to work across environments.  It doesn’t matter if you are running your containers on bare metal servers, inside virtual machines, or in the cloud, they can always access their persistent data in the same way.  When using a distributed file system with cross-cloud capabilities, you can even develop your stateful containers in one environment, and easily deploy them in another.
  2. Enterprise-class high availability: Check! Make sure the flash-native, distributed file system you choose can easily handle the failure of any component. You should be able to handle the failure of flash drives, servers, or networking without loss of data or service interruption. Self-healing is also important. If the file system detects an issue, it should be able to automatically bring the system back to a highly available state without requiring user intervention.
  3. Scale on Demand: Check! You need a distributed file system that can dynamically add and remove nodes.   That way your storage environment can grow as seamlessly as your container deployments.  During peak usage, just add more nodes without any downtime.  During low usage, simply remove nodes or easily reassign storage capacity to another project.
  4. Support for dynamic load balancing across front end applications: Check! This is where a POSIX-compliant file system really shines. POSIX compliance assures a consistent view of the data across all nodes. Unlike object-based systems that promise “eventual consistency” (an oxymoron), a POSIX-compliant distributed file system makes sure that, regardless of the node accessing the data, each accessor always receives the absolute latest version of a file. In fact, you can update a file on one system, and that update is immediately available to all other systems.
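
The contrast in item 4 can be made concrete with a toy model. The sketch below is only an illustration of the two behaviors Mr. T describes, not how any real object store or distributed file system is implemented; the class names, the explicit `sync()` step, and the `order-17` key are all invented for the example.

```python
class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and reaches the rest on sync()."""

    def __init__(self, replicas: int) -> None:
        self._replicas = [{} for _ in range(replicas)]
        self._pending = []  # (key, value) pairs not yet propagated

    def write(self, replica: int, key: str, value: str) -> None:
        self._replicas[replica][key] = value
        self._pending.append((key, value))

    def read(self, replica: int, key: str):
        return self._replicas[replica].get(key)

    def sync(self) -> None:
        # Propagate every pending write to every replica.
        for key, value in self._pending:
            for replica in self._replicas:
                replica[key] = value
        self._pending.clear()


class StronglyConsistentStore:
    """Toy model: every node reads and writes the same shared state."""

    def __init__(self) -> None:
        self._data = {}

    def write(self, node: int, key: str, value: str) -> None:
        self._data[key] = value

    def read(self, node: int, key: str):
        return self._data.get(key)


lazy = EventuallyConsistentStore(replicas=2)
lazy.write(0, "order-17", "shipped")
stale = lazy.read(1, "order-17")   # replica 1 has not seen the write yet
lazy.sync()
fresh = lazy.read(1, "order-17")   # now it has

shared = StronglyConsistentStore()
shared.write(0, "order-17", "shipped")
immediate = shared.read(1, "order-17")  # visible to node 1 right away
```

In the toy, a load balancer that sends a user to replica 1 before `sync()` runs would show them an order that does not exist yet, which is exactly the situation Achilles wants to avoid for the ordering system.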

Achilles: So, you are saying this hypothetical, flash-native, distributed file system would allow me to expand the usage of containers to all of our enterprise applications? Even those that are stateful?

Mr. T: Quite so, and it’s not really hypothetical, you know.  I’ll even throw in a bonus capability: Choose a software-defined solution that is agnostic to hardware, enabling it to run on the same servers you are running your container platform on.  The procurement department people will love the cost savings.

Zeno (enters the cafeteria): There you are, Achilles. You are the fleetest of foot of all mortals, so I don’t understand why you always arrive late to meetings.

Achilles: Hey Zeno!  We solved your container paradox!


To learn more about Elastifile’s solution for stateful containers, please check out this solution brief.