Kubernetes Case Study: Booking.com

Ankit Pramanik
Mar 8, 2021

What is Kubernetes?

Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation. It has a large, rapidly growing ecosystem. Kubernetes services, support, and tools are widely available.

The name Kubernetes originates from Greek, meaning helmsman or pilot. Google open-sourced the Kubernetes project in 2014. Kubernetes combines over 15 years of Google’s experience running production workloads at scale with best-of-breed ideas and practices from the community.

Containers have become popular because they provide extra benefits, such as:

  • Agile application creation and deployment: increased ease and efficiency of container image creation compared to VM image use.
  • Continuous development, integration, and deployment: provides for reliable and frequent container image build and deployment with quick and efficient rollbacks (due to image immutability).
  • Dev and Ops separation of concerns: create application container images at build/release time rather than deployment time, thereby decoupling applications from infrastructure.
  • Observability: surfaces not only OS-level information and metrics, but also application health and other signals.
  • Environmental consistency across development, testing, and production: Runs the same on a laptop as it does in the cloud.
  • Cloud and OS distribution portability: Runs on Ubuntu, RHEL, CoreOS, on-premises, on major public clouds, and anywhere else.
  • Application-centric management: Raises the level of abstraction from running an OS on virtual hardware to running an application on an OS using logical resources.
  • Loosely coupled, distributed, elastic, liberated micro-services: applications are broken into smaller, independent pieces and can be deployed and managed dynamically — not a monolithic stack running on one big single-purpose machine.
  • Resource isolation: predictable application performance.
  • Resource utilization: high efficiency and density.

Why you need Kubernetes and what it can do

Containers are a good way to bundle and run your applications. In a production environment, you need to manage the containers that run the applications and ensure that there is no downtime. For example, if a container goes down, another container needs to start. Wouldn’t it be easier if this behavior were handled by a system?

That’s where Kubernetes comes to the rescue! Kubernetes provides you with a framework to run distributed systems resiliently. It takes care of scaling and failover for your application, provides deployment patterns, and more. For example, Kubernetes can easily manage a canary deployment for your system.
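The failover behavior described above, where a replacement starts when a container goes down, comes down to a control loop. The loop compares the desired state with the observed state and acts on the difference. The Python below is a minimal toy model built on assumptions: the `Cluster` class, the `reconcile` function and the container names are hypothetical, and only one application runs in the cluster. Real Kubernetes controllers watch the API server rather than inspecting an in-memory dictionary.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Hypothetical in-memory model of one app's containers: name -> healthy?"""
    running: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def start(self, app: str) -> str:
        # Give each new container a unique, increasing suffix.
        name = f"{app}-{next(self._ids)}"
        self.running[name] = True
        return name


def reconcile(cluster: Cluster, app: str, desired: int) -> list:
    """One pass of a control loop: drop failed containers, then move the
    number of healthy containers toward the desired replica count."""
    actions = []
    # Self-healing: remove containers that have failed.
    for name, healthy in list(cluster.running.items()):
        if not healthy:
            del cluster.running[name]
            actions.append(f"removed {name}")
    healthy_names = sorted(cluster.running)
    # Too few replicas: start replacements (range is empty if none are needed).
    for _ in range(desired - len(healthy_names)):
        actions.append(f"started {cluster.start(app)}")
    # Too many replicas: stop the extras.
    for name in healthy_names[desired:]:
        del cluster.running[name]
        actions.append(f"stopped {name}")
    return actions


cluster = Cluster()
print(reconcile(cluster, "web", 3))  # ['started web-1', 'started web-2', 'started web-3']
cluster.running["web-2"] = False     # simulate a crashed container
print(reconcile(cluster, "web", 3))  # ['removed web-2', 'started web-4']
```

Running the loop repeatedly is safe. When actual state already matches desired state, a pass does nothing, which is why this style suits a system that must keep converging after failures.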

Kubernetes provides you with:

  • Service discovery and load balancing: Kubernetes can expose a container using a DNS name or its own IP address. If traffic to a container is high, Kubernetes is able to load balance and distribute the network traffic so that the deployment is stable.
  • Storage orchestration: Kubernetes allows you to automatically mount a storage system of your choice, such as local storage, public cloud providers, and more.
  • Automated rollouts and rollbacks: You can describe the desired state for your deployed containers using Kubernetes, and it can change the actual state to the desired state at a controlled rate. For example, you can automate Kubernetes to create new containers for your deployment, remove existing containers, and adopt all their resources to the new container.
  • Automatic bin packing: You provide Kubernetes with a cluster of nodes that it can use to run containerized tasks. You tell Kubernetes how much CPU and memory (RAM) each container needs. Kubernetes can fit containers onto your nodes to make the best use of your resources.
  • Self-healing: Kubernetes restarts containers that fail, replaces containers, kills containers that don’t respond to your user-defined health check, and doesn’t advertise them to clients until they are ready to serve.
  • Secret and configuration management: Kubernetes lets you store and manage sensitive information, such as passwords, OAuth tokens, and SSH keys. You can deploy and update secrets and application configuration without rebuilding your container images, and without exposing secrets in your stack configuration.
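To illustrate automatic bin packing, the sketch below uses first-fit decreasing, a standard bin-packing heuristic, over CPU and memory requests. This heuristic is a stand-in for illustration only. The real kube-scheduler works differently: it filters out nodes that cannot run a pod, then scores the remaining nodes with configurable plugins. The node names, capacities and pod requests are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Request:
    name: str
    cpu_m: int   # requested CPU in millicores (1000m = 1 core)
    mem_mi: int  # requested memory in MiB


@dataclass(frozen=True)
class Node:
    name: str
    cpu_m: int   # allocatable CPU in millicores
    mem_mi: int  # allocatable memory in MiB


def pack(pods: List[Request], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """First-fit decreasing: place the largest requests first, each on the
    first node that still has enough free CPU and memory."""
    free = {n.name: (n.cpu_m, n.mem_mi) for n in nodes}
    placement: Dict[str, Optional[str]] = {}
    for pod in sorted(pods, key=lambda p: (p.cpu_m, p.mem_mi), reverse=True):
        placement[pod.name] = None  # stays None if nothing fits (like a Pending pod)
        for node in nodes:
            cpu, mem = free[node.name]
            if pod.cpu_m <= cpu and pod.mem_mi <= mem:
                free[node.name] = (cpu - pod.cpu_m, mem - pod.mem_mi)
                placement[pod.name] = node.name
                break
    return placement


nodes = [Node("node-a", 2000, 4096), Node("node-b", 2000, 4096)]
pods = [Request("api", 1500, 1024), Request("worker", 1000, 2048),
        Request("cache", 500, 3072), Request("batch", 1000, 1024)]
print(pack(pods, nodes))
# {'api': 'node-a', 'worker': 'node-b', 'batch': 'node-b', 'cache': 'node-a'}
```

Placing the largest requests first tends to leave fewer unusable gaps. In this example both nodes end up fully allocated on CPU.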

CASE STUDY: Booking.com

After Learning the Ropes with a Kubernetes Distribution, Booking.com Built a Platform of Its Own

Company: Booking.com

Location: Netherlands

Industry: Travel

Challenge

In 2016, Booking.com migrated to an OpenShift platform, which gave product developers faster access to infrastructure. But because Kubernetes was abstracted away from the developers, the infrastructure team became a “knowledge bottleneck” when challenges arose. Trying to scale that support wasn’t sustainable.

Solution

After a year operating OpenShift, the platform team decided to build its own vanilla Kubernetes platform — and ask developers to learn some Kubernetes in order to use it. “This is not a magical platform,” says Ben Tyler, Principal Developer, B Platform Track. “We’re not claiming that you can just use it with your eyes closed. Developers need to do some learning, and we’re going to do everything we can to make sure they have access to that knowledge.”

Impact

Despite the learning curve, there’s been a great uptick in adoption of the new Kubernetes platform. Before containers, creating a new service could take a couple of days if the developers understood Puppet, or weeks if they didn’t. On the new platform, it can take as few as 10 minutes. About 500 new services were built on the platform in the first 8 months.

Booking.com has a long history with Kubernetes: In 2015, a team at the travel platform prototyped a container platform based on Mesos and Marathon.

Impressed by what the technology offered, but in need of enterprise features at its scale — the site handles more than 1.5 million room-night reservations a day on average — the team decided to adopt an OpenShift platform.

This platform, which was wrapped in a Heroku-style, high-level CLI, “was definitely popular with our product developers,” says Ben Tyler, Principal Developer, B Platform Track. “We gave them faster access to infrastructure.”

But, he adds, “anytime something went slightly off the rails, developers didn’t have any of the knowledge required to support themselves.”

And after a year of operating this platform, the infrastructure team found that it had become “a knowledge bottleneck,” he says. “Most of the developers who used it did not know it was Kubernetes underneath. An application failure and a platform failure both looked like failures of that Heroku-style tool.”

Scaling the necessary support did not seem feasible or sustainable, so the platform team needed a new solution. The understanding of Kubernetes that they had gained operating the OpenShift platform gave them confidence to build a vanilla Kubernetes platform of their own and customize it to suit the company’s needs.

And “as our users learn Kubernetes and become more sophisticated Kubernetes users, they put pressure on us to provide a better, more native Kubernetes experience, which is great,” says Tyler. “It’s a super healthy dynamic.”

The platform also includes other CNCF technologies, such as Envoy, Helm, and Prometheus. Most of the critical service traffic for Booking.com is routed through Envoy, and Prometheus is used primarily to monitor infrastructure components. Helm is consumed as a packaging standard. The team also developed and open sourced Shipper, an extension for Kubernetes to add more complex rollout strategies and multi-cluster orchestration.
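Shipper is described here only as adding more complex rollout strategies and multi-cluster orchestration. The general idea of a staged, multi-cluster rollout can be sketched as follows. This Python model is hypothetical and does not reflect Shipper's actual API, resources or step names. It assumes a release moves through steps, each giving the new version a share of capacity and traffic, and a health check gates each step in every cluster. If any check fails, all clusters return to the previous release.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Step:
    name: str
    capacity_pct: int  # share of replicas running the new version
    traffic_pct: int   # share of requests routed to the new version


# Hypothetical strategy; step names and percentages are illustrative only.
STRATEGY = [
    Step("staging", capacity_pct=10, traffic_pct=0),
    Step("canary", capacity_pct=25, traffic_pct=25),
    Step("full on", capacity_pct=100, traffic_pct=100),
]


def roll_out(clusters: List[str], steps: List[Step],
             healthy: Callable[[str, Step], bool]) -> List[str]:
    """Advance every cluster through each step in order; if any cluster
    fails its health check, stop and roll all clusters back."""
    log = []
    for step in steps:
        for cluster in clusters:
            log.append(f"{cluster}: {step.name} "
                       f"(capacity {step.capacity_pct}%, traffic {step.traffic_pct}%)")
            if not healthy(cluster, step):
                log.append("rollback: all clusters back to previous release")
                return log
    log.append("rollout complete")
    return log


# Usage: a health check that always passes (hypothetical cluster names).
for line in roll_out(["cluster-a", "cluster-b"], STRATEGY, lambda c, s: True):
    print(line)
```

Gating each step on health in every cluster keeps most of the capacity and traffic on the known-good release until the new one has proven itself.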

To be sure, there have been internal discussions about the wisdom of building a Kubernetes platform from the ground up. “This is not really our core competency — Kubernetes and travel, they’re kind of far apart, right?” says Tyler. “But we’ve made a couple of bets on CNCF components that have worked out really well for us. Envoy and Kubernetes, in particular, have been really beneficial to our organization. We were able to customize them, either because we could look at the source code or because they had extension points, and we were able to get value out of them very quickly without having to change any paradigms internally.”

Thank you for reading this article.
