This is sorta old news, but new to me as I don't pay much attention to restaurant technology.
This story is so wild I couldn't believe what I was reading, and it's from 2018. One can only dream of ever being as badass as CFA's DevOps team. Chick-fil-A, my favorite defense contractor. The thread title is a bit of sarcasm and will make more sense further down, at the bottom of the rabbit hole. Quoting the whole article as it blew my mind.
Quote:Edge Computing at Chick-fil-A
by Brian Chambers, Caleb Hurd, Sean Drucker, Alex Crane, Morgan McEntire, Jamie Roberts, and Laura Jauch (the Chick-fil-A IOT/Edge team)
In a recent post, we shared how we do bare metal clustering for Kubernetes on-the-fly at the Edge in our restaurants. One of the most common (and best) questions we got was “but why?”. In this post we will answer that question. If you are interested in more or prefer video to text, you can also check out Brian and Caleb’s presentation at QConNY 2018.
Why K8s in a restaurant?
Why does a restaurant company like Chick-fil-A want to deploy Kubernetes at the Edge? What are the goals? That, we can answer.
- Low-latency, internet-independent applications that can reliably run our business
- High availability for these applications
- Platform that enables rapid innovation and that allows delivery of business functionality to production as quickly as possible
- Horizontal scale — infrastructure and application development teams
Capacity Pressures
Chick-fil-A is a very successful company in large part because of our fantastic food and customer service — both the result of the great people that operate and work in our restaurants. They are unmatched in the industry. The result is very busy restaurants. If you have been to a Chick-fil-A restaurant, this is not news to you. If you have not been, it’s time to go!
While we are very grateful for our successes, they have created a lot of capacity challenges for us as a business. To put it in perspective, we do more sales in six days a week than our competitors do in seven (we are closed on Sunday). Many of our restaurants are doing greater than three times the volume that they were initially designed for. These are great problems to have, but they create some extremely difficult challenges related to scale.
At Chick-fil-A, we believe the solutions to many of these capacity problems are technology solutions.
A new generation of workloads
One of our approaches to solving these problems is to invest in a smarter restaurant.
Our goal: simplify the restaurant experience for Owner/Operators and their teams and optimize for great, hot, tasty, food served quickly and with a personal touch, all while increasing the capacity of our existing footprint.
Our hypothesis: By making smarter kitchen equipment we can collect more data. By applying data to our restaurant, we can build more intelligent systems. By building more intelligent systems, we can better scale our business.
As a simple example, imagine a forecasting model that attempts to predict how many Waffle Fries (or replace with your favorite Chick-fil-A product) should be cooked over every minute of the day. The forecast is created by an analytics process running in the cloud that uses transaction-level sales data from many restaurants. This forecast can most certainly be produced with a little work. Unfortunately, it is not accurate enough to actually drive food production. Sales in Chick-fil-A restaurants are prone to many traffic spikes and are significantly affected by local events (traffic, sports, weather, etc.).
However, if we were to collect data from our point-of-sale system’s keystrokes in real-time to understand current demand, add data from the fryers about work in progress inventory, and then micro-adjust the initial forecast in-restaurant, we would be able to get a much more accurate picture of what we should cook at any given moment. This data can then be used to give a much more intelligent display to a restaurant team member that is responsible for cooking fries (for example), or perhaps to drive cooking automation in the future.
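[Aside, not from the article] To make the micro-adjustment idea concrete, here is a minimal Python sketch of how a cloud forecast might be blended with live point-of-sale demand and fryer inventory. Everything in it is an assumption for illustration: the function name `adjust_forecast`, the simple weighted blend, and the numbers. The article does not say how Chick-fil-A actually does this.

```python
def adjust_forecast(cloud_forecast, recent_orders, in_progress, weight=0.5):
    """Blend a cloud forecast with live demand, then subtract work in progress.

    cloud_forecast: portions the cloud model expects to be needed this minute
    recent_orders:  portions actually rung up at the POS in the last minute
    in_progress:    portions already cooking in the fryers
    weight:         how much to trust live demand over the forecast (0..1),
                    a made-up tuning knob for this sketch
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    blended = (1.0 - weight) * cloud_forecast + weight * recent_orders
    # Never ask the team to cook a negative quantity.
    return max(0, round(blended - in_progress))


if __name__ == "__main__":
    # Forecast says 10, the POS shows 18, and 4 portions are already cooking.
    print(adjust_forecast(cloud_forecast=10, recent_orders=18, in_progress=4))  # 10
```

The point of running this in the restaurant rather than the cloud is that the POS and fryer inputs are local and change minute to minute, so the adjustment keeps working when the internet link does not.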
Goals like this led us to develop an Internet of Things (IOT) platform for our restaurants. To successfully scale our business we need the ability to 1) collect data and 2) use it to drive automation in the restaurant.
Rather than select many different vendor solutions that were silo’d and disconnected and therefore difficult to integrate, we wanted an open ecosystem that we own. This approach enables us to allow internal developers and/or external partners to develop connected “things” or applications for the Edge and leverage our platform for common needs like security, identity, and connectivity.
Put another way, the IOT/Edge team owns a small subset of the overall application base that is deployed on the edge, and is effectively a runtime platform team for the in-restaurant compute platform.
Architecture Overview
Here is a look at the architecture we developed (and are using today) to meet these goals.
Cloud Control Plane
Today, we run our higher order control plane and core IoT services in the cloud. This includes services like:
- Authorization Server — device identity management
- Data ingest / routing — collect data and send it where it needs to go
- Operational time series data — logs, monitoring, alerting, tracing
- Deployment management — self-service deployment for our application teams using GitOps (kudos to our friends at Weaveworks for sharing the term). We plan to share our work and code soon.
These services also run in Kubernetes so that we have a common paradigm across all of our team’s deployments.
Edge
“Edge Computing” is the idea of putting compute resources close to the action to attain a higher level of availability and/or lower level of latency.
We think of our Edge Computing environment as a “micro private cloud”. By this, we mean that we provide developers with a series of helpful services and a place to deploy their applications on our infrastructure.
Does this make us the largest “cloud provider” in the world? You can be the judge of that.
In all seriousness, this approach gives us a unique kind of scale. Rather than a few large K8s clusters with tens-to-hundreds-of-thousands of containers, at full scale we will have more than 2000 clusters with tens of containers per cluster. And that number grows significantly as we open many new restaurants every year.
Our edge workloads include:
- platform services such as an authentication token service, pub/sub messaging (MQTT), log collection/exfiltration (FluentBit), monitoring (Prometheus), etc.
- applications that interact with “things” in the restaurant
- simple microservices that serve HTTP requests
- machine learning models that synthesize cloud-developed forecasts with real-time events from MQTT to make decisions at the edge and drive automation
Our physical edge environment obtains its connectivity from multiple in-restaurant switches (and two routers), so we operate on a very highly available LAN.
Today, nearly all of the data that is collected at the Edge is ephemeral and only needs to exist for a short time before being exfiltrated to the cloud. This could change in the future, but it has kept our early-stage deployments very simple and made them much easier to manage.
Since we have a highly available Kubernetes cluster with data replicated across all nodes, we can ensure that we can retain any data that is collected until a time when the internet is again available for data exfiltration. We can also aggregate, compress, or drop data dynamically at the Edge based on business need.
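[Aside, not from the article] The "hold it until the internet comes back" behavior is a classic store-and-forward buffer. Below is a rough single-process Python sketch, assuming a hypothetical `EdgeBuffer` class with a size cap that drops the oldest readings. It does not model the replication across cluster nodes that the article describes, and the `send` and `online` callables are placeholders for whatever uplink is really used.

```python
from collections import deque


class EdgeBuffer:
    """Hold readings locally until the uplink returns (hypothetical sketch)."""

    def __init__(self, max_items=1000):
        # A bounded deque silently drops the oldest reading once it is full.
        self._items = deque(maxlen=max_items)

    def collect(self, reading):
        self._items.append(reading)

    def flush(self, send, online):
        """Send buffered readings oldest-first while online; return count sent."""
        sent = 0
        while self._items and online():
            send(self._items[0])
            # Remove the reading only after send() returned without raising.
            self._items.popleft()
            sent += 1
        return sent

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    buf = EdgeBuffer(max_items=3)
    for temp in (175, 176, 177, 178):
        buf.collect({"fryer_temp_c": temp})

    uploaded = []
    print(buf.flush(uploaded.append, online=lambda: False))  # 0, link is down
    print(buf.flush(uploaded.append, online=lambda: True))   # 3, link is back
    print(uploaded[0])  # {'fryer_temp_c': 176}, the oldest reading was dropped
```

Dropping the oldest item is only one possible policy; aggregating or compressing before upload, as the article mentions, would slot in at `collect` or `flush`.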
Given all of this, we still use the cloud first for our services whenever possible. Edge is the fallback deployment target when high-availability, low-latency, in-restaurant applications are a must.
Following in the footsteps of giants
We run our Edge infrastructure on commodity hardware that costs us, ballpark, $1000/restaurant. We wanted to keep our hardware investment low-cost and horizontally scalable. We have not found a way to make it dynamic/elastic yet, but maybe one day (order on Amazon w/ free 30-second shipping perhaps?).
With our architecture, we can add additional compute capacity as-needed by adding additional devices. We may upsize our hardware footprint in the future, but what we have makes sense for our workloads today. Google also got started scaling workloads on commodity hardware (written in 2003, where has the time gone?)… so we’re in good company!
We hope our approach to Edge infrastructure encourages creative thinking for anyone that is working with scarce resources (physical space, budget, or otherwise). I think that’s everyone.
Everything else
Finally, we have our connectivity layer and the IOT things. Many of our devices are fully-powered (no batteries) but still constrained on chipsets/processing power. Wi-Fi is the default method for connectivity. We have not committed to a low-power protocol yet, but are interested in LoRa and the Wi-Fi HaLow (802.11ah) project.
For things/physical devices, our team also provides developers with an SDK for executing on-boarding flows (all OAuth based) and for accessing services within the Edge environment such as MQTT. Brian talked about this more extensively at QConNY 2017 (note that we were using Docker Swarm vs. Kubernetes at the time).
Containers at the Edge
Why would you want to run containers at the Edge? For the same reason you would run them in the cloud (or anywhere else).
Dependency management becomes easier. Testing is easier. The developer experience is easier and better. Teams can move much faster and more autonomously, especially when reasonable points of abstraction (such as k8s namespaces) and resource limits (CPU/RAM) are applied.
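[Aside, not from the article] For anyone wondering what "namespaces and resource limits" looks like in practice, here is a small Python sketch that builds a Namespace and a ResourceQuota manifest for one team and prints them as JSON, a format Kubernetes manifests can use alongside YAML. The team name, the quota values, and the helper `team_namespace_manifests` are made up for illustration; the article does not show Chick-fil-A's actual configuration.

```python
import json


def team_namespace_manifests(team, cpu_limit="2", memory_limit="2Gi"):
    """Return a Namespace and a ResourceQuota manifest for one team."""
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": team},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{team}-quota", "namespace": team},
        # Caps the summed CPU/RAM limits of all pods in the namespace.
        "spec": {"hard": {"limits.cpu": cpu_limit, "limits.memory": memory_limit}},
    }
    return [namespace, quota]


if __name__ == "__main__":
    # "kitchen-apps" is a hypothetical team name.
    for manifest in team_namespace_manifests("kitchen-apps"):
        print(json.dumps(manifest, indent=2))
```

With something like this per team, one team's runaway app cannot starve everyone else on a small in-restaurant cluster.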
Another key design goal was reliability for our critical applications, meaning no single points-of-failure in the architecture. There are many ways this type of goal can be achieved, but we decided having > 1 physical hosts at the Edge and a container-based orchestration layer was the best approach for the types of workloads we have and for the skillset of our teams.
We initially called our strategy “Redundant Restaurant Compute”, which really speaks to the goal behind our approach. Later, we transitioned to calling it “Edge Computing”.
The ability to orchestrate services effectively and to preserve a desired number of replicas quickly and consistently attracted us to Kubernetes.
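[Aside, not from the article] "Preserving a desired number of replicas" boils down to a reconcile loop: compare what you want with what is running and act on the difference. The toy Python function below shows only that comparison step. It is a sketch of the general idea, not Kubernetes code, and the service names and the `reconcile` helper are invented for the example.

```python
def reconcile(desired, running):
    """Return the actions needed to move `running` toward `desired`.

    desired: dict of service name -> wanted replica count
    running: dict of service name -> replicas currently healthy
    """
    actions = []
    for name in sorted(set(desired) | set(running)):
        want = desired.get(name, 0)
        have = running.get(name, 0)
        if have < want:
            actions.append(("start", name, want - have))
        elif have > want:
            actions.append(("stop", name, have - want))
    return actions


if __name__ == "__main__":
    desired = {"mqtt": 3, "token-service": 2}
    # One mqtt replica died with its host, and a leftover job is still around.
    running = {"mqtt": 2, "token-service": 2, "old-job": 1}
    print(reconcile(desired, running))
    # [('start', 'mqtt', 1), ('stop', 'old-job', 1)]
```

An orchestrator runs a check like this continuously, which is why losing one of the physical hosts in a restaurant does not have to mean losing the application.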
Why Kubernetes?
At first, we planned to use Docker Swarm for container orchestration, but made the move to Kubernetes in early 2018 because we believe its core capabilities and surrounding ecosystem are (by far) the strongest. We are excited about taking advantage of some of the developments around Kubernetes like Istio and OpenFaaS.
Most importantly, we firmly believe…
Ideas, in and of themselves, have no value.
Code, in and of itself, has no value.
The only thing that has value is code that is running in production. Only then have you created value for users and had the opportunity to impact your business.
Therefore, at Chick-fil-A, we want to optimize for taking great ideas, turning them into code, and getting that code running in production as fast as possible.
Research says that using the latest and greatest technology (like Kubernetes) has no correlation to a team or organization’s success. None. The ability to turn ideas into code and get code into production rapidly does.
If you can accomplish this goal and meet your requirements with VMWare, a single machine, some other clustering tool or orchestration layer, or using just the cloud, we would not try and convince you to switch to Kubernetes and follow our lead. The simplest possible solution is usually the best solution.
At Chick-fil-A, we believe that the best way for us to achieve these goals is for us to embrace containerization, and to do so with Kubernetes. At the end of the day, it’s all there to help you “Eat More Chicken”.
Next
What are our next steps?
In the next 18-24 months, we expect to increase the number of smart/connected devices in our restaurants to > 100,000 and complete our chain-wide rollout of Kubernetes to the Edge. The number of use cases for “brain applications” that control the restaurant from the edge continues to grow, and we look forward to serving them and sharing more with the community about what we learn.
We will also take a close look at the Google GKE On-Prem service that was just announced since it might save us from building and managing clusters in the future. What we have done to-date is a sunk cost, and if we can re-focus resources on differentiating work rather than infrastructure, you can be certain we will.
If we failed to answer questions you have about what we’re doing, don’t hesitate to reach out on LinkedIn. If we can help you with any challenges with Kubernetes at the Edge, please let us know. At Chick-fil-A, it is always our pleasure to share, and (kudos to Caleb for this) “our pleasure to server you”.
Edge Computing at Chick-fil-A
I had no idea this was even a thing, and I can’t help smiling at the conversation that must have gone down at USAF. “Sir, I believe we’ve found a superior implementation of Kubernetes...it’s.. it’s unexpected.” USAF will no longer be conducting ops on Sundays.
And they actually posted a follow-up a few weeks ago:
Quote:Enterprise Restaurant Compute
by the CFA Enterprise Restaurant Compute Team
The last time we talked publicly about our Edge Kubernetes deployment was summer of 2018.
Since then, we have completed a chain-wide deployment and run it in production for almost 4 years. Every Chick-fil-A restaurant has an Edge Compute cluster running Kubernetes. We also run a large-scale cloud-deployed infrastructure to support our restaurant footprint.
We have integrated with several of our restaurant systems to assist with Kitchen Production processes or onboarding mobile payment terminals used in our Drive Thru. In total, there are tens-of-thousands of devices deployed across our restaurants that are actively providing telemetry data from a wide variety of smart equipment devices (fryers, grills, etc).
Our purpose today is to catch readers up to our current state and share what has changed over the past 4 years. There are still many exciting opportunities for the platform on the horizon, but we’ll leave that for another day…
[Much more at the headline link]
And Waffle House is a branch of FEMA.
Long live the House of Waffle.
Quote:How the U.S. Air Force Deployed Kubernetes and Istio on an F-16 in 45 days
As hybrid cloud strategies go, the U.S. military certainly is taking a unique approach.
Just like almost everything else, military organizations increasingly depend on software, and they are turning to an array of open source cloud tools like Kubernetes and Istio to get the job done, according to a presentation delivered by Nicolas Chaillan, chief software officer for the U.S. Air Force, at KubeCon 2019 in San Diego. Those tools have to be deployed in some very interesting places, from weapons systems to even fighter planes. Yes, F-16s are running Kubernetes on the legacy hardware built into those jets.
“One point for the team was to demonstrate that it could be done,” Chaillan said. He challenged the Air Force and its partners to get Kubernetes up and running on a jet in 45 days, and while that was as difficult as it sounds, the team met the goal and F-16s are now running three concurrent Kubernetes clusters, he said.
Esplainer::
Kubernetes
"It is hard to imagine a more stupid or more dangerous way of making decisions than by putting those decisions in the hands of people who pay no price for being wrong." – Thomas Sowell