Prefab Global Delivery
At Prefab, we care about speed and reliability. We’re asking our customers to trust us in their critical code paths, so we need to be near bulletproof.
Part of obsessing about reliability is looking at the system as a whole and asking, “What happens if this part fails? How do we make sure our customers and their users are unaffected?”
And, sure, there's a bit of self-interest here: I don't want to be woken up at 3 AM because something is down. We don't want our customers to be woken up at 3 AM, either.
We took a look at our architecture, asked a lot of questions, and realized we could do better.
The Existing Architecture
The previous architecture for serving Frontend clients looked like this:
This works well. The CDN (Fastly) can cache contexts that it has seen before and serve them speedily around the globe. Unseen contexts (or requests after flag rules have changed) hit the API (which calculates the results in a few ms) and are newly cached in the CDN.
Fastly is reliable, but if it isn't responding or is slow, the library falls back to the API directly.
Points of failure
Looking at this diagram, it isn't terrible. The failover to hit the API directly gives us some redundancy.
But there are two points of failure:
- Google Spanner: This is highly available, but it could go down. If it does, the API can't serve requests because it can't get the latest rulesets.
- The API server
If either of these isn't working, the CDN can serve stale content until things are working again. If there is no cached content for the user, the client-side library will return false for flags and undefined for configs. This is behavior the developer can code against, but it's not ideal.
Redundancy and Reliability
Looking back on human history (or just watching an old episode of America’s Funniest Home Videos), it seems few things are funnier than someone else’s pants falling down (sorry, I don't make the rules). On the flip side, few things are more embarrassing than one’s own pants falling down. To that end, humanity created the belt and suspenders combo. If the belt malfunctions, no worries; the suspenders will hold your pants high.
Looking at this diagram, it was clear that we needed a belt and suspenders. We’ll arbitrarily call the old approach (CDN + API + Spanner) the suspenders. How do we build out the belt?
The parts of the belt
First, what does the API do, exactly? It is a simple HTTP server that receives requests bearing a Context (about the current user) and an API key. It validates the API key, reads the rulesets from Spanner, evaluates the rulesets against the context, and returns the results.
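To make that flow concrete, here's a minimal sketch of such a handler in Go. Every name in it (Context, Ruleset, evaluate, the route) is a hypothetical stand-in for illustration, not Prefab's actual code:

```go
// Hypothetical sketch of the API's request flow, not Prefab's real code.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Context carries attributes about the current user, sent by the client.
type Context map[string]any

// Ruleset stands in for the flag/config rules stored in Spanner.
type Ruleset struct{ Key string }

// evaluationHandler validates the API key, decodes the context, loads the
// latest rulesets, evaluates them, and returns the results.
func evaluationHandler(latestRulesets func() ([]Ruleset, error), validKey func(string) bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// 1. Validate the API key.
		if !validKey(r.Header.Get("Authorization")) {
			http.Error(w, "invalid API key", http.StatusUnauthorized)
			return
		}
		// 2. Decode the user context from the request body.
		var ctx Context
		if err := json.NewDecoder(r.Body).Decode(&ctx); err != nil {
			http.Error(w, "bad context", http.StatusBadRequest)
			return
		}
		// 3. Read the latest rulesets (backed by Spanner in the original design).
		rulesets, err := latestRulesets()
		if err != nil {
			http.Error(w, "rulesets unavailable", http.StatusServiceUnavailable)
			return
		}
		// 4. Evaluate the rulesets against the context and return the results.
		_ = json.NewEncoder(w).Encode(evaluate(rulesets, ctx))
	}
}

// evaluate is a placeholder for the real rule-matching logic.
func evaluate(rulesets []Ruleset, ctx Context) map[string]any {
	out := make(map[string]any, len(rulesets))
	for _, rs := range rulesets {
		out[rs.Key] = false // a real evaluator matches rs's rules against ctx
	}
	return out
}

func main() {
	latest := func() ([]Ruleset, error) { return []Ruleset{{Key: "example.flag"}}, nil }
	http.Handle("/evaluate", evaluationHandler(latest, func(key string) bool { return key != "" }))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```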
To insulate us from Spanner being down, we'd need a copy of the rulesets somewhere else. We use protobufs to serialize the rulesets internally, so the easiest way to get a copy of them is to write them to a file in cloud storage. We can write to as many cloud storage hosts as we want (for redundancy), and they can be read from anywhere in the world. James built out a data pipeline to write any changes to the rulesets to cloud storage.
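The publishing side can be as simple as fanning the serialized bytes out to more than one bucket. Here's a rough sketch assuming Google Cloud Storage, with made-up bucket names and object key; the real pipeline's destinations and format are Prefab's own:

```go
// Hypothetical sketch of the ruleset-publishing pipeline step.
package main

import (
	"context"
	"log"

	"cloud.google.com/go/storage"
)

// publishRulesets writes the serialized ruleset protobuf to every configured
// bucket so readers always have more than one copy to fall back on.
func publishRulesets(ctx context.Context, client *storage.Client, serialized []byte, buckets []string) error {
	for _, bucket := range buckets {
		w := client.Bucket(bucket).Object("rulesets/latest.pb").NewWriter(ctx)
		w.ContentType = "application/octet-stream"
		if _, err := w.Write(serialized); err != nil {
			return err
		}
		if err := w.Close(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// In the real pipeline this would be proto.Marshal of the changed rulesets.
	serialized := []byte("example-serialized-rulesets")
	buckets := []string{"prefab-rulesets-us", "prefab-rulesets-eu"} // hypothetical
	if err := publishRulesets(ctx, client, serialized, buckets); err != nil {
		log.Fatal(err)
	}
}
```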
To protect against the API server being down, we needed to build out a Global Delivery Network service to read those files and evaluate them against the context. We knew that we wanted to run this service as close to the user as possible to keep latency minimal. Being globally distributed also gives us some redundancy if any one region is having issues.
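In this model an edge node never needs Spanner at all; it only needs to find one readable copy of the ruleset file and keep it in memory. Here's a rough sketch of that loop (bucket names, object key, and refresh interval are all hypothetical):

```go
// Hypothetical sketch of how an edge node keeps rulesets warm without Spanner.
package main

import (
	"context"
	"io"
	"log"
	"sync"
	"time"

	"cloud.google.com/go/storage"
)

var buckets = []string{"prefab-rulesets-us", "prefab-rulesets-eu"} // hypothetical replicas

var (
	mu      sync.RWMutex
	current []byte // the real service would hold parsed protobuf rulesets
)

// refresh tries each replica in order and swaps in the first copy it can read.
func refresh(ctx context.Context, client *storage.Client) error {
	var lastErr error
	for _, bucket := range buckets {
		r, err := client.Bucket(bucket).Object("rulesets/latest.pb").NewReader(ctx)
		if err != nil {
			lastErr = err
			continue
		}
		data, err := io.ReadAll(r)
		r.Close()
		if err != nil {
			lastErr = err
			continue
		}
		mu.Lock()
		current = data
		mu.Unlock()
		return nil
	}
	return lastErr
}

// currentRulesets is what a request handler would read before evaluating.
func currentRulesets() []byte {
	mu.RLock()
	defer mu.RUnlock()
	return current
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Load once at startup, then poll so new flag rules reach the edge.
	for ; ; time.Sleep(30 * time.Second) {
		if err := refresh(ctx, client); err != nil {
			log.Println("ruleset refresh failed:", err)
		}
	}
}
```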
The tech stack
I started work on this when we were wrapping up our Go SDK. We found Go fast, developer-friendly, and fun to work with. It made sense to use Go for the service.
I'd been following Fly.io for a while, used them in a hobby project, and was impressed with their region support. They seemed like a natural fit.
We threw a CDN in front of the service to cache responses. CDN + Global Delivery Service = Belt.
Finally, we had to update the front-end clients to try the belt before suspenders. If the CDN has fresh content, we serve it from a globally nearby location. If the CDN misses, we hit an edge server running on Fly.io's globally available hosts and cache the response for next time.
Our reliability story is now greatly improved 🎉 If we can't get a good response from the belt, we fail over to suspenders. If we can't get a good result from suspenders, we hit the Global Delivery Network (servers on Fly) directly.
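From the client's point of view, the failover boils down to an ordered list of places to ask. Here's a minimal sketch in Go with hypothetical endpoint URLs and timeout; the real SDKs also handle auth, retries, and response parsing:

```go
// Hypothetical sketch of the belt-then-suspenders failover order in a client.
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// The URLs are made up; the real clients know their actual belt/suspenders hosts.
var endpoints = []string{
	"https://belt.example.com/configs/eval",       // belt: CDN in front of the edge servers
	"https://suspenders.example.com/configs/eval", // suspenders: CDN in front of the API
	"https://direct.example.com/configs/eval",     // Global Delivery Network, hit directly
}

// fetchConfigs tries each layer in order and returns the first good response.
func fetchConfigs(client *http.Client) ([]byte, error) {
	var lastErr error
	for _, url := range endpoints {
		resp, err := client.Get(url)
		if err != nil {
			lastErr = err
			continue
		}
		body, readErr := io.ReadAll(resp.Body)
		resp.Body.Close()
		if readErr != nil || resp.StatusCode != http.StatusOK {
			lastErr = fmt.Errorf("fetch %s: status %d, read error: %v", url, resp.StatusCode, readErr)
			continue
		}
		return body, nil
	}
	return nil, lastErr
}

func main() {
	// A short per-request timeout keeps a slow layer from blocking the fallback.
	client := &http.Client{Timeout: 2 * time.Second}
	if body, err := fetchConfigs(client); err == nil {
		fmt.Printf("got %d bytes of evaluated config\n", len(body))
	} else {
		fmt.Println("all layers failed:", err)
	}
}
```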
How does this work in the real world?
- Belt Cached
- Belt Miss
- Belt Down & Suspenders Cached
- Belt Down & Suspenders Miss
- Belt CDN & Suspenders Down
Here's the best case scenario: Your context has been cached in the belt CDN and is globally distributed. The CDN serves the content from a nearby location, and the user gets a fast response.
If the content isn't cached in the belt CDN, we hit the Global Delivery Network. The Global Delivery Network is a set of edge servers running on Fly.io. The belt CDN will cache the response for next time.
After trying the belt and finding the CDN or Global Delivery Network aren't immediately responsive, we fall back to the suspenders. The suspenders CDN serves the cached content from a location geographically close to the user, so the response is still fast.
After trying the belt and finding the CDN or Global Delivery Network aren't immediately responsive, we fall back to the suspenders. If the content isn't cached in the suspenders CDN, we hit the API. The suspenders CDN will cache the response for next time.
After trying the belt and finding its CDN isn't immediately responsive, we fall back to the suspenders. After trying the suspenders and finding the CDN, API, or Spanner is down, we fall back to hitting the Global Delivery Network server directly. Since the edge server is geographically close to the user, the response will be faster than hitting the API directly.
Belt and suspenders. If the belt fails, the suspenders will serve. If the belt and suspenders fail, the Global Delivery Network will serve.
If all of that is down, the internet itself is probably largely broken. Of course, you always have the option to run your own server in your infrastructure as well.
We're using this belt+suspenders approach for our server-side SDKs as well, so they also benefit from geographically proximate servers and redundancy.
Let's compare the architecture from before and after:
- Before
- After
Speed Improvements (more than just a nice side effect)
The API server in the original diagram was plenty fast; it took a few milliseconds to serve a request. But the API server is not globally distributed, and at some point latency becomes more about where you are (geographically) in relation to the server than about the speed of the server itself. I'm 13ms round-trip from us-east-1-atl-1a but 39ms from us-east-1-bos-1a. In the front-end, every millisecond counts, and the further you get from a server, the worse the impact.
Let's look at a particularly bad example of this: talking to our classic endpoint from Sydney is slow. Here's a WebPageTest (Native Connection, no traffic shaping) comparison of the classic endpoint and the new endpoint, accessed from Sydney (with a local Sydney Fly.io server):
There's probably room to get that blue line down a little further, but we haven't gone down the path of squeezing more performance out of the belt yet (e.g., we're currently running on shared-cpu-1x:1024MB machines). Even so, deploying at the edge already shaves ~200ms off the old endpoint's worst-case performance. We'll be watching real-world numbers to get a better feel for how we should scale the machines and shape our regional strategy.
Update to the latest clients for speed improvements, and stay tuned for better performance ahead.