Part 4 is an ~80-minute read
In this section, we employ a depth-first approach to teach you how to design systems. The systems described in Part 4 are as simple as possible in order to maximize the practical knowledge you can take with you. The inspiration for this section is the famous quote, “If you can do Hash maps and the Monte Carlo Method, you can solve any problem in computer science.”
🤥 OK, fine, you caught us—Einstein didn’t actually say this. However, this quote does come from a competitive programmer who made it to top 24 in the world, so that’s got to count for something!
This section takes the idea behind that quote even further. If you understand the power of immutability, deduplication, enumeration, and how to get yourself unstuck, you can understand how to design any system. Part 4 is a master class in which you’ll get to observe the decision-making process of senior engineers as they build systems. Along the way, we’ll teach you additional tips and tricks to help you excel during an interview. Once you’ve finished reading Part 4, you’ll possess the knowledge you need to truly understand system design interviews. And you’ll see there was never any reason to be scared of designing systems in the first place.
Let’s get started. The best way to learn is to build, so we’ll begin by designing something that is:
To keep things simple, imagine we don’t need to edit Pastes; we just need to remove them sometimes.
It helps to align on requirements before designing the system. A conversation with the interviewer flows better if functional and non-functional requirements are introduced.
Functional requirements (FRs) tend to map 1:1 to the functionality the service can expose to the outer world.
Similar to DS&A questions, it’s easy to make assumptions about the problem. Think of the requirements stage as a fluid conversation between yourself and the interviewer. While you will drive the conversation, you should always check in with them to see if you’ve missed anything. Many people forget that they can literally ask the interviewer if they’ve missed any important requirements.
Not sure if you got all the requirements? Think you’re missing something important, but you don’t know what? Turn your requirements gathering into a conversation and get the interviewer involved. Ask your interviewer: “Are there any important requirements you have in mind that I’ve overlooked?” This is totally allowed!
While not strictly necessary, it’s worth being able to answer questions about the scale of a service in under a minute. This is known in the industry as “back of the envelope” load estimation.
Both in an interview and in real-life design, it’s useful to be able to estimate orders of magnitude, especially when it comes to users, data volumes, or costs.
An important note: There’s no “right” way to do these estimates, but there are “wrong” ways. So, in practice, this is one of the aspects of the conversation where it’s more important to focus on being “not wrong” rather than on being “right.”
Estimates often can (and should!) be validated against real-life numbers, which are frequently public. Test your estimation skills here:
Some interviewers hate this step and really don’t want to watch you stumble through 5th-grade math for 15 minutes. Similar to step 2, ask your interviewer if they’d like to see some calculations before jumping in and starting them—you might be able to skip them entirely if the interviewer doesn’t care! In quite a few system design interviews, you’ll be fine as long as you mention that the system you plan to present will be durable, resilient, and scalable.
Sometimes it’s difficult to know what to calculate during this part of the interview. If you’ve already confirmed that the interviewer wants to see calculations as mentioned in Tip #3, then follow these rough guides to get the basic estimates for any system.
Storage Estimation:
Storage = daily data used by 1 user * DAU count * length of time to store data
Bandwidth Estimation:
Bandwidth per second = (daily data used by 1 user * DAU count ) / total seconds in a day
Also, there are roughly 100K seconds in a day, which is five orders of magnitude. If your API gateway expects to see a billion requests on a busy day, that’s approximately 10K requests per second, as 9 zeroes minus 5 zeroes is 4 zeroes. The true figure is ~15% larger, as 100K / (60 * 60 * 24) is around 1.15.
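To make the two formulas concrete, here is a quick back-of-the-envelope script for a hypothetical Pastebin-like service. Every input (1M DAU, ~10 KB written per user per day, 5-year retention) is a made-up assumption, purely for illustration:

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86,400 — roughly 100K

# All inputs below are assumptions, chosen only to exercise the formulas:
dau = 1_000_000            # daily active users
daily_bytes_per_user = 10 * 1024   # ~10 KB written per user per day
retention_days = 5 * 365   # keep pastes for ~5 years

# Storage = daily data used by 1 user * DAU count * length of time to store data
storage_bytes = daily_bytes_per_user * dau * retention_days
print(f"storage: ~{storage_bytes / 1e12:.0f} TB")             # ~19 TB

# Bandwidth per second = (daily data used by 1 user * DAU count) / seconds in a day
write_bandwidth = daily_bytes_per_user * dau / SECONDS_PER_DAY
print(f"write bandwidth: ~{write_bandwidth / 1e6:.2f} MB/s")  # ~0.12 MB/s

# The "100K seconds in a day" shortcut: a billion requests a day is ~10K rps
print(f"1B req/day = {10**9 / SECONDS_PER_DAY:,.0f} rps")
```

Note that the decimal-vs-binary unit conversions here are deliberately sloppy; for load estimates, only the order of magnitude matters.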
Intuitive things first:
The shared folders analogy, from (1), immediately takes us about halfway to designing a document store: a non-relational object persistence layer. It’s broadly acceptable to refer to it these days as a NoSQL database, even though along many axes it is nearly the exact opposite. It would not support transactions or joins, for example. On the other hand, for the purposes of designing Pastebin, neither joins nor transactions are required. Plus, we want the benefits of pagination. This allows us to leverage the strengths of document stores / NoSQL databases while avoiding their downsides. Those strengths include features such as replication, automatic failover, and out-of-the-box horizontal scaling.
The takeaway message here is that document stores truly are not very different from reliable shared folders, optimized to be used by software through API calls (2), not by humans via a command line or via drag and drop. They also abstract away the underlying OS details (as well as cloud / containerization / etc.), allowing their users to focus on the business logic of their applications instead.
The unique names generation task, (3), is a well-known problem by itself. It is sometimes referred to as KGS, the Key Generation Service. We will tackle this problem later in Part 4; for now it will suffice to say that as long as the keys can be long enough, it is straightforward to reduce the risks of collision to zero, or to an infinitesimally small number.
The previous paragraphs outlined the key concepts that will help you reason about the problem. Depending on the interviewer, different parts of the problem might become more or less important with respect to getting an excellent score.
In this section we will cover some of those parts, from first principles, and loosely in the order of likelihood that you’ll be asked about them.
Time to look into mutability.
As we talked about in Chapter One, most system design problems are trivial when reduced to an immutable case.
A good way to dive into the depths of system design is to attack a problem that is all about mutability.
In other words, we need to look into a problem where the objects can be changed. The perfect problem to illustrate this point would focus on mutability and nothing else. In search of such a problem, let’s cover two warm-up ones first.
The problem statement for deduplication is deceptively simple:
Without loss of generality (WLOG), let’s assume the X-es are strings of some reasonable length, say, under two hundred bytes. (If you want to learn more about the math-y term “WLOG,” you can do that here.)
In computer science terms (or, rather, in “algorithms and data structures” terms), the problem of deduplication is effectively about building a largely distributed append-only hash set.
While system design interviews are not the same as algorithms and data structures ones, never forget that, ultimately, it’s all about writing code that works! The difference is this: doing vs. directing. In a coding interview, you’re literally writing the code. In a system design interview, your goal is to assist those writing the code by providing architecture / direction / guidance.
Keep in mind, we work on requirements not because the script of the interview contains such a section, but because they help us to better reason about our solution.
What the above effectively tells us is:
Memory is fast, disk is slow. In this problem, since every record is a small number of bytes, we may be able to fit everything in memory.
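A quick sanity check of that claim, using made-up but plausible numbers (200-byte records from the problem statement, a single 64 GB server) and ignoring hash-set overhead:

```python
record_size = 200        # bytes per X — the upper bound from the problem statement
ram_bytes = 64 * 2**30   # one commodity 64 GB server (assumption)

# Ignoring data-structure overhead, a single machine's RAM holds hundreds
# of millions of records, so "everything in memory" is entirely realistic.
capacity = ram_bytes // record_size
print(f"~{capacity / 1e6:.0f} million records fit in RAM")
```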
This is a very simple trick, and while it is not often used directly in production systems, it’s incredibly useful to know when learning about system design.
The idea is this:
Keep in mind, we are still looking at a semi-immutable problem of deduplication: we can only “get to see” every possible input X; we can not “unsee” an X.
For a positive example, consider N = 5, W = 3, R = 3. What this means for our deduplication solution is:
Take a moment to internalize why the above would work when R + W is greater than N, and why it may fail if R + W is equal to or less than N.
The redundancy factor is (N−W) for writes and (N−R) for reads: as long as there are enough machines up to pick W of them to write to, we can write, and as long as there are enough to pick R of them to read from, we can read. The important part is that the set of write machines and the set of read machines overlap in at least one machine, so W + R must be strictly greater than N. If W + R is equal to or less than N, there’s no guarantee that a set of W randomly picked machines and a set of R randomly picked machines will intersect. And if they don’t intersect, the read will miss the data that was just written!
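A toy simulation makes the overlap argument tangible. The random choice of machines below stands in for “whichever W (or R) nodes happen to respond first”:

```python
import random

N, W, R = 5, 3, 3  # R + W > N, so any write set and any read set overlap

machines = [set() for _ in range(N)]  # each machine keeps seen X-es in RAM

def write(x):
    # Store X on W machines picked at random.
    for i in random.sample(range(N), W):
        machines[i].add(x)

def seen(x):
    # Ask R machines picked at random; X counts as seen if any one has it.
    return any(x in machines[i] for i in random.sample(range(N), R))

write("hello")
assert seen("hello")        # guaranteed: any 3 writers and any 3 readers overlap
assert not seen("goodbye")  # never written, so no machine has it
```

With W = R = 3 out of N = 5, a read is guaranteed to find a prior write; drop W or R to 2 and the first assertion starts failing intermittently.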
Always pay attention to solutions that keep working “just fine” if some nodes are down. Since things are going to go wrong, it’s best to have systems that can deal with that without issue, so you don’t have to! 😎
Last but not least: The above solution would use each machine’s RAM or disk capacity at its (W / N) fraction. In other words:
Obviously, if you assign parts of the key space (i.e., different values of X) to N machines intelligently, not randomly, you can achieve better RAM and disk utilization. But the example is still quite illustrative: we now know that without any custom knowledge on how to design a system, we can build a resilient, highly available, multi-node deduplication engine, from first principles!
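For contrast, here is a minimal sketch of “intelligent” assignment: hash each X to exactly one of the N machines. Every key is stored once, so capacity scales with N, at the cost of losing the redundancy of the quorum scheme:

```python
import hashlib

N = 5
machines = [set() for _ in range(N)]

def shard(x: str) -> int:
    # Deterministically map each X to one machine by hashing the key.
    return int(hashlib.sha256(x.encode()).hexdigest(), 16) % N

def write(x):
    machines[shard(x)].add(x)

def seen(x):
    return x in machines[shard(x)]

write("hello")
assert seen("hello") and not seen("goodbye")
# Each key lives on exactly one machine: full N-fold capacity, zero redundancy.
```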
Before we declare ourselves done with the warm-up problem, let’s consider its extension.
This problem statement is very similar to the above:
At first glance, enumeration is identical to deduplication, but instead of a hash set, we need a hash map.
But this first glance is deceptive. This enumeration problem is substantially harder than the deduplication one.
The challenge is in assigning IDs in a fixed order (as in: 1, 2, 3, and so on). In order to stamp the new, unique index, all other calls should wait, as this very next unique index “distributed variable” is a shared resource.
In the case of enumeration, the very same (R + W) > N trick would not work, at least not directly.
The more massive the shared state has to be, the more difficult it is to solve the problem in a generic fashion. And the enumeration problem is a perfect illustration of why exactly it is so difficult.
When attacking the Enumeration problem, the concept of a read-write ratio comes into play.
If there are only 1,000,000 distinct X-es in the world, eventually—and rather quickly—all of them will become seen. Since we don’t need to invalidate indexes, each particular node of our service could “memorize by heart” the indexes for these 1,000,000 X-es, after which the problem can be treated as an immutable, read-only one. And, as you know by now, immutable problems are easy to solve.
Exhausting the set of unique inputs is quite a rare case. But it does happen often that the vast majority of inputs (say, 99.9%) would be the ones that have been already seen. Since inquiring about the already seen X is an immutable operation, this would result in a 999:1 read-to-write ratio (which is safe to round to 1000:1, for all intents and purposes).
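A single-node sketch of enumeration shows why this read-to-write ratio matters: the already-seen case is a plain dictionary lookup, and only the rare first-time X pays for synchronizing on the shared “next index”:

```python
import itertools
import threading

class Enumerator:
    """Assign each distinct X a sequential index: 1, 2, 3, ..."""

    def __init__(self):
        self._ids = {}
        self._next = itertools.count(1)
        self._lock = threading.Lock()  # the "next index" is shared state

    def index_of(self, x: str) -> int:
        # Fast path: an already-seen X (the 999-in-1000 case) is a pure read.
        idx = self._ids.get(x)
        if idx is not None:
            return idx
        # Slow path: only first-time X-es contend for the lock.
        with self._lock:
            if x not in self._ids:
                self._ids[x] = next(self._next)
            return self._ids[x]

e = Enumerator()
assert e.index_of("a") == 1
assert e.index_of("b") == 2
assert e.index_of("a") == 1  # stable: once assigned, an index never changes
```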
The problem of generating unique IDs serves as an excellent real-life illustration for the ideas we’ve discussed so far. Unlike the two warm-ups above, Unique ID Generation is a real problem. And it is commonly used as an interview question, including this mock interview hosted by interviewing.io.
We will, of course, be solving the problem in a distributed setting. In other words, the solution should be highly available and consistent. This means that we will be using several nodes that communicate with each other. This way, if some nodes die and/or become unresponsive, temporarily or permanently, the system remains:
Pragmatically speaking, the problem is quite straightforward, as long as we answer three important clarification questions. All three fit solidly in the Functional Requirements (FRs) bucket.
During a system design interview, we focus on functional requirements not to please our interviewers, but to set the table for ourselves and to stack the deck of the upcoming story in our favor!
Let’s dive deeper into why these are the important questions.
Question #1, about whether the IDs are allowed to repeat at all, is quite binary. Either generating a non-unique ID is absolutely unacceptable, or it is allowed once in a blue moon.
To answer this from a practical standpoint, consider the cost of generating a non-unique ID for the business. If this cost is bounded by a reasonably small amount, then, in practice, it is not entirely unacceptable to have a non-unique ID from time to time.
Consider Amazon, Airbnb, or Expedia order ID generation. What can go wrong if an ID is not unique? A customer might have a ruined experience. What’s the cost of this for the business? Probably on the order of dozens or hundreds of dollars; it’s unlikely to exceed, say, $10,000. Thus, if the likelihood of generating a non-unique ID is such that a duplicate may emerge once in ~10 or ~50 years, the “daily” cost of this “imperfect” implementation is less than what the company pays for organic sweetener in the office. As a result, “fixing” this “problem” may not be worth it at all.
A strong user- and product-facing candidate should be able to articulate the above.
With this said, if the IDs are allowed to be not unique once in a while, the problem is sort of non-existent: generate a random 128-bit number, and you’re good for ages. In fact, for most practical intents and purposes, even a random 64-bit number would do.
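A sketch of that “good for ages” approach:

```python
import secrets

def new_id() -> int:
    # 128 random bits; by the birthday bound, collisions are negligible
    # for any realistic volume of IDs.
    return secrets.randbits(128)

ids = {new_id() for _ in range(10_000)}
assert len(ids) == 10_000  # no collisions among 10K draws
```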
With that deep dive out of the way—and you should somehow indicate to your interviewer that you understand it!—let’s approach the Unique ID generation problem under the assumption that the IDs must be guaranteed to be unique.
Question #2 effectively boils down to “64 bits vs. 128 bits.” 128 bits, as shown above, is a lot. For the record, a standard UUID is 128 bits, although a quick Google search shows it “only” contains 122 bits of entropy.
And question #3 is only important in conjunction with question #2, as what truly matters is the rate at which the key space is being exhausted. Simply put, the very problem is almost identical if we add 10 more bits to the IDs and generate the IDs 1,000 times faster.
Though it is rare, sometimes system design interviews are indeed about doing math with estimates. A good interviewer will test your intuition on the above, and human intuition is notoriously unreliable when it comes to very small and very large numbers in general and to probabilities in particular. So if you skipped the above math-y deep dive, at least read up on the Birthday Paradox. 😊
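The Birthday Paradox is worth internalizing numerically. The standard approximation for the probability of at least one collision among n random b-bit IDs is p ≈ 1 − e^(−n² / 2^(b+1)):

```python
import math

def collision_probability(n_ids: int, bits: int) -> float:
    # Birthday-paradox approximation: p ~= 1 - exp(-n^2 / 2^(bits + 1)).
    # expm1 preserves precision when the probability is astronomically small.
    return -math.expm1(-n_ids**2 / 2 ** (bits + 1))

# A billion random 64-bit IDs: a collision is already quite plausible.
print(collision_probability(10**9, 64))   # ~0.027
# The same billion IDs in a 128-bit space: effectively impossible.
print(collision_probability(10**9, 128))  # ~1.5e-21
```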
So, effectively, the true problem is:
Really, that’s it.
For the unique key generation problem, just do a bit of trivial (*) math.
(*) Professors tend to refer to anything that’s semi-straightforward to follow as trivial. Thankfully, our interviewers are not that picky. Although the above is relatively standard math to someone who is fluent in probability and statistics, and, chances are, your interviewer may well be—system design interviews are often conducted by more senior people, and more senior people work with data more often than average.
If we need to utilize the key space effectively, we’re fully in the realm of distributed consensus, etc. Hard problem.
If we can be loose, we just need to employ a few simple tricks to minimize our risks. Easy problem.
First of all, if you are looking at the easy problem, it should be noted during the interview that just generating the IDs randomly would solve it for all intents and purposes.
Go ahead and explain this to your interviewer. Explain that even if each collision costs the company $10,000, it is still not worth an extra minute of your time, as an engineer employed by this company, to be solving this problem.
Because that is The Correct Answer if this unique ID generation problem presents itself in a real-life setting.
(Not to mention that any large company that employs you expects to make considerable money from you; for top IT giants this figure is over one million dollars per year per engineer, although things have turned south since 2020.)
Of course, after you cover this, the interviewer will ask you to actually solve the problem, in a scientifically and engineering-ly precise way, in an “academic” setting. This is what the next section is about.
But before you get too fancy, it won’t hurt to suggest a few trivial improvements. In short, having a lot of key space allows for plenty of “cheap” tricks.
For example, each node of your now-distributed service can just assign the IDs in the 1, 2, 3, … fashion. You can then prepend each generated ID with something that contains:
If you have a large key space (say, 128 bits), you can easily allocate ~32 bits for the ID of the machine (an IPv4 address) and another ~32 bits for the Unix timestamp, in seconds, at which this particular machine started its “session.” This leaves you with plenty of indexes per node: 2^64, to be precise, as you’ve “used up” 64 of the 128 bits.
And if one of your machines suddenly runs out of whatever range you have allocated for it, you can “just” restart that machine. It will get itself a new prefix and continue afresh. Of course, the restart itself is not necessary, as the machine can just (automatically) change the value of that variable it uses internally.
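Here is a sketch of that layout, assuming the 128-bit split described above (32-bit machine ID, 32-bit session-start timestamp, 64-bit counter); all constants are illustrative:

```python
import itertools
import time

class PrefixedIdGenerator:
    """128-bit IDs: [32-bit machine ID][32-bit session start][64-bit counter]."""

    def __init__(self, machine_id: int):
        self._machine_id = machine_id & 0xFFFFFFFF
        # The "session" prefix: restarting (or just re-reading the clock)
        # gives a fresh prefix, so the counter can safely start over at zero.
        self._session = int(time.time()) & 0xFFFFFFFF
        self._counter = itertools.count()  # 2**64 values: plenty per session

    def next_id(self) -> int:
        return (self._machine_id << 96) | (self._session << 64) | next(self._counter)

gen = PrefixedIdGenerator(machine_id=0x0A000001)  # e.g., IPv4 address 10.0.0.1
a, b = gen.next_id(), gen.next_id()
assert b - a == 1             # sequential within a session...
assert a >> 96 == 0x0A000001  # ...and globally disambiguated by the prefix
```

One caveat: a node that re-prefixes within the same second would reuse a session value, so a real implementation needs a guard there.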
The above solution should be good enough to pass the interview about Unique ID generation. Before we get to solve the Hard version of it, let’s add a few more assumptions. They do, by the way, qualify for Non-Functional Requirements. Here they are:
For (a), a simple solution comes from number theory. Security specialists know it. Just perform some relatively trivial mathematical function before returning the ID to the user, while keeping this function reversible, so that when F(x) = y, some G(y) = x. A Modular Multiplicative Inverse might do the job just fine.
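A sketch of the modular-multiplicative-inverse trick. The modulus and multiplier below are arbitrary example constants (the only requirement is that they are coprime), and `pow(K, -1, M)` needs Python 3.8+:

```python
M = 2**61 - 1            # example modulus (a Mersenne prime)
K = 6364136223846793005  # example "secret" multiplier, coprime with M
K_INV = pow(K, -1, M)    # modular multiplicative inverse of K modulo M

def obfuscate(x: int) -> int:    # F(x) = y: what the user sees
    return (x * K) % M

def deobfuscate(y: int) -> int:  # G(y) = x: recover the internal sequential ID
    return (y * K_INV) % M

for x in (1, 2, 3, 10**9):
    assert deobfuscate(obfuscate(x)) == x  # fully reversible...
assert obfuscate(2) - obfuscate(1) != 1    # ...but not sequential-looking
```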
And (b) is exactly where the solution to the broader problem comes into play!
Imagine that you have designed a system where the “users” can “request” IDs in bulk. (We will explain how to do it in a minute.) The trick is that each node of your system is effectively performing the same task: it needs to request a large “bulk” of IDs from some “central authority,” and then distribute these large bulks in smaller quantities to its “retail customers.”
It’s as simple and as genius as that.
To arrive at a “proper” solution, the “large” bulks should not be too large. Because, if a node dies, it is impossible to tell how much of its latest bulk it has “distributed” so far. In other words, when a node dies, some part of a key space will be wasted.
Moreover, if the node does not die, but terminates gracefully, it may then “return” the “unused excess” of its bulk to the system, which sounds very fair. But it may present a challenge to the “bulk provider service” to deal with “partial” bulks. It may well be better to deliberately discard the “rest of the bulk,” even if the node that has previously requested this bulk is terminating gracefully.
Simply put, the bulk size can be calibrated such that each node requests a new “bulk” to “distribute” approximately every several seconds or maybe several dozen seconds. That’s the sweet spot. Make this number (this bulk size or this time window—they can now be used interchangeably) too tight, and the “central authority” would need to handle more load than needed. Make it too loose, and, especially if machines do die often (or need to be migrated often), a non-negligible portion of the key space would be wasted unnecessarily.
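Putting the pieces together, here is a minimal single-process sketch of the bulk scheme. The names (`BulkProvider`, `Node`) and the bulk size are invented for illustration; in a real system the provider’s one counter update would be a strongly consistent, consensus-backed write:

```python
class BulkProvider:
    """The "central authority": stores ONLY the next available bulk start."""

    def __init__(self, bulk_size: int):
        self.bulk_size = bulk_size
        self._next_start = 0  # the entire persistent state — not every issued ID

    def reserve_bulk(self) -> range:
        # In a real system this increment is an atomic, consensus-backed write.
        start = self._next_start
        self._next_start += self.bulk_size
        return range(start, start + self.bulk_size)

class Node:
    """A front-line node distributing IDs to its "retail customers"."""

    def __init__(self, provider: BulkProvider):
        self._provider = provider
        self._ids = iter(self._provider.reserve_bulk())

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:  # bulk exhausted: ask for a fresh one
            self._ids = iter(self._provider.reserve_bulk())
            return next(self._ids)

provider = BulkProvider(bulk_size=3)
a, b = Node(provider), Node(provider)
ids = [a.next_id(), b.next_id(), a.next_id(), b.next_id()]
assert len(set(ids)) == 4  # nodes never collide: their bulks are disjoint
```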
Now, to the gist of the problem. The very “central authority”:
So, in a true System Design fashion, use some standard distributed component there, FFS!
Even if each “call” to “reserve” another “bulk” takes a second or two to settle—because the request would have to travel across the globe, possibly more than once, to guarantee strong consistency—it’s perfectly fine with your design, as long as each node requests a new “bulk” those few seconds in advance, before it runs out of its “current” “bulk.”
That’s it for ID generation. Relatively simple.
Any design for Unique ID generation that attempts to store Every Single Issued ID is a bad idea.
More junior and/or less experienced engineers often make this mistake. As they say: “Forewarned is forearmed.” At the very least, store “bulks” only, not each issued ID. Better yet: only store the “latest issued bulk ID” or the “next available bulk ID.” That’s it—relatively simple still.
Want to know exactly what a FAANG System Design interviewer looks for? Get detailed feedback on your system design skills from our professional interviewers.
All right, we now know enough about things like mutability, deduplication, and basic requirement gathering skills to be dangerous. Let’s now tackle a larger problem—not too complicated, but it’ll stretch us a little bit.
This is almost certainly a system design question you haven’t encountered, and depending on when you were born, it might not even be an application you’ve seen before! 🤯 No worries, we can think of this as one of the simplest chat apps imaginable. Keeping it simple allows us to design any chat app from first principles and provides a solid structure for us to work with, regardless of what chat app we are asked to design.
For the younger readers who aren’t familiar with the AOL Instant Messenger (AIM) application, here are a few snapshots so you can get a sense of its basic functionality.
Yikes! Those font choices clearly weren’t vetted by a UX designer! 😂 OK, so clearly the app has a few specific functionalities. A requirements list may look something like this:
Wow, that’s still quite a large list for such a simple application. For now let’s ignore the authentication pieces—while they are important, they don’t really make up the core of this application. Let’s spend some time focusing on the actual messaging parts and assume we can log in already.
As we did in the previous chapter, we should discuss the scale of our app in terms of its requirements. Again, this isn’t just an interview formality; it’s also a useful tool to help frame what we actually care about in the design of the system.
Ranking functional and non-functional requirements by order of importance can feel silly, because any single unmet requirement leads to a bad system. Still, for any given design problem, there is usually at least one particularly relevant requirement: a linchpin that the rest of the system hangs on.
Generally, what do you think are the most critical parts to an app that allow you to talk to other people? Seriously, think about it. I’ll be here when you get back.
Did you think about it? I mean it! Think for a moment about this. What are things that are just “givens” about a chat app? What have you come to expect when it comes to these types of apps?
OK, so you might have phrased it differently, but if you thought something like, “It’s important to get messages in a timely manner,” or maybe “it’s important to always be able to send a message when you want to,” or possibly even “It’s important to be able to access my messages when I want to see them,” then fantastic—you’re absolutely right.
In plain English, we’ve just named our most important non-functional requirements: Consistency, Availability, and Durability.
This makes up the bulk of the most important requirements in our system!
Awesome, now that we’ve discussed the requirements, much of our work has already been done. Let’s talk about what the system actually looks like. 🧐
It’s common to try to detail every part of the system’s design like you see people do on YouTube. Realistically, these videos are scripted, and the drawings are fast-forwarded. In a real interview, you won’t have time to actually detail every part of the system, and that’s OK! It’s expected that you’ll abstract away pieces that aren’t particularly relevant. It’s a good practice to call out what you’re abstracting, but just focus on the general data flow of the system.
In our diagram we need to show how data is getting from one client to another client and what it does in the system. The diagram can start simple and then evolve if the interviewer wants you to “zoom in” and discuss specific parts. In our example though, the diagram could be as simple as this:
Now that we have an idea of how the system has data flowing through it, we might want to discuss one other critical piece. The sending of the data makes sense from the diagram alone, but how does that second user know it’s time to receive data? Are they checking the service every few seconds to see if something new is present for them? Is there some way we can alert the AOL user that they have a new message? The app is real-time, so how do we ensure that messages are delivered as promptly as possible? These questions are answered by knowing a bit about the key ways computers interact with one another. There are three common techniques for a client to receive updates from a server: Short Polling, Long Polling, and WebSockets.
These can be best explained through an analogy.
Short Polling
Remember when you were younger and you’d always ask the question, “Are we there yet?” Repeatedly asking this same question every few minutes is a good example of short polling. Over a short period of time, we are constantly asking, “Are we there yet? Are we there yet? Are we there yet?”
This is short polling in a nutshell. We repeatedly bombard the servers with the question, “Are there new messages for me?” Just as this was a bad idea in your parents’ car when you were a child, it’s a bad strategy in basically every system design context. It’s hard on the servers (it wastes processing power) and it doesn’t scale (can you imagine the resources wasted by 1 million users all asking our system this every few milliseconds?).
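The scaling complaint is easy to quantify with made-up numbers:

```python
# Back-of-envelope: mostly-idle users short-polling the chat server.
# Both numbers are assumptions, purely for illustration.
users = 1_000_000
poll_interval_s = 2

requests_per_second = users / poll_interval_s
print(f"{requests_per_second:,.0f} rps just to hear 'nothing new yet'")
```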
Long Polling
Did you ever have a forgetful aunt or uncle? Imagine that you’re a little older now and driving on a roadtrip with that person. When you both get in the car, you ask them to tell you when you all reach a particular spot on the trip. This could be fine for a shorter trip because they’d probably remember to tell you. But on a really long trip, they might forget to tell you when you reached that particular spot, perhaps because the car ride took too long and they’d simply forgotten you ever even asked them to tell you.
This, again, is an analogy for long polling. Our client reaches out and asks the server to tell us when something new has updated for us, which helps us successfully avoid the waste of resources. But this fails when the time between responses can be particularly long (think more than a few minutes). So long polling can be good when we are expecting data repeatedly, but it’s not great when it could be a while before we get the data.
WebSockets
Finally, let’s imagine a car ride with our best friend. We ask them to tell us once we’ve come to a particular spot on our journey. They say, “Sure, I’ll tell you when we get there,” and then we patiently wait for them to tell us. They aren’t forgetful, so we trust them to let us know no matter the length of the car trip. This, in essence, is WebSockets.
A key part that distinguishes WebSockets from long polling is that with WebSockets we can have arbitrary lengths of time pass without needing to worry about the connection timing out and the server “forgetting” us. We also have two-way communication, so the client can talk to the server, but the server can also directly talk to the client (whereas in long polling the server can “forget” how to talk to them).
For a chat app, out of these three choices, a case could be made for either long polling or WebSockets, but WebSockets is clearly a better design choice since we don’t know how long it’ll be between messages being sent and arriving.
This is the core of every chat app, and hopefully it provides you with a good description of how we can model future problems. Start with requirements, decide what matters most for the application with non-functional requirements, show the flow of data in your app with a diagram (maybe a couple!), and then iron out the details.
Though we certainly could go deeper into any part of the design and unpack things further, this represents a good overview of how the first major chat apps were designed and shows the basic model that all chat apps since that time have been built on.
There are two major parts to this problem when it is used as an interview question:
Let’s focus on the first sub-problem. The real-time part is a special case that deserves dedicated attention.
As usual, we begin with requirements. In fact, it’s best to postulate the problem right in the form of requirements! This way we also develop the habit of approaching system design problems from the standpoint of how to solve them, since asking the right questions is at least half of solving them.
Don’t worry if you don’t know in detail what the Ticketmaster problem is about. In fact, for any problem, if you don’t fully understand its statement, jump straight to functional requirements, and clarify them—with your interviewer or with your peers—until they are crystal clear!