A Distributed Systems Primer
For People That Don't Have the Time.

by Kevin Sekniqi.

Oftentimes it can be extremely confusing for beginners to get started with blockchain tech. Understanding the technology is actually fairly straightforward, but it requires some proper context. Furthermore, more likely than not, most people don't really need to understand the inner workings of blockchains to get started. What is really needed is a context and comparison guide. In this blog post I will give a brief primer on distributed systems, discussing various families of consensus protocols, the types of network models and guarantees, as well as various other pertinent ideas. This is meant as a guide for people who want to understand what the heck some of the jargon in the field means, but who don't really want to jump into the details.

Rolling Back the Clock

Distributed systems is a very broad area, but for the sake of relevance and simplicity, we will focus on the sub-area of consensus. Let's start from the very beginning, moving the clock all the way back to the 80s. A bunch of really smart computer scientists were trying to answer the question: how can a set of machines agree on a message? Let's unravel that question a bit more, because, to be fair, it seems like a really strange question to ask.
    Suppose you have a database that will be storing mission-critical data, such as financial information (say, for a bank). You cannot afford to lose any piece of it, so you will immediately try to replicate the data across many machines, labeled $m_0, m_1, \dots, m_n$, such that if any machine goes down ... well ... no harm done.
    To build this system we can employ the most intuitive approach, namely: assign one of the machines as the "leader" (say $m_0$) and the rest as followers, such that whenever a new transaction comes in from clients (e.g., "Subtract $\$$10 from Bob's account"), the leader orders all followers to perform that transaction. This sounds great, except that we have a few problems. First, what happens if $m_0$ orders $m_3$ to subtract $\$$10 from Bob's account, but the message is dropped and never reaches $m_3$? This is really bad. We can no longer rely on $m_3$ to restore data, since it now contains inconsistent data. Jot that down: inconsistent data means that some of your machines are now storing different data from other machines, when they aren't supposed to.
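
To make the failure concrete, here's a toy Python sketch of this naive leader-follower scheme (the machine names, balances, and API are all made up for illustration), showing how a single dropped message silently leaves a replica inconsistent:

```python
# Naive leader/follower replication with no acknowledgements.
# A single dropped message silently leaves one replica inconsistent.

class Follower:
    def __init__(self, name):
        self.name = name
        self.balances = {"bob": 30}

    def apply(self, account, delta):
        self.balances[account] += delta

followers = [Follower(f"m{i}") for i in range(1, 4)]

def leader_broadcast(account, delta, dropped=()):
    # The leader blindly forwards the transaction; it never checks
    # whether each follower actually received it.
    for f in followers:
        if f.name in dropped:
            continue  # message lost on the network
        f.apply(account, delta)

leader_broadcast("bob", -10, dropped={"m3"})
print([f.balances["bob"] for f in followers])  # [20, 20, 30]: m3 diverged
```

Nobody notices the divergence until someone reads from $m_3$, which is exactly the problem.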

Enter Acks

A basic solution to the problem is to ensure that $m_3$ has actually received the message. We have now reached our first atomic commit protocol, namely Two-Phase Commit (2PC). It's called atomic because the idea is that a transaction should either be committed everywhere or nowhere (... yes, I know, the name doesn't make a whole lot of sense). The augmentation is exactly what you would imagine: $m_0$ doesn't just blindly receive messages from clients and forward them to the other machines; it also ensures that the replica machines acknowledge the messages. Ok, fair enough, but what happens if something goes wrong with, say, $m_6$, even though other machines have already subtracted the $\$$10 from Bob's account? No worries, just rollback! Here's the flow: the leader first asks every replica to prepare the transaction, each replica acknowledges (or refuses), and only if every acknowledgment arrives does the leader order a commit; otherwise it orders a rollback and no machine applies the change.
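A toy Python sketch of the two-phase idea (hypothetical names, not a production protocol): stage the change on every replica first, and commit only if every replica acknowledges, otherwise roll back everywhere:

```python
# Toy two-phase commit: commit only if every replica acks the
# prepare phase; otherwise roll back everywhere.

class Replica:
    def __init__(self, name, healthy=True):
        self.name, self.healthy = name, healthy
        self.balance, self.staged = 30, None

    def prepare(self, delta):
        if not self.healthy:
            return False          # e.g. disk error: vote "no"
        self.staged = delta
        return True               # ack: "ready to commit"

    def commit(self):
        self.balance += self.staged
        self.staged = None

    def rollback(self):
        self.staged = None        # discard the staged change

def two_phase_commit(replicas, delta):
    # Phase 1: collect acks. Phase 2: commit everywhere or nowhere.
    if all(r.prepare(delta) for r in replicas):
        for r in replicas:
            r.commit()
        return "committed"
    for r in replicas:
        r.rollback()
    return "rolled back"

replicas = [Replica("m1"), Replica("m2"), Replica("m6", healthy=False)]
print(two_phase_commit(replicas, -10))        # "rolled back"
print({r.name: r.balance for r in replicas})  # nobody applied the change
```

Because $m_6$ refuses to acknowledge, no machine subtracts the $\$$10, and the replicas stay consistent.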

Introducing Consensus

The astute reader might have noticed some gaping holes in our construction of 2PC (and its three-phase cousin, 3PC). For one, if the leader fails then it needs to be replaced. However, there are other, more pressing flaws.
    First, what if we introduce another leader, say $m_1$, to the system? We would want to do this so that we are not bottlenecked by $m_0$. However, this causes coordination issues. If Bob sends two transactions, where $T_1 = $ "Subtract $\$$10 from Bob's account" and $T_2 = $ "Subtract $\$$20 from Bob's account", but where Bob only happens to have $\$$20 total, then all hell may break loose. $m_0$ may issue $T_1$ to the network, but $m_1$ may issue $T_2$. Depending on the order in which these operations arrive, the follower machines will validate the transactions but end up with inconsistent states. Does Bob now have $\$$10 or $\$$0 left? As far as distributed systems go, this is as bad as it gets.
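
A tiny Python illustration (with made-up numbers) of why ordering matters: the same two transactions, applied in different orders, leave followers with different final balances:

```python
# Two leaders broadcast concurrently, so followers may receive the same
# two transactions in different orders. With only $20 in the account,
# the order decides which transaction gets rejected as an overdraft.

def apply_in_order(transactions, balance=20):
    for amount in transactions:
        if balance - amount >= 0:   # reject overdrafts
            balance -= amount
    return balance

t1, t2 = 10, 20  # T1 = "subtract $10", T2 = "subtract $20"

# Say m3 happened to receive m0's broadcast first, while m4 received
# m1's broadcast first:
print(apply_in_order([t1, t2]))  # 10: T2 rejected as an overdraft
print(apply_in_order([t2, t1]))  # 0:  T1 rejected as an overdraft
```

Both followers ran the rules correctly, yet they now disagree on Bob's balance.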
    Second, suppose that we do solve the coordination issue between leaders. What happens if some of the follower nodes are down? After all, this is exactly the assumption we started off with. How will we tolerate these nodes? If we wait for them to come back online, we could run into massive performance issues, especially when human intervention is necessary to bring some of the servers back up. If we choose to ignore some of the machines that are down, just how many should we ignore? More importantly, what if some of the machines are up, but their hard drives are corrupted? They are appropriately running the protocol, but they are not successfully writing to disk. Anticipating each of these issues individually is a total mess, so we need something truly robust: a protocol that can simply shrug off all of these issues as long as some threshold of the machines behaves correctly. As far as distributed systems go, this is the holy grail.
    These (plus some other) issues bring us to our very first consensus protocol, Paxos. This also brings us back to exactly that question: how can a set of machines agree on a message? Paxos, invented by Leslie Lamport, is a consensus protocol in the crash-fault model of computation. We won't go into the details of how Paxos works, because those details are incredibly daunting, but at a very high level it's actually quite simple. The protocol moves in rounds, where at each round one of the machines proposes a new value (say, with index $n$). If a majority of the other machines approve of this value, then the proposer collects the votes and issues a commit to the network. If another proposer also issues a value at index $n$, it will be rejected. That's it. Again, there are some crucial details I'm leaving out, but you really don't have time for those. The magic of Paxos, though, is that as long as a majority of the machines accept a proposal, it is guaranteed that we will never reach an inconsistent state. This is magical, and for now that's all you need to know.
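Here is a heavily simplified, Paxos-flavored sketch in Python. To be clear, this is not Paxos itself (it omits ballot numbers, the prepare phase, and leader failover), but it shows the majority-quorum intuition: each machine accepts at most one value per slot, and a value commits only with a strict majority, so two conflicting proposals can never both commit:

```python
# Paxos-flavored toy: a value commits only if a strict majority of
# machines accept it, and each machine accepts at most one value per
# slot, so conflicting proposals at the same index cannot both win.

class Acceptor:
    def __init__(self):
        self.accepted = None  # at most one value per slot

    def accept(self, value):
        if self.accepted is None:
            self.accepted = value
            return True
        return self.accepted == value  # re-accepting the same value is fine

def propose(acceptors, value):
    votes = sum(a.accept(value) for a in acceptors)
    return votes > len(acceptors) // 2  # commit needs a strict majority

acceptors = [Acceptor() for _ in range(5)]
print(propose(acceptors, "T1"))  # True:  first proposal wins a majority
print(propose(acceptors, "T2"))  # False: the slot is already taken by T1
```

Since any two majorities of the five machines overlap in at least one machine, and that machine accepts only one value, no two conflicting values can both gather a majority.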
    By the way, I mentioned in the previous paragraph the term "crash fault". Crash fault means that the machines in the network are assumed to ... well ... at worst, crash. They cannot behave arbitrarily. That means they cannot say "yes, I accept this transaction" when in reality they meant to reject it. This brings us one step closer to blockchains.

All Faults Are Not Made Equal

Turns out that the crash-fault model is basically the de facto model used in data centers nowadays. If an enterprise is deploying some distributed application that requires a consensus protocol (and by this time you understand exactly why we would want to run one), they are most likely running one in the crash-fault model, such as Paxos.
    But wait, what the heck happens if the machines get hacked? Most businesses assume this won't happen, and to be fair, a combination of VPNs, firewalls, and other security mechanisms can make successful attacks very unlikely. But if we do still care about protecting ourselves when there are hacks, then we need a consensus protocol that can tolerate machines that lie. This is the area of Byzantine Fault Tolerance, or BFT. Why is it called Byzantine? Because back in the day, the way this problem was introduced (again, by Lamport) was as a thought experiment about a set of Byzantine generals and lieutenants, separated by enemy lines, who wanted to agree on a plan of attack.

A Bunch of (Evil but Necessary) Jargon

You may be asking: "Alright, Kevin, I don't have the time for this, are crash-fault and Byzantine-fault the only two terms I need to know to become an expert in distributed systems?" The answer is almost. You really shouldn't bother to understand the exotic flavors of consensus protocols, but you should understand some of the general classes that arise. Here are a few:

  1. Failure models:
    a. Crash-fault: As we said, a consensus protocol is in the crash-fault model if the machines are assumed to -- at worst -- crash.
    b. Byzantine-fault: Machines can not only crash, but they can also lie. In fact, they can behave arbitrarily.
  2. Network models:
    a. Synchronous: We not only can consider the types of failures that machines can encounter, but we can also consider the type of network communication assumptions we can make. In the synchronous model of communication, it is assumed that nodes that behave correctly will send and receive messages within a well known upper bound of time. For example, you can assume that every message gets sent within 5 seconds, 5 minutes, or 5 hours.
    b. Asynchronous: This is the exact opposite of synchronous. The message communication delay can be arbitrarily unbounded even for correct nodes. An important consequence here is that you cannot distinguish between a correct node that is taking infinitely long to respond, and a node that has simply crashed.
    c. Partially Synchronous: This model sits somewhere in between full synchrony and asynchrony. It basically means that an upper bound exists, but it is not known in advance. There are some strange consequences to this model, but they would drag us into some super subtle complexities. The most important thing to understand is that this model of communication most closely resembles actual wide-area-network communication (i.e. the internet). This is my opinion though, so don't come after me if another expert in the area disagrees.
  3. Message models: I will only consider one type of message model: authenticated communication. This means that nodes sign their messages, and everyone can verify that a message coming from some other node is actually authentic. Why is this the only thing to consider? Because it turns out that there are some extremely dismal impossibility results that make consensus in the unauthenticated message-passing model far harder, and in several settings outright impossible. So let's not even bother with that model, and just stick to the authenticated case.
  4. Guarantee models: This one is weirdly named, but I think it best describes the situation.
    a. Non-probabilistic: If a consensus protocol is non-probabilistic (i.e. deterministic), it means that safety is guaranteed outright (there's no probability distribution) as long as the threshold number of correct nodes is met. This is confusing, but it will become clear when I introduce the next model.
    b. Probabilistic: If a consensus protocol is probabilistic, then we automatically induce a probability distribution. Typically speaking, this model guarantees safety only with probability $1 - \epsilon$, where $\epsilon$ is some value that the system designer chooses. For example, a probabilistic protocol could guarantee safety with only 99% assurance even when the threshold number of correct nodes is met. Keep this in mind! This is very important!
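
As a quick illustration of the authenticated-message model from item 3, here is a Python sketch. Real systems use public-key signatures so anyone can verify a sender; the standard library's shared-secret HMAC (with a made-up key) conveys the same sign-then-verify idea in a few lines:

```python
# Authenticated messages: a node signs what it sends, so receivers can
# check that a message really came from the claimed sender and was not
# tampered with in transit. (Shared-secret MAC here for brevity; real
# deployments use public-key signatures.)
import hashlib
import hmac

KEY = b"m0-shared-secret"  # hypothetical key held by node m0

def sign(message: bytes) -> bytes:
    return hmac.new(KEY, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(sign(message), tag)

msg = b"subtract $10 from Bob's account"
tag = sign(msg)
print(verify(msg, tag))                                   # True: authentic
print(verify(b"subtract $1000 from Bob's account", tag))  # False: forged
```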

That is basically all the jargon you need to know for now. We've introduced what consensus is used for, the various flavors of it, and basic ideas about constructions. So now, let's get to the cake: blockchains.

Blockchains, or ... I Don't Really Care Who You Are

Alright, so let's review a bit. Consensus protocols are basically algorithms that allow us to deploy a database across many machines that are potentially mutually distrusting (i.e. they do not believe that any one participant in particular will tell the truth, but they believe that the majority will). We introduced the various assumptions we can make when choosing a consensus protocol: whether it is in the synchronous setting or not, and whether it guarantees safety deterministically or only probabilistically.
    When I introduced Paxos above, I mentioned that nodes vote on proposals. It turns out that voting was basically the only mechanism used for decades. This is fine for most cases. However, suppose I wanted to deploy a database publicly, meaning that anyone could join and propose transactions to the system. Now there is a massive issue: if I allow anyone to join my system, I cannot tell who's who. I have no idea who is participating. Basically, consensus without identities is impossible. Or rather, it was, until Satoshi came along.
    Bitcoin solved exactly this problem. The brilliant insight of Satoshi was that we could replace voting with a different mechanism, namely proof-of-work. Bitcoin was the very first consensus protocol that made absolutely zero assumptions about who is in the network. Since identities are not required, this makes Bitcoin perfect to deploy in the permissionless setting, wherein nobody requires permission from the rest of the nodes to join. Bitcoin is brilliant, but simple: nodes do not vote; rather, they propose values with a weight attached to each proposal. The weight is the proof-of-work, the solution to a cryptographic puzzle. Proposals are linked together, such that the heaviest (not longest, as is usually misunderstood) chain of proposals forms the "correct" history. This is basically Bitcoin. That's it.
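
A minimal Python sketch of the puzzle idea (the difficulty and block contents here are made up, and real Bitcoin double-hashes a structured 80-byte block header): find a nonce whose hash falls below a target. Finding the nonce costs real work; checking it costs one hash:

```python
# Proof-of-work in miniature: grind nonces until the hash of
# (header, nonce) has enough leading zero bits. Expensive to find,
# trivial to verify.
import hashlib

DIFFICULTY = 16  # leading zero bits required (tiny, for demonstration)

def pow_hash(header: bytes, nonce: int) -> int:
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def mine(header: bytes) -> int:
    target = 1 << (256 - DIFFICULTY)
    nonce = 0
    while pow_hash(header, nonce) >= target:
        nonce += 1  # on average ~2**DIFFICULTY attempts
    return nonce

header = b"block 42: subtract $10 from Bob"
nonce = mine(header)
print("found nonce:", nonce)
print("valid:", pow_hash(header, nonce) < 1 << (256 - DIFFICULTY))
```

That asymmetry (hard to produce, instant to verify) is what lets the "weight" of a proposal stand in for an identity-based vote.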
    One last thing: where does Bitcoin fit within the models we discussed above? Bitcoin is in the Byzantine-fault model, the network is assumed synchronous, the messages are authenticated (signed by the private-key holders), and the guarantees are probabilistic, meaning the chain can revert at any time if there is a fork. In other words, you may accept that Bob now only has $\$$10, but be aware that there is a small chance you are wrong.
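
To make "probabilistic guarantees" concrete: the Bitcoin whitepaper estimates the chance that an attacker controlling a fraction $q$ of the hash power ever reverses a transaction buried $z$ blocks deep. A direct Python transcription of that calculation:

```python
# The whitepaper's attacker-success estimate: model the attacker's
# progress as a Poisson process, then sum over how far ahead/behind
# they might be after the honest chain grows by z blocks.
import math

def reversal_probability(q: float, z: int) -> float:
    p = 1.0 - q            # honest fraction of hash power
    lam = z * (q / p)      # expected attacker blocks while we wait for z
    s = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam**k / math.factorial(k)
        s -= poisson * (1.0 - (q / p) ** (z - k))
    return s

# With 10% attacker hash power, six confirmations already make a
# reversal very unlikely (about 0.024%, matching the whitepaper table).
print(f"{reversal_probability(0.10, 6):.6f}")
```

So "wait for six confirmations" is not magic; it is just picking an $\epsilon$ you can live with.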


This is basically it. You've gone through the topics that a PhD student studying distributed systems will have to go through in excruciating detail. You are now a confident, well-informed reader of the blockchain world.