On The Tensile Strength of Block Finality

This post concerns itself with the consequences of block-reorg MEV, how much stress blockchains can sustain before breaking, and what protocol-level measures we can take to permanently rectify the issue. Effectively, it tries to answer this question (thanks, Jeremy!):

[Embedded image: Jeremy’s question, “What is the potential worst case scenario here?”]

TL;DR, and a few words of reassurance: block-reorg MEV is nothing new, and it is likely not a long-term threat to blockchains. Forms of this MEV have existed since the beginning, even in Bitcoin (e.g., see Eyal and Sirer’s Selfish Mining paper). That said, block-reorg MEV could be annoying, and engineers need to be aware of these issues because, if reorgs become commonplace, we will need more human intervention than ever before.

Side note: as I was writing this post, I was made aware of this other post by Tom Schmidt at Dragonfly Capital. It is a good read before this article because it discusses the speculative impact of MEV, including some game-theoretic solutions to it.

Diving In

What is MEV? It stands for “Miner Extractable Value”, but in practice it means paying miners for preferential treatment of transactions. Preferential treatment can be used for annoying but ultimately benign behavior, such as paying miners to put your transaction first in a public auction, or for malignant activities, such as overwriting the database (i.e., the blockchain) entirely, known as “reorging”. We are not concerned with users paying for transaction ordering to capture arbitrage. Instead, we are concerned with the latter, dangerous form of MEV: paying miners to overwrite the database.

So, back to the question posed by Jeremy: “What is the potential worst case scenario here?”

Before answering that question, we have to answer two preliminary questions:

Have correctness assumptions been violated or not?
Is the blockchain a PoW or PoS-based blockchain?

Question 1: Have correctness assumptions been violated or not?

Underlying every consensus protocol used by every blockchain is a set of assumptions that must hold for the blockchain to operate “normally”. The most important of these assumptions is that the majority of the miners in the protocol behave correctly, meaning that they follow the consensus protocol exactly as intended.

It may seem odd that we have to “assume” certain conditions when building a complex dynamic system, but it is perfectly sensible, and we do so all the time. For example, aviation engineers provide no protection against a mid-flight meteor strike because they explicitly assume that a plane will never encounter such an event. The same is true for consensus protocols: we assume that a majority of miners are correct, and therefore do not really design protection mechanisms against a corrupt majority. Why? Because, as it turns out, there is no way to build a consensus protocol that is both live and safe without this correctness assumption, regardless of any other assumptions. Achieving both without it is provably impossible, and this is one of the fundamental results in distributed systems.

In any blockchain, you assume that a majority of miners are correct. Whether you assume this because of rational economic behavior or because of altruism doesn’t matter; you just assume it. Due to further assumptions about the synchrony (or asynchrony) of the underlying messaging layer, PoW-based blockchains require roughly 51% correctness and PoS-based blockchains require roughly 67% correctness (p.s. I’m purposefully simplifying here for the sake of readability, especially regarding details of Avalanche, but for this post, assume those rough numbers).
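To make the two thresholds concrete, here is a minimal sketch that counts how many faulty participants each model tolerates. It assumes n participants of equal weight (a simplification of hash power or stake), and the function name and model labels are hypothetical. The bounds are the textbook ones: a strict majority of correct participants for an honest-majority (PoW-style) protocol, and n ≥ 3f + 1 for a BFT-style (PoS) protocol under partial synchrony.

```python
def max_faulty(n: int, model: str) -> int:
    """Return the largest number of faulty participants out of n that the
    given model tolerates. Assumes every participant has equal weight
    (a hypothetical simplification of hash power or stake)."""
    if n < 1:
        raise ValueError("need at least one participant")
    if model == "honest_majority":
        # PoW-style: correct participants must be a strict majority (> n/2).
        return (n - 1) // 2
    if model == "bft":
        # PoS/BFT-style under partial synchrony: need n >= 3f + 1.
        return (n - 1) // 3
    raise ValueError(f"unknown model: {model!r}")


for model in ("honest_majority", "bft"):
    f = max_faulty(100, model)
    print(f"{model}: tolerates {f}/100 faulty, needs {100 - f}% correct")
```

With 100 participants this prints 49 and 33 tolerated faults, i.e. 51% and 67% correct, which lines up with the rough numbers above.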

MEV is dangerous because you’re paying miners to violate this correctness assumption and have them deviate from the intended protocol.

Question 2: Is the blockchain a PoW or PoS-based blockchain?

There’s ultimately no difference between PoW and PoS in the guarantees they provide as long as the correctness assumptions hold. However, due to the peculiarities of their constructions, they do differ when the assumptions are violated.

Specifically, PoW chains have no notion of identity (i.e., validator identity) and rely purely on proof-of-work to produce blocks. As long as a single miner is online searching for valid nonces, the blockchain remains live and keeps operating. This means that reorgs can happen at any time and for any reason, with the caveat that, if the correctness assumptions are not violated, a reorg becomes exponentially less likely the more blocks are built on top of a given block. Conversely, PoS chains have a notion of identity and rely on miners “voting” on blocks. As soon as a large enough majority of miners approves a block, its transactions are instantly finalized. Therefore, reorgs past a finalized block are not supported: if one happens, the client panics and stops making progress.
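The PoS side of this can be sketched as a toy finality tracker. It assumes a BFT-style protocol in which a block is final once validators holding strictly more than two-thirds of the stake have voted for it. `FinalityTracker`, `FinalityViolation`, and the stake values are hypothetical. Real PoS protocols add signatures, epochs, and slashing for double votes, and none of that is modeled here. The point is the last step: instead of reorging a finalized block, the client raises and stops.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class FinalityViolation(Exception):
    """Raised when a conflicting block would replace a finalized one."""


@dataclass
class FinalityTracker:
    stakes: dict[str, int]  # hypothetical: validator id -> stake
    votes: dict[int, dict[str, set[str]]] = field(default_factory=dict)
    finalized: dict[int, str] = field(default_factory=dict)

    def vote(self, validator: str, height: int, block_hash: str) -> None:
        if validator not in self.stakes:
            raise KeyError(f"unknown validator {validator!r}")
        voters = self.votes.setdefault(height, {}).setdefault(block_hash, set())
        voters.add(validator)
        weight = sum(self.stakes[v] for v in voters)
        total = sum(self.stakes.values())
        # Finalize once strictly more than 2/3 of total stake has voted
        # (integer arithmetic avoids floating-point rounding at the boundary).
        if 3 * weight > 2 * total:
            self._finalize(height, block_hash)

    def _finalize(self, height: int, block_hash: str) -> None:
        prior = self.finalized.get(height)
        if prior is not None and prior != block_hash:
            # A previously final block would be replaced: halt instead of reorging.
            raise FinalityViolation(
                f"height {height}: {prior} was final, now {block_hash}"
            )
        self.finalized[height] = block_hash


tracker = FinalityTracker(stakes={"v1": 1, "v2": 1, "v3": 1, "v4": 1})
for v in ("v1", "v2", "v3"):
    tracker.vote(v, height=1, block_hash="A")
print(tracker.finalized)  # {1: 'A'}

try:
    # v1 and v2 equivocate: together with v4 they push block B past 2/3.
    for v in ("v4", "v1", "v2"):
        tracker.vote(v, height=1, block_hash="B")
except FinalityViolation as err:
    print("halting:", err)
```

Here v1 and v2 vote for both A and B. A real protocol would slash them for that, but once enough stake misbehaves, halting is the only safe response left to an honest client.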

Ultimately, the rubber meets the road when the correctness assumptions are violated. In PoW, because reorgs are a normal part of life, clients will not panic and will automatically accept very deep reorgs. In fact, in PoW there’s no way (besides some rough probabilistic calculations) to “be sure” whether the majority of the network is corrupt. The only reason a client would reject a deep reorg is if it implemented a custom rule to shut down when a reorg deeper than X blocks occurs. In PoS, reorgs are not a normal part of life, and thus clients already stop making progress when one happens. In PoS, we can be sure that a reorg has happened because a block we previously finalized is no longer final.
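A custom rule like that can be sketched in a few lines. This assumes chains are represented as lists of block hashes from genesis to tip, and it uses chain length as a stand-in for PoW’s most-cumulative-work fork choice. `MAX_REORG_DEPTH`, `DeepReorgError`, `reorg_depth`, and `choose_chain` are hypothetical names, not taken from any real client.

```python
MAX_REORG_DEPTH = 10  # hypothetical "X"; a real client would choose this carefully


class DeepReorgError(Exception):
    """Raised when switching chains would discard more than the allowed depth."""


def reorg_depth(current: list[str], candidate: list[str]) -> int:
    """Number of blocks on `current` (genesis first) that adopting `candidate` would drop."""
    common = 0
    for ours, theirs in zip(current, candidate):
        if ours != theirs:
            break
        common += 1
    return len(current) - common


def choose_chain(current: list[str], candidate: list[str],
                 max_depth: int = MAX_REORG_DEPTH) -> list[str]:
    """Simplified fork choice: prefer the longer chain (standing in for most
    cumulative work), but refuse and stop on reorgs deeper than max_depth."""
    if len(candidate) <= len(current):
        return current
    depth = reorg_depth(current, candidate)
    if depth > max_depth:
        raise DeepReorgError(f"refusing {depth}-block reorg (limit {max_depth})")
    return candidate


current = ["genesis", "a1", "a2", "a3"]
attack = ["genesis", "b1", "b2", "b3", "b4"]
print(reorg_depth(current, attack))  # 3
try:
    choose_chain(current, attack, max_depth=2)
except DeepReorgError as err:
    print("halting:", err)  # halting: refusing 3-block reorg (limit 2)
```

A normal extension of the current chain has depth 0 and is always accepted, so the rule only bites when a competing branch tries to replace blocks the client already considered settled.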

OK, so now back to the question: “What is the potential worst case scenario here?”

We roughly have four answers, based on a two-by-two matrix of “PoW or PoS” vs. “correct majority or corrupt majority”.

Case #1: the chain is PoW and the majority of miners are correct. In this case, clients will likely just need to increase the number of confirmations before “finalizing” a block. Coinbase’s 35-block confirmation requirement for transactions is sufficient if a relatively small faction of miners (less than 20%) tries to reorg. If you assume that a good chunk, but not a majority, of miners will try to defect, then confirmation times will likely need to be many hours long.
Case #2: the chain is PoS and the majority of miners are correct. In this case, there’s nothing to do. As soon as your transaction is finalized, you’re done and there is no reversing it.
Case #3: the chain is PoW and the majority of miners are corrupt. In this case, no number of hours of confirmation is sufficient, but clients can at least implement an anti-reorg mechanism: if a reorg deeper than X blocks is seen (X likely matching the depth the client already uses to accept blocks), the client simply stops operating. At that point, it becomes a matter of social consensus to reorganize the blockchain and filter out the bad miners.
Case #4: the chain is PoS and the majority of miners are corrupt. This is the same as in PoW chains, except that there’s no need to explicitly implement an exit hatch, because the way consensus is designed already provides one.
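For Case #1, the confirmation count can be roughly estimated with the attacker-success calculation from the “Calculations” section of the Bitcoin whitepaper, which models an attacker holding a fraction q of the hash power trying to catch up from z blocks behind. The sketch below is a Python port of that calculation plus a hypothetical helper, `min_confirmations`, that searches for the smallest z keeping the risk under a chosen bound. It ignores block-time variance, selfish mining, and bribery dynamics, so treat its output as a rough guide rather than a policy.

```python
import math


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power fraction q ever overtakes
    the honest chain from z blocks behind (Nakamoto's approximation)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker eventually catches up with certainty
    lam = z * (q / p)  # expected attacker progress while honest miners find z blocks
    total = 1.0
    poisson = math.exp(-lam)  # Poisson probability of k attacker blocks, starting at k = 0
    for k in range(z + 1):
        if k > 0:
            poisson *= lam / k  # update iteratively to avoid huge factorials
        total -= poisson * (1.0 - (q / p) ** (z - k))
    return total


def min_confirmations(q: float, max_risk: float, max_z: int = 1000) -> int:
    """Smallest z whose attacker success probability is below max_risk."""
    if q >= 0.5:
        raise ValueError("no confirmation depth is safe against a majority attacker")
    for z in range(max_z + 1):
        if attacker_success_probability(q, z) < max_risk:
            return z
    raise ValueError(f"no depth up to {max_z} meets the risk bound")


for q in (0.10, 0.20, 0.30):
    print(f"q={q:.2f}: {min_confirmations(q, 0.001)} confirmations for <0.1% risk")
```

For q = 0.10 this gives 5 confirmations, matching the whitepaper’s table. The required depth grows quickly as q approaches one half, and at or above it the helper refuses outright, which is exactly the point where Case #3 takes over.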

All in all, as long as there’s an exit hatch, humans will, as always, have to step in and rectify the issue, but at least funds will be “safe”. Besides the exit hatch, no other protection mechanism can be implemented. You can disincentivize this behavior by withholding rewards from validators or slashing them, but ultimately that doesn’t solve the problem if the majority is corrupt.

Written on July 12, 2021