This is a brief intro to the Azuma-Hoeffding concentration inequality, which can be viewed as a generalization of the Chernoff bound. For those who are unfamiliar, a concentration inequality specifies how concentrated around its mean a particular random variable is. This is useful for quickly demonstrating that a random process does not deviate from its expected outcome by more than some bounded amount.

Suppose we have \(N\) balls in a box, some red (exactly \(R = pN\) red balls) and the rest blue (exactly \(B = (1 - p)N\)). We draw \(n\) balls from this box, without replacement. In expectation, we should get \(np\) red balls in our sample. However, we would like to understand just how tight this expectation is, i.e. how close to \(np\) the number of red balls in our sample actually is. This is precisely where a concentration inequality, in this case Azuma-Hoeffding, is useful.
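As a quick sanity check on the expectation, here is a small simulation of this setup. The parameters \(N\), \(p\), and \(n\) are arbitrary illustrative choices, not values from the discussion:

```python
import random

# Arbitrary illustrative parameters: N balls, a fraction p of them red,
# and n draws taken without replacement.
N, p, n = 1000, 0.3, 100
R = int(p * N)                      # exactly R = p*N red balls
box = [1] * R + [0] * (N - R)       # 1 = red, 0 = blue

trials = 20000
avg_red = sum(sum(random.sample(box, n)) for _ in range(trials)) / trials
print(avg_red)  # should be close to n*p = 30
```

`random.sample` draws without replacement, so each trial is exactly one run of the process described above.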

Azuma-Hoeffding: Let \(\{X_{i} : i \in \mathbb{Z}_{\geq 0}\}\) be a martingale (as a reminder, a sequence of random variables is a martingale if \(\mathbb{E}(X_i \mid X_0, X_1, \dots, X_{i-1}) = X_{i-1}\); in other words, the expected value of the next random variable in the sequence is just the value of the most recent one). Suppose \(| X_{i} - X_{i-1}| \leq c_i\) almost surely. Then, for all positive integers \(n\) and all positive reals \(t\),

\[P(| X_n - X_0 | \geq t) \leq 2e^{(-t^2)/(2\sum_{i = 1}^{n}{c_i}^2)}\]
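To see the bound in action on the simplest possible example, consider a \(\pm 1\) random walk, which is a martingale with \(c_i = 1\); the empirical tail frequency should sit below the Azuma-Hoeffding bound. The parameters here are arbitrary illustrative choices:

```python
import math
import random

# A simple +/-1 random walk: X_0 = 0 and each step is +1 or -1 with
# probability 1/2, so it is a martingale with |X_i - X_{i-1}| = 1 (c_i = 1).
n, t, trials = 200, 40, 10000
random.seed(0)

hits = 0
for _ in range(trials):
    walk = sum(random.choice((-1, 1)) for _ in range(n))
    if abs(walk) >= t:
        hits += 1

empirical = hits / trials
bound = 2 * math.exp(-t ** 2 / (2 * n))   # Azuma-Hoeffding with c_i = 1
print(empirical, bound)  # empirical frequency should not exceed the bound
```

The bound is loose here (the true tail is much smaller), which is typical: Azuma-Hoeffding trades tightness for generality.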

In this writeup we will only apply the result to the example discussed at the beginning, without going into proofs. Going back to the example, let \(X_1, \dots, X_n\) represent the balls we draw during our \(n\) samples, with \(X_i = 1\) for a red draw and \(X_i = 0\) for a blue one. Let \(S_n = \sum_{i = 1}^{n} X_i\) represent the total number of red balls after \(n\) samples. We already know that \(\mathbb{E}[S_n] = np\). The sequence \(S_k\) is not itself a martingale but a sub-martingale, meaning \(\mathbb{E}(S_k \mid S_1, \dots, S_{k-1}) \geq S_{k-1}\). With a bit more work, however, we can construct a genuine martingale whose difference sequence is easy to bound. After sampling \(k\) times, the number of red balls left in the box is \(Np - S_k\), so the expected number of red balls among our remaining draws equals the number of remaining draws times the fraction of red balls left in the box, i.e. \((n-k)(Np - S_k)/(N-k)\). Let's define the random variable \(F_k\), for \(k \in \{0, \dots, n\}\), as

\[F_k = S_k + (n - k)\frac{Np - S_k}{N-k}.\]
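One can also check the martingale property of \(F_k\) numerically: conditioned on the first \(k-1\) draws, the average of \(F_k\) over the possible \(k\)-th draws should equal \(F_{k-1}\). The parameters below are arbitrary small values for illustration:

```python
import random

# Arbitrary illustrative parameters: N balls, fraction p red, n draws,
# and we check the martingale property at step k.
N, p, n, k = 50, 0.4, 20, 10
R = int(p * N)

def F(S, j):
    """F_j = S_j + (n - j) * (N*p - S_j) / (N - j)."""
    return S + (n - j) * (N * p - S) / (N - j)

random.seed(1)
box = [1] * R + [0] * (N - R)
random.shuffle(box)
prefix, rest = box[:k - 1], box[k - 1:]   # first k-1 draws are fixed
S_prev = sum(prefix)                      # this is S_{k-1}

# Average F_k over many realizations of the k-th draw from the leftovers.
trials = 200000
avg_F_k = sum(F(S_prev + random.choice(rest), k) for _ in range(trials)) / trials
print(avg_F_k, F(S_prev, k - 1))  # the two values should nearly agree
```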

Note that \(F_n = S_n\) and \(F_0 = np\) (check yourself by plugging into the definition above). One can verify that \(F_k\) is indeed a martingale; in fact, \(F_k = \mathbb{E}(S_n \mid X_1, \dots, X_k)\), a so-called Doob martingale. Its difference sequence is bounded: at round \(i\), the increment \(F_i - F_{i-1}\) lies in an interval \([a_i, b_i]\) of length

\[b_i - a_i = \frac{N - n}{N - i} \leq 1.\]
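This range can be checked directly; here is a sketch of the computation. Writing \(S_i = S_{i-1} + X_i\) and expanding \(F_i - F_{i-1}\), every term not involving \(X_i\) collapses into a quantity \(C_{i-1}\) determined by the first \(i-1\) draws:

\[F_i - F_{i-1} = \left(1 - \frac{n-i}{N-i}\right) X_i + C_{i-1} = \frac{N-n}{N-i}\, X_i + C_{i-1}.\]

Since \(X_i \in \{0, 1\}\), the increment ranges over an interval of length exactly \(\frac{N-n}{N-i}\).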

We can now apply the inequality, in its one-sided form for increments bounded in intervals of length \(b_i - a_i\):

\[P(S_n - np \geq t) \leq e^{-2t^2 / \sum_{i = 1}^{n}\left(\frac{N-n}{N-i}\right)^2} \leq e^{-2t^2 / n}\]
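As a final numerical check (again with arbitrary illustrative parameters), the empirical frequency of the event \(S_n - np \geq t\) should indeed stay below \(e^{-2t^2/n}\):

```python
import math
import random

# Arbitrary illustrative parameters for the ball-drawing example.
N, p, n, t, trials = 1000, 0.3, 100, 15, 10000
R = int(p * N)
box = [1] * R + [0] * (N - R)   # 1 = red, 0 = blue
random.seed(2)

# Count how often the red-ball count exceeds its mean n*p by at least t.
hits = sum(sum(random.sample(box, n)) - n * p >= t for _ in range(trials))
empirical = hits / trials
bound = math.exp(-2 * t ** 2 / n)
print(empirical, bound)  # empirical frequency should be below the bound
```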

In other words, the number of red balls in our sample is tightly concentrated around the mean \(np\): deviations much larger than \(\sqrt{n}\) are exponentially unlikely.


Written on March 7, 2019