Note About a Guy Who Couldn’t Find a Tail Bound

I read a paper earlier today, How I Wasted Too Long Finding a Concentration Inequality for Sums of Geometric Variables. It’s exactly what the title suggests, It’s a story about a researcher who struggled to find a (maybe easy?) bound. While my posts often obscure the fact that I routinely spend an afternoon trying to bound a thing, I liked that the author laid out a very reasonable working process. I’d like to highlight a few sections.

Notice that the filename of the linked paper! Nice Easter Egg. As an aside, the author, Dan Brown has a really interesting set of publications (Yes, the Human Genome paper, but many more!).

To be more proper, these summands aren’t Lipschitz. You might imagine, “what if I bound it by some really big bound beyond which the probability is really tiny?” But that doesn’t seem to work well: geometric distributions are pretty pernicious, and all of the methods I know for that don’t work in this domain.

About two weeks ago when I was trying to bound the number of levels in a skiplist I did exactly this. I didn’t quite have i.i.d geometric RVs, but the same urge was there. Amazing coincidence!

Then I looked at “negative binomial distribution” in Wikipedia, and it of course has a link to the cumulative distribution function for negative binomial variables: it’s the regularized incomplete $\beta$ function. Oh, dear. That’s scary. It has calculus in it. I hate calculus.

I try to go beyond the Wikipedia Related Distributions section, but it’s a very useful resource and it often pulls through! Other options I’ve always been fond of are the Casella and Berger Diagram and the American Statistician Diagram.

$\textbf{Theorem 1}$ . Let $Y\Big(n, p)$ be a negative binomially distributed random variable that arises as the sum of $n$ iid geometrically distributed random variables with expectation $p$ . Then:

$\begin{equation} E\Big[Y\Big(n, p)] = np ,\ \operatorname{Pr}\Big[Y\Big(n, p) \gt knp] \leq e^{\frac{−kn(1−1/k)^2}{2}} \end{equation}$

I have no leg to stand on re: research advice. You should not be taking advice from me, I’ve never published a thing…

A few final thoughts:

There is a legitimate lemma here. Intermediate lemmas don’t emerge from nowhere. Much of CS literature assumes you can just read the theorem and reason backwards from there. Go slower! someone put thought into that lemma!
The author isn’t a nobody, professionals can get tripped up on “simple” problems, research takes sustained and deliberate effort.
This might not be such a simple problem. As of writing, this paper has over two dozen citations! I imagine no more than half are because of the title.