Note on Histograms

Note on Histograms

𝐍𝐨𝐭𝐞\textbf{Note} β€” I wrote a series of posts in August 2024 about histograms. It’s a year later, and I felt compelled to condense them today. On second thought, this work is entirely over-shadowed by my post from September 2024 about Modeling Systems With Generating Functions. The only thing relevant from August’s posts is the following fact:

𝐂π₯𝐚𝐒𝐦\textbf{Claim} β€” There exists a moment bound tighter than the Chernoff bound.

Pr[Snβ‰₯Ξ»]≀mink>0𝔼[Xk]Ξ»βˆ’k≀inft>0𝔼[etX]eβˆ’tΞ»\begin{equation} \Pr[S_n \geq \lambda] \leq \text{min}\limits_{k \gt 0} \ \mathbb{E}[X^k]\lambda^{-k} \leq \text{inf}\limits_{t \gt 0} \ \mathbb{E}[e^{tX}]e^{-t\lambda} \end{equation}

However, this result only shows that such a bound exists, not that it exists for a small kk. In practice, computing and storing βˆ‘xik\sum x_i^k is both computationally expensive and likely to overflow. There are a few approaches we can take to mitigate that risk.

  1. Store Log of Observations β€” π™Ύπš‹πšœπšŽπš›πšŸπšŽ\texttt{Observe} can store the moments of ⌊log2(x)βŒ‹\lfloor \log_2\big(x) \rfloor instead of xx. This is still a valid bound because log2(x)\log_2\big(x) is non-decreasing.

  2. Use Online Mean β€” π™Ύπš‹πšœπšŽπš›πšŸπšŽ\texttt{Observe} can maintain online means of bkb^k rather storing βˆ‘xk\sum x^k.

// Observe - implemented w. the two strategies mentioned above...
func (mh *momentsHistogram) Observe(o uint64) {
    mh.tot++
    var b = bits.Len64(o)&0x0000003f // bits.Len64(o) == ⌊log2(o)βŒ‹
    var c = (mh.tot - 1) / mh.tot
    for m := 0; m < et.n; m++ { 
        // calculate running avg. - divide and then add, cannot keep full sum 
        // (even for log-moments) w.o. overflow coming quickly...
        mh.mmts[m] *= c
        mh.mmts[m] += (math.Pow(b, float64(m)) / et.tot)
    }
}