A Method for Fast Almost-Normal RVs

A Method for Fast Almost-Normal RVs

In many settings where normally distributed values are called for, “approximately normal” values will suffice (e.g. as in Database-Friendly Random Projections [1]). Today, I’ll describe a method that’s (a little) faster than the 𝚖𝚊𝚝𝚑/𝚛𝚊𝚗𝚍\texttt{math/rand} and 𝚐𝚘𝚗𝚞𝚖/𝚍𝚒𝚜𝚝𝚞𝚟\texttt{gonum/distuv} implementations and approximates the standard normal distribution.

Our goal is to use one call of 𝚛.𝚄𝚒𝚗𝚝𝟼𝟺()\texttt{r.Uint64()} to generate values that are distributed by Bin(32,1/2)+U(0,1)\operatorname{Bin}\left(32, 1/2\right) + U\left(0, 1 \right). The number of set bits in the first 𝚞𝚒𝚗𝚝𝟹𝟸\texttt{uint32} is distributed by Bin(32,1/2)\operatorname{Bin}\left(32, 1/2\right) and the remaining bits are used to generate 231U(0,231)U(0,1)2^{-31} \cdot U\left(0, 2^{31}\right) \approx U(0, 1). We can then center and scale the intermediate distribution and return a value with zero mean and unit variance.

const c0 = 0.35172622 // 1.0 / math.Sqrt(8. + 1./12.)
const c1 = 1.0 / float64(1<<32)

// Draws from an Approx Normal distribution, if you're creative it resembles a terraced garden.
func ApproxNormF64(r *rand.Rand) float64 {
    u := r.Uint64()
    uhi, ulo := uint32(u>>32), uint32(u)
    return c0 * (float64(bits.OnesCount32(uhi)) + c1*float64(ulo) - 16.5)
}

Unfortunately, we get an underwhelming (24%)\big(24\%) speedup against 𝚖𝚊𝚝𝚑/𝚛𝚊𝚗𝚍\texttt{math/rand}. To go faster, we could use all 64 bits to generate variables distributed by Bin(64,1/2)\operatorname{Bin}\left(64, 1/2\right), but I found this was only 44%44\% faster than 𝚖𝚊𝚝𝚑/𝚛𝚊𝚗𝚍\texttt{math/rand}’s implementation and much further from 𝒩(0,1)\mathcal{N}\big(0, 1) than 𝙰𝚙𝚙𝚛𝚘𝚡𝙽𝚘𝚛𝚖𝙵𝟼𝟺\texttt{ApproxNormF64}.

go test -bench GenRandNorm -benchtime=10s -cpu=1
Benchmark_Distuv_GenRandNorm           1000000000               8.166 ns/op
Benchmark_MathRand_GenRandNorm         1000000000               4.717 ns/op
Benchmark_Approx_GenRandNorm           1000000000               3.600 ns/op
Benchmark_Bin64_Approx_GenRandNorm     1000000000               2.655 ns/op

We now turn to the important question of “Is this actually any good?”. Using f𝒩(0,1)f \sim \mathcal{N}(0, 1) and gg to denote the generated distribution. Notice that:

Taking these observations together, we can bound fg\left\lVert f - g \right\rVert_\infty by computing the maximum possible height of a triangle formed between ff and gg.

fg1218+1/1212πe0.04255\begin{equation} \left\lVert f - g \right\rVert_\infty \leq \frac{1}{2}\cdot \frac{1}{\sqrt{8 + 1/12}} \cdot\frac{1}{\sqrt{2\pi e}} \approx 0.04255 \end{equation}

We can also bound the maximum sampling error over a contiguous interval by finding the max area for such a triangle. For any a,ba, b \in \mathbb{R}, we have:

abf(x)g(x)dx14(fg8+1/12)1256.\begin{equation} \int_a^b \mid \ f(x) - g(x) \mid \, dx \leq \frac{1}{4}\left(\ \frac{\left\lVert f - g \right\rVert_\infty}{\sqrt{8\ + 1/12}}\right) \leq \frac{1}{256}. \end{equation}

𝐍.𝐁\textbf{N.B} — This implementation described on Reddit is identical to what I propose here. In fact, the post heped me catch an error in my work where I neglected the 1/121/12 contribution to variance from U(0,1)U\big(0, 1). My sincerest thanks to u/Dusty_Coder and u/skeeto.

𝐁𝐨𝐧𝐮𝐬 𝐂𝐨𝐧𝐭𝐞𝐧𝐭\textbf{Bonus Content} — Another zany (slow!) way to generate approximately normal random variables is to generate nn uniform random variables, sort them, and apply the following transform to the largest. I can’t imagine why you’d ever use want to this method, but it’s there if you want it…

g(x)=2/πtanh1(2xn1)\begin{equation} g(x) = 2/\sqrt{\pi} \text{tanh}^{-1}\left(2x^{n}-1\right) \end{equation}

This method just approximates the CDF of the normal to F12tanh(πx2)+12F \ \sim \ \frac{1}{2}\tanh\left(\frac{\sqrt{\pi}x}{2}\right)+\frac{1}{2} and makes use of properties of order statistics of the uniform distribution. The transform above is just the approximate quantile function of the normal.

go test -cpu=1 -bench=Transform_GenRanNorm -benchtime=30s
Benchmark_Transform_GenRanNormMaxTanh     1000000000          26.53 ns/op