A Method for Fast Almost-Normal RVs

In many settings where normally distributed values are called for, “approximately normal” values will suffice (e.g. as in Database-Friendly Random Projections [1]). Today, I’ll describe a method that’s (a little) faster than the $\texttt{math/rand}$ and $\texttt{gonum/distuv}$ implementations and approximates the standard normal distribution.

Our goal is to use one call of $\texttt{r.Uint64()}$ to generate values distributed as $\operatorname{Bin}\left(32, 1/2\right) + U\left(0, 1\right)$. The number of set bits in the upper $\texttt{uint32}$ is distributed as $\operatorname{Bin}\left(32, 1/2\right)$, and the lower 32 bits are used to generate $2^{-32} \cdot U\left(0, 2^{32}\right) \approx U(0, 1)$. The sum has mean $16.5$ and variance $32/4 + 1/12 = 8 + 1/12$, so we can center and scale this intermediate distribution and return a value with zero mean and unit variance.

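import (
    "math/bits"
    "math/rand"
)

const c0 = 0.35172622 // 1.0 / math.Sqrt(8. + 1./12.), scales to unit variance
const c1 = 1.0 / float64(1<<32) // maps a uint32 onto [0, 1)

// ApproxNormF64 draws from an approximately normal distribution.
// If you're creative, its pdf resembles a terraced garden.
func ApproxNormF64(r *rand.Rand) float64 {
    u := r.Uint64()
    uhi, ulo := uint32(u>>32), uint32(u)
    // popcount(uhi) has mean 16 and c1*ulo has mean 0.5, hence the 16.5 shift.
    return c0 * (float64(bits.OnesCount32(uhi)) + c1*float64(ulo) - 16.5)
}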
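
As a quick sanity check on the centering and scaling, here is a minimal sketch that estimates the sample mean and variance of $\texttt{ApproxNormF64}$. The helper $\texttt{SampleMoments}$, the seed, and the sample size are illustrative choices, not part of the method; the two numbers should land close to 0 and 1 respectively.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

const c0 = 0.35172622 // 1.0 / math.Sqrt(8. + 1./12.)
const c1 = 1.0 / float64(1<<32)

// ApproxNormF64 is the generator from above, repeated so this file compiles alone.
func ApproxNormF64(r *rand.Rand) float64 {
	u := r.Uint64()
	uhi, ulo := uint32(u>>32), uint32(u)
	return c0 * (float64(bits.OnesCount32(uhi)) + c1*float64(ulo) - 16.5)
}

// SampleMoments (hypothetical helper) returns the sample mean and the
// population-style variance of n draws from a generator seeded with seed.
func SampleMoments(seed int64, n int) (mean, variance float64) {
	r := rand.New(rand.NewSource(seed))
	var sum, sumSq float64
	for i := 0; i < n; i++ {
		x := ApproxNormF64(r)
		sum += x
		sumSq += x * x
	}
	mean = sum / float64(n)
	variance = sumSq/float64(n) - mean*mean
	return mean, variance
}

func main() {
	mean, variance := SampleMoments(1, 1000000) // seed and n are arbitrary
	fmt.Printf("mean=%.4f variance=%.4f\n", mean, variance)
}
```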

Unfortunately, this yields an underwhelming $24\%$ speedup over $\texttt{math/rand}$. To go faster, we could use all 64 bits to generate variables distributed as $\operatorname{Bin}\left(64, 1/2\right)$, but I found this was only $44\%$ faster than $\texttt{math/rand}$’s implementation and much further from $\mathcal{N}(0, 1)$ than $\texttt{ApproxNormF64}$.
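
For reference, a $\operatorname{Bin}(64, 1/2)$ variant might look like the sketch below. It assumes the straightforward popcount-and-rescale approach; the names $\texttt{ApproxNormBin64}$ and $\texttt{DrawBin64}$ are hypothetical, and the benchmarked code may differ. Since $\operatorname{Bin}(64, 1/2)$ has mean 32 and variance 16, the output is the popcount minus 32, divided by 4.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

// ApproxNormBin64 (hypothetical name) returns (popcount(u) - 32) / 4, which has
// zero mean and unit variance because Bin(64, 1/2) has mean 32 and variance 16.
func ApproxNormBin64(r *rand.Rand) float64 {
	return 0.25 * float64(bits.OnesCount64(r.Uint64())-32)
}

// DrawBin64 (hypothetical helper) collects n draws from a seeded generator.
func DrawBin64(seed int64, n int) []float64 {
	r := rand.New(rand.NewSource(seed))
	out := make([]float64, n)
	for i := range out {
		out[i] = ApproxNormBin64(r)
	}
	return out
}

func main() {
	// Every value is a multiple of 0.25 in [-8, 8].
	fmt.Println(DrawBin64(1, 8))
}
```

With no uniform term to fill the gaps, the output can take only 65 distinct values, which is one way to see why it sits further from the normal.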

go test -bench GenRandNorm -benchtime=10s -cpu=1
Benchmark_Distuv_GenRandNorm           1000000000               8.166 ns/op
Benchmark_MathRand_GenRandNorm         1000000000               4.717 ns/op
Benchmark_Approx_GenRandNorm           1000000000               3.600 ns/op
Benchmark_Bin64_Approx_GenRandNorm     1000000000               2.655 ns/op

We now turn to the important question: is this actually any good? Let $f$ denote the pdf of $\mathcal{N}(0, 1)$ and $g$ the pdf of the generated distribution. Notice that:

Midpoints of adjacent “terraces” of $g$ are spaced $1/\sqrt{8 + 1/12}$ apart.
The maximum derivative of the pdf of $\mathcal{N}\left(0, 1\right)$ is $1/\sqrt{2\pi e} \leq 1/4$.
For any terrace midpoint $m_i$, we have $f(m_i) \approx g(m_i)$.

Taking these observations together, and treating the last one as an equality, we can bound $\left\lVert f - g \right\rVert_\infty$ by the maximum possible height of a triangle formed between $f$ and $g$: half the terrace spacing times the maximum slope of $f$.

$\begin{equation} \left\lVert f - g \right\rVert_\infty \leq \frac{1}{2}\cdot \frac{1}{\sqrt{8 + 1/12}} \cdot\frac{1}{\sqrt{2\pi e}} \approx 0.04255 \end{equation}$
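
Since $f(m_i) \approx g(m_i)$ is only approximate, it's worth checking this figure numerically. The sketch below assumes that $g$ is the piecewise-constant density $\sqrt{8 + 1/12} \cdot P\left(\operatorname{Bin}(32, 1/2) = k\right)$ on the $k$-th terrace, which follows from the construction if the low 32 bits are treated as exactly uniform. It then scans a grid for the largest gap between $f$ and $g$. All function names, the grid range, and the step count are illustrative, and a grid scan can only underestimate the true supremum.

```go
package main

import (
	"fmt"
	"math"
)

var sigma = math.Sqrt(8 + 1.0/12.0) // standard deviation of Bin(32, 1/2) + U(0, 1)

// binomPMF32 returns P(Bin(32, 1/2) = k), or 0 outside 0..32.
func binomPMF32(k int) float64 {
	if k < 0 || k > 32 {
		return 0
	}
	c := 1.0 // builds C(32, k) as a running product
	for i := 1; i <= k; i++ {
		c = c * float64(32-k+i) / float64(i)
	}
	return c / float64(uint64(1)<<32)
}

// terracePDF is the assumed density g of (Bin(32, 1/2) + U(0, 1) - 16.5) / sigma.
func terracePDF(x float64) float64 {
	k := int(math.Floor(x*sigma + 16.5))
	return sigma * binomPMF32(k)
}

// normalPDF is the standard normal density f.
func normalPDF(x float64) float64 {
	return math.Exp(-x*x/2) / math.Sqrt(2*math.Pi)
}

// SupDiff scans a uniform grid on [-lim, lim] and returns the largest |f - g| seen.
func SupDiff(lim float64, steps int) float64 {
	worst := 0.0
	for i := 0; i <= steps; i++ {
		x := -lim + 2*lim*float64(i)/float64(steps)
		if d := math.Abs(normalPDF(x) - terracePDF(x)); d > worst {
			worst = d
		}
	}
	return worst
}

func main() {
	fmt.Printf("grid estimate of sup|f-g| = %.5f\n", SupDiff(6, 1200000))
}
```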

We can also bound the maximum sampling error over a contiguous interval by finding the max area for such a triangle. For any $a, b \in \mathbb{R}$ , we have:

$\begin{equation} \int_a^b \left| f(x) - g(x) \right| \, dx \leq \frac{1}{4}\left(\frac{\left\lVert f - g \right\rVert_\infty}{\sqrt{8 + 1/12}}\right) \leq \frac{1}{256}. \end{equation}$
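
Plugging in the numbers makes the last inequality concrete (values rounded, using the $0.04255$ figure from the first bound):

$\begin{equation} \frac{1}{4} \cdot \frac{0.04255}{\sqrt{8 + 1/12}} \approx \frac{1}{4} \cdot \frac{0.04255}{2.8431} \approx 0.00374 \leq 0.00391 \approx \frac{1}{256} \end{equation}$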

$\textbf{N.B.}$ An implementation described on Reddit is identical to what I propose here. In fact, the post helped me catch an error in my work where I neglected the $1/12$ contribution to variance from $U(0, 1)$. My sincerest thanks to u/Dusty_Coder and u/skeeto.

$\textbf{Bonus Content:}$ Another zany (slow!) way to generate approximately normal random variables is to generate $n$ uniform random variables, sort them, and apply the following transform to the largest. I can’t imagine why you’d ever want to use this method, but it’s there if you want it…

$\begin{equation} g(x) = \frac{2}{\sqrt{\pi}} \tanh^{-1}\left(2x^{n}-1\right) \end{equation}$

This method just approximates the CDF of the normal as $F(x) \approx \frac{1}{2}\tanh\left(\frac{\sqrt{\pi}x}{2}\right)+\frac{1}{2}$ and makes use of properties of order statistics of the uniform distribution: the largest of $n$ uniforms has CDF $x^n$, so $x^n$ is itself uniform on $(0, 1)$. The transform above is just the approximate quantile function of the normal applied to $x^n$.

go test -cpu=1 -bench=Transform_GenRanNorm -benchtime=30s
Benchmark_Transform_GenRanNormMaxTanh     1000000000          26.53 ns/op
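
A minimal sketch of the transform method, assuming $\texttt{r.Float64()}$ as the uniform source. The names $\texttt{MaxTanhNorm}$ and $\texttt{MaxTanhMoments}$ are hypothetical, and the benchmarked code may differ. Note one swap: the sketch keeps a running maximum instead of sorting, which gives the same largest value.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// MaxTanhNorm (hypothetical name) draws n uniforms, keeps the largest, and applies
// g(x) = 2/sqrt(pi) * atanh(2*x^n - 1). A running maximum replaces the sort.
func MaxTanhNorm(r *rand.Rand, n int) float64 {
	largest := 0.0
	for i := 0; i < n; i++ {
		if u := r.Float64(); u > largest {
			largest = u
		}
	}
	// largest^n is uniform on [0, 1); if it is exactly 0 the result is -Inf.
	return 2 / math.Sqrt(math.Pi) * math.Atanh(2*math.Pow(largest, float64(n))-1)
}

// MaxTanhMoments (hypothetical helper) returns the sample mean and the
// population-style variance of the given number of draws.
func MaxTanhMoments(seed int64, n, draws int) (mean, variance float64) {
	r := rand.New(rand.NewSource(seed))
	var sum, sumSq float64
	for i := 0; i < draws; i++ {
		x := MaxTanhNorm(r, n)
		sum += x
		sumSq += x * x
	}
	mean = sum / float64(draws)
	variance = sumSq/float64(draws) - mean*mean
	return mean, variance
}

func main() {
	mean, variance := MaxTanhMoments(1, 4, 200000) // seed, n, and draws are arbitrary
	fmt.Printf("mean=%.4f variance=%.4f\n", mean, variance)
}
```

Because the $\tanh$ CDF above is a logistic CDF with scale $1/\sqrt{\pi}$, the output has variance $\pi/3 \approx 1.047$ rather than exactly 1.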