A Method for Encoding Permutations

A Method for Encoding Permutations

This is a quick one. Let’s imagine that you wanted to store a permutation of the first nn elements of β„•0\mathbb{N}_0 on disk. Can we store it in fewer than nβ‹…log2(n)n \cdot \operatorname{log_2}\Big(n) bits?

𝐍.𝐁\textbf{N.B} β€” As is often the case, someone smarter than me came up with a solution at least 50 years ago. Being generous, I was beaten by D.H Lehmer in π‘‡π‘’π‘Žπ‘β„Žπ‘–π‘›π‘” πΆπ‘œπ‘šπ‘π‘–π‘›π‘Žπ‘‘π‘œπ‘Ÿπ‘–π‘π‘  π‘‡π‘Ÿπ‘–π‘π‘˜π‘  π‘‘π‘œ π‘Ž πΆπ‘œπ‘šπ‘π‘’π‘‘π‘’π‘Ÿ\textit{Teaching Combinatorics Tricks to a Computer}. Being more realistic, Charles-Ange Laisant beat me to this observation in π‘†π‘’π‘Ÿ π‘™π‘Ž π‘π‘’π‘šΓ©π‘Ÿπ‘Žπ‘‘π‘–π‘œπ‘› πΉπ‘Žπ‘π‘‘π‘œπ‘Ÿπ‘–π‘’π‘™π‘™π‘’, π΄π‘π‘π‘™π‘–π‘π‘Žπ‘‘π‘–π‘œπ‘› π‘Žπ‘’π‘₯ π‘ƒπ‘’π‘Ÿπ‘šπ‘’π‘‘π‘Žπ‘‘π‘–π‘œπ‘›π‘ \textit{Sur la NumΓ©ration Factorielle, Application aux Permutations} in the 1880s.

Because there are n!n! permutations of nn, we can encode any permutation into a single log2(n!)\operatorname{log_2}\Big(n!) bit value. Using n=3n=3 as an example, Enc([0,1,2])=0\operatorname{Enc}\Big([0, 1, 2]) = 0 and Enc([2,1,0])=3!βˆ’1=5\operatorname{Enc}\Big([2, 1, 0]) = 3! - 1 = 5. Generally, for any permutation of [n][n], we can encode it to it’s lexicographical rank among all permutations of nn items. That said, is this any good? We can approximate how many bits this scheme uses by applying Stirling’s Approximation.

log2(n!)β‰ˆlog2(2Ο€nnneβˆ’n)=nlog2(n)+log2(2Ο€n)βˆ’nlog2(e)\begin{equation} \operatorname{log_2}\left(n!\right) \approx \operatorname{log_2}\left(\sqrt{2\pi n}n^{n}e^{-n}\right) = n\operatorname{log_2}\left(n\right) + \operatorname{log_2}\left(\sqrt{2\pi n}\right) - n\operatorname{log_2}\left(e\right) \end{equation}

After doing the algebra, you’ll notice that the savings are quite poor asymptotically compared to a baseline of nβ‹…log2(n)n \cdot \operatorname{log_2}\Big(n) bits. However, when nn is small we can get an OK improvement over the naive representation. For example, when n≀20n \leq20, we can encode any permutation into a single πšžπš’πš—πšπŸΌπŸΊ\texttt{uint64}. Compare that with the nn bytes required to store the permutation in []πšžπš’πš—πšπŸΎ\texttt{[]uint8}.

Finally, a short example of a potential application of this method. In the code below, πšπšŠπš—πšπš†πš’πšπš‘π™½π™±πš’πšπšœπš‚πšŽπš\texttt{RandWithNBitsSet} uniformly samples from the subset of πšžπš’πš—πšπŸΉπŸΈ\texttt{uint32} with with exactly nn bits set. It does this without trial-and-error or by drawing a uniform value on [0,32!)[0, 32!) and then extracting the first nn positions of the corresponding permutation to find the nn bits to set.

package main

import (
    "math/big"
    "math/rand"
    "slices"
)

var fact32 = BigFactorial(32) // 32!

func RandWithNBitsSet(rng *rand.Rand, n uint8) *big.Int {
    mask := big.NewInt(0).Lsh(big.NewInt(1), 118)
    mask.Sub(mask, big.NewInt(1))              // 2^118 - 1
    u := big.NewInt(0).Lsh(big.NewInt(1), 120) // larger than fact32
    for u.Cmp(fact32) != -1 {
        u = u.SetUint64(rng.Uint64())
        u.Lsh(u, 64)
        u.Add(u, big.NewInt(0).SetUint64(rng.Uint64()))
        u.And(u, mask)
    }
    arr := DecodeBig(u, n)
    u.SetUint64(0)
    for i := 0; i < int(n); i++ {
        u.Add(u, big.NewInt(1<<arr[i]))
    }
    return u
}

func DecodeBig(v *big.Int, n uint8) []uint8 {
    remain, decoded := make([]uint8, 32), make([]uint8, n)
    c, fact := big.NewInt(0), BigFactorial(32)

    for i := uint8(1); i < 32; i++ {
        remain[i] = i
    }

    for i := uint8(0); i < n; i++ {
        fact.Div(fact, big.NewInt(int64(32-i-1)))
        c.Div(v, fact)
        ic := int(c.Int64())
        decoded[i] = remain[ic]
        slices.Delete(remain, ic, ic+1)
        v.Sub(v, c.Mul(c, fact))
    }
    return decoded
}

func BigFactorial(n uint8) *big.Int {
    res := big.NewInt(1)
    for i := uint8(2); i <= n; i++ {
        res.Mul(res, big.NewInt(int64(i)))
    }
    return res
}