The last few weeks I've been writing about caching algorithms quite a bit. I'm going to use this note as a summary of the last few weeks' work.
We want to determine if a key, k, that has just entered a cache is likely to be accessed before being evicted. In other words, given a cache with size C and a keyspace with size N, what is the probability that k reappears before at least C other keys? If this probability is sufficiently small, let's preemptively mark the key for deletion for when the cache faces memory pressure.
We can analyze streams/datasets of arbitrary length for the number of one-hit keys in the stream, but a more interesting problem arises when we focus on a cache of a given size.
† I explain why this is the case in this note. Think of k as being on a random, exponential "timer" which fires (on average) every 1/p_k periods.
Alas, we cannot see into the future, so we must answer this probabilistically. Let's start by considering just a single pairwise match-up. Given keys a and b with corresponding appearance probabilities p_a and p_b, a is accessed before b with probability p_a / (p_a + p_b).
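Writing p_a and p_b for the two keys' per-access probabilities (notation mine), the pairwise result follows from conditioning on the first step at which either key appears:

```latex
\Pr[a \text{ before } b]
  = \sum_{t=0}^{\infty} (1 - p_a - p_b)^t \, p_a
  = \frac{p_a}{p_a + p_b}
```

Each term is the probability that neither key appears for t steps and then a appears; the geometric series collapses to the ratio of the two rates.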
We can integrate over the rest of the keyspace to get the expectation of the number of unique keys (call this random variable U) which will arrive before k.
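By linearity of expectation, each other key i contributes the probability that it wins its pairwise race against k (here p_i denotes key i's appearance probability; the notation is mine):

```latex
\mathbb{E}[U] = \sum_{i \neq k} \frac{p_i}{p_i + p_k}
```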
In this note I sketch out the assumptions that allow us to go from time linear in the size of the keyspace to constant.
In practice, to "integrate over the keyspace", we must store p_i for each key i. This introduces a significant memory overhead and requires iterating over the entire keyspace before each insertion. This is not computationally viable.
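To make the cost concrete, here is a sketch of the naive computation in Go (function and parameter names are my own, not from the actual implementation): one term per key in a full probability table, i.e. O(N) time and O(N) memory per insertion.

```go
package main

import "fmt"

// expectedUniqueNaive computes E[U] for key k directly: one term for
// every other key in the table. This requires storing p_i for each key
// and making a full pass over the keyspace.
func expectedUniqueNaive(pk float64, probs map[string]float64) float64 {
	var sum float64
	for _, p := range probs {
		// probability that this key appears before k does
		sum += p / (p + pk)
	}
	return sum
}

func main() {
	probs := map[string]float64{"a": 0.25, "b": 0.25}
	fmt.Println(expectedUniqueNaive(0.5, probs))
}
```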
If we assume p_i is constant across all keys excluding k, we can (1) calculate this expectation in time constant in the size of the keyspace and (2) provide an estimate that maximizes entropy.
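A minimal sketch of the constant-time version, assuming the remaining probability mass 1 - p_k is spread uniformly across the other n - 1 keys (names here are mine):

```go
package main

import "fmt"

// expectedUnique estimates E[U], the expected number of unique keys
// appearing before key k recurs, under the uniformity assumption: the
// remaining mass (1 - pk) is split evenly across the other n-1 keys,
// so every term in the sum is identical and the sum collapses to O(1).
func expectedUnique(pk float64, n int) float64 {
	// Each other key has probability q and beats k in a pairwise
	// race with probability q / (q + pk).
	q := (1 - pk) / float64(n-1)
	return float64(n-1) * q / (q + pk)
}

func main() {
	// A rare key in a large keyspace: most other keys are expected
	// to appear before it recurs.
	fmt.Println(expectedUnique(0.001, 10000))
}
```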
To get approximate counts of all keys over a large keyspace we can run a Count-Min Sketch alongside our probability oracle.
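A Count-Min Sketch over-estimates counts (hash collisions only ever add) but never under-estimates, which is acceptable for a coarse probability oracle. A minimal Go sketch; the per-row salting scheme here is one simple choice, not necessarily what the actual implementation does:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// cms is a Count-Min Sketch: d hash rows of w counters each. An
// estimate is the minimum counter across rows, an upper bound on the
// true count.
type cms struct {
	rows [][]uint64
}

func newCMS(d, w int) *cms {
	rows := make([][]uint64, d)
	for i := range rows {
		rows[i] = make([]uint64, w)
	}
	return &cms{rows: rows}
}

// index derives a distinct hash per row by salting the key with the
// row number before hashing with FNV-1a.
func (c *cms) index(row int, key string) int {
	h := fnv.New64a()
	fmt.Fprintf(h, "%d:%s", row, key)
	return int(h.Sum64() % uint64(len(c.rows[row])))
}

func (c *cms) Add(key string) {
	for i := range c.rows {
		c.rows[i][c.index(i, key)]++
	}
}

// Count returns the minimum across rows — the tightest upper bound.
func (c *cms) Count(key string) uint64 {
	min := ^uint64(0)
	for i := range c.rows {
		if v := c.rows[i][c.index(i, key)]; v < min {
			min = v
		}
	}
	return min
}

func main() {
	s := newCMS(4, 1<<12)
	for i := 0; i < 5; i++ {
		s.Add("hot")
	}
	s.Add("cold")
	fmt.Println(s.Count("hot"), s.Count("cold"))
}
```

From here, an estimate of p_i is Count(i) divided by the total number of accesses observed so far.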
I didn't have to do much work to get a wrapper around the Redis protocol; redcon does all the work for me. The redcon layer does cost my machine some rps, though.
I implemented this in Go and got quite nice performance. Placing a middleware between the client and Redis carries a significant performance penalty; however, this penalty isn't exacerbated when checking for one-hit wonders vs. just acting as a proxy.
====== SET ======
# with p_ohw enabled ::: throughput summary: 85778.01 requests per second
1000000 requests completed in 11.66 seconds
latency summary (msec):
avg min p50 p95 p99 max
0.302 0.024 0.303 0.431 0.559 1.735
# without p_ohw enabled ::: throughput summary: 85645.77 requests per second
1000000 requests completed in 11.68 seconds
latency summary (msec):
avg min p50 p95 p99 max
0.302 0.024 0.303 0.431 0.567 1.239

I'm not going to pursue this much further. In practice, the desired result (minimize memory pressure to keep rps high, keep the eviction system from running) really only occurs in a narrow regime. Furthermore, thinking through this idea was a lot more rewarding than trying to obsess over how we can tune the parameters for any given cache.