More on Caches: A Lower Bound For One-Hit Wonder Probability

In a previous post I estimated an upper bound on one-hit wonders given a set of keys and a stream of requests. In that post I expressed a desire for the distribution of the number of requests required to see $c$ unique keys. After some research, I found that this problem is a more challenging variant of the classic coupon-collector problem. Today I will solve two simpler and more practical problems that help us reach a similar conclusion.


  1. Given a key ($x_k$) with appearance probability $p(x_k)$, how many unique keys (call this $U_k$) will likely be seen before $x_k$? This is of particular interest when using LRU eviction: if this value is much larger than $c$, we’d expect to get little benefit from caching the corresponding key.

We’ll assume a set $K$ of $n$ keys $x_1, x_2, \dots, x_{n-1}, x_n$ with respective probabilities $p(x_1), \dots, p(x_n)$ and $\sum_{i=1}^{n} p(x_i) = 1$.

Instead of thinking of a stream of requests with discrete arrival indices, we can re-frame this problem in terms of the next arrival of key $x_k$ in continuous time. Think of a key’s next arrival index as exponentially distributed with parameter $\lambda_k = p(x_k)$. Helpfully, restating the problem in this way preserves the mean time until a key is seen, because $E[\operatorname{Exp}(\lambda)] = 1/\lambda$. For any pair of keys $(x_k, x_j)$, the probability that $x_k$ is seen before $x_j$ can be calculated as follows:

\begin{equation} w(\lambda_k, \lambda_j) = \lambda_{k}\lambda_{j}\int_{0}^{\infty}\int_{y}^{\infty}e^{-\lambda_{j}x}\ e^{-\lambda_{k}y}\ dx\ dy = \frac{\lambda_k}{\lambda_j + \lambda_k} \end{equation}

There is likely some small bias here; we may instead want the $\operatorname{ceil}$ of the next arrival index. Nevertheless…
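As a sanity check, a small Monte Carlo sketch (Python; the function names are mine, not from the post) agrees with the closed form:

```python
import random

def win_probability(lam_k: float, lam_j: float) -> float:
    """Closed form for w(lam_k, lam_j): P(x_k's clock fires before x_j's)."""
    return lam_k / (lam_k + lam_j)

def simulate_win_probability(lam_k: float, lam_j: float,
                             trials: int = 200_000, seed: int = 7) -> float:
    """Monte Carlo estimate of P(Exp(lam_k) < Exp(lam_j))."""
    rng = random.Random(seed)
    wins = sum(rng.expovariate(lam_k) < rng.expovariate(lam_j)
               for _ in range(trials))
    return wins / trials

# With lam_k = 0.1 and lam_j = 0.3 the closed form gives 0.25,
# and the simulation should land within ~0.01 of it.
```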

Thus, the expectation of the number of unique keys that will arrive before $x_k$ is the sum of every other key’s “win probability” against $x_k$. If we can estimate a square-integrable probability distribution (we’ll call this $f: K \to (0, 1)$) that approximates $p(x_1), \dots, p(x_n)$, we can calculate $E[U_k]$ as follows:

\begin{equation} E[U_k] = \sum_{i \neq k} w(\lambda_i, \lambda_k) \approx \int_{0}^{\infty} w(f(x), f(x_k))\ dx \end{equation}

We can go through the exercise of plugging in the standard uniform distribution: if $p(x_i) = p(x_j)$ for all $x_i, x_j \in K$ then $w(\lambda_i, \lambda_j) = 1/2$, which gives $E[U_k] = n\int_{0}^{1} w(f(x), f(x_k))\ dx = 0.5n$. A toss-up! Excellent, this is what we’d expect here. I should simulate this to verify with a few other distributions.
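That simulation is easy enough for the uniform case (Python sketch; names are mine). Each key’s first arrival is an exponential clock, and $U_k$ counts the clocks that fire before key $k$’s:

```python
import random

def simulate_unique_before(n: int, trials: int = 20_000, seed: int = 11) -> float:
    """Estimate E[U_k] under a uniform key distribution: the number of
    unique keys whose first arrival precedes key k's first arrival."""
    rng = random.Random(seed)
    rate = 1.0 / n  # p(x_i) = 1/n for every key
    total = 0
    for _ in range(trials):
        arrival_k = rng.expovariate(rate)
        # Count how many of the other n-1 clocks fire before key k's clock.
        total += sum(1 for _ in range(n - 1)
                     if rng.expovariate(rate) < arrival_k)
    return total / trials

# For n = 20 the estimate should land near (n - 1)/2 = 9.5, i.e. about n/2.
```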


  2. Given a key with appearance probability $p(x_k)$, what is the probability that it appears within the next $c$ unique items? In other words, what is the probability it reappears before being evicted from an LRU cache?

This is just $1 - \Pr(U_k \geq c)$. We have many inequalities that can help us with this! We want to find the probability that $U_k$ is small, so I’ll use a Paley–Zygmund bound. For $U_k \geq 0,\ \theta \in [0, 1]$, Paley–Zygmund is as follows:

\begin{equation} P_{pz}(x_k) = P(U_k > \theta E[U_k]) \geq (1-\theta)^2 \frac{E[U_k]^2}{E[U_k^2]} \end{equation}

**N.B.**: This assumes $E[U_k] > c$. In practice almost all keys will meet this criterion. For those that don’t, we may need to fall back to another bound. I have not figured that out yet…

We already have $E[U_k]$, so we can simply set $\theta = c/E[U_k]$. Now we just need to calculate the second raw moment of $U_k$ (the denominator on the RHS). This is similar to the calculation for $E[U_k]$ in (2):

\begin{equation} E[U_k^{m}] = \int_{0}^{\infty} w(f(x), f(x_k))^{m}\ dx \end{equation}

**Ex.**: Given uniform load across all keys and a cache that’s 10% of the keyspace, we have $c = 0.1n$ and $E[U_k] = 0.5n$ for all $k$, so $\theta = 0.2$. Treating the moment ratio $E[U_k]^2/E[U_k^2]$ as $\approx 1$, $P_{pz} \geq (1 - c/E[U_k])^2 = (1 - 0.2)^2 = 0.64$. This gives $1 - 0.64 = 0.36$ as an upper bound on the probability of re-appearing before eviction.
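The Paley–Zygmund arithmetic is two lines; a sketch (Python; function names are mine) under the same uniform-load assumptions:

```python
def pz_lower_bound(c: float, mean_u: float, second_moment_u: float) -> float:
    """Paley-Zygmund lower bound on P(U_k > c); requires 0 <= c < E[U_k]."""
    if not 0.0 <= c < mean_u:
        raise ValueError("Paley-Zygmund needs 0 <= c < E[U_k]")
    theta = c / mean_u
    return (1.0 - theta) ** 2 * mean_u ** 2 / second_moment_u

def survival_upper_bound(c: float, mean_u: float, second_moment_u: float) -> float:
    """Upper bound on P(U_k <= c): the key re-appears before LRU eviction."""
    return 1.0 - pz_lower_bound(c, mean_u, second_moment_u)

# Worked example: n = 100 keys, uniform load, cache holds 10% of the keyspace,
# so c = 10 and E[U_k] = 50. Treating U_k as nearly deterministic puts
# E[U_k^2] at ~2500, giving a 0.64 lower bound and a 0.36 upper bound.
```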

As desired, Paley–Zygmund gives a lower bound on the probability that $U_k$ is greater than $c$. Alternatively, we can take $1 - P_{pz}$ to get an upper bound on the probability that $U_k$ is less than $c$ (i.e., the key survives the LRU cache).


  3. In my previous post I estimated that, given a uniform distribution over $K$, a key wouldn’t be an OHW across any or all of its appearances with probability $1 - e^{-1} - n\sum_{j=1}^{m}\left(\frac{c}{n}\right)^{j}f(j)$. Can we refine this for an arbitrary distribution?

This section isn’t as useful as (1) or (2), but I have a score to settle. Given an arbitrary distribution over $K$ we can make the following replacements:

**Redemption Idea** — If a key is seen $j$ times, it survives the LRU at most $\lceil j/2 \rceil$ times. I think last post’s sum term should instead be $\sum_{j=1}^{n}\left(\frac{c}{n}\right)^{\lceil \frac{j}{2} \rceil}p(j)$.

I thought about my original estimate a bit more and realized that requiring $x_k$ to reappear within the next $c$ unique keys (“surviving the LRU”) for each of its $j$ occurrences was too restrictive. We actually just need each new insertion of $x_k$ to survive at least once. I suspect checking the odd values of $j$ will suffice, but I’m going to take my **L** here and move on…

I **did not** settle the score here; I don’t anticipate revisiting this for any practical purpose, but I may try to work it out for ego’s sake…
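The redemption sum is at least cheap to evaluate. A sketch (Python; the geometric appearance-count distribution here is my assumption, not from the post):

```python
import math

def redemption_sum(c: int, n: int, p_count) -> float:
    """Evaluate sum_{j=1..n} (c/n)^ceil(j/2) * p_count(j), where p_count(j)
    is the probability a key appears exactly j times in the stream."""
    return sum((c / n) ** math.ceil(j / 2) * p_count(j)
               for j in range(1, n + 1))

# Hypothetical appearance-count distribution p(j) = (1/2)^j (sums to ~1):
# redemption_sum(10, 100, lambda j: 0.5 ** j) comes out near 0.077.
```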


So what is this good for? I don’t think it makes sense to use it as an alternative to LFU or LRU; it’s just a more complicated thing to sort on. I do think there’s some value in using $P_{pz}$ as a gatekeeper to keep pressure off an LRU cache (e.g., used to sort keys into a primary and a fast-eviction queue?). Doing this would require the following:

We shall soon see if we can overcome these challenges (I have no clue, btw)! Next week I’ll fork `tidwall/redcon` (github) and see where this leads…

**Note** — I wrote my last post on 03.13.2024 and this one on 03.23.2024. Oddly enough, this past week has been huge for caches since Redis decided to do their heel-turn. I’m not in the business of replacing Redis, but I am strongly considering writing this into some sort of toy system. Liking where this is headed…

I also might fork `tidwall/redcon` just to wrap in-memory SQLite. Like I said, big week for caches.

Related Reading:

Also note: the more challenging variant of this problem boils down to determining the time of the $c^{th}$ order statistic over $\operatorname{Exp}(p(x_i))$ for all $x_i \in K$, and then finding the probability that $\operatorname{Exp}(p(x_k))$ “fires” before the $c^{th}$ observation. I’m too cowardly to work with the Hypoexponential.
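Too cowardly analytically, maybe, but simulation is cheap. A sketch (Python; `p` is a hypothetical probability vector) estimates the chance that $x_k$ fires before the $c^{th}$ order statistic of the other keys’ clocks:

```python
import random

def prob_fires_before_cth(p: list, k: int, c: int,
                          trials: int = 50_000, seed: int = 3) -> float:
    """Estimate P(Exp(p[k]) fires before the c-th order statistic of the
    other keys' clocks), i.e. x_k is among the first c unique keys seen."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        t_k = rng.expovariate(p[k])
        others = sorted(rng.expovariate(p[i])
                        for i in range(len(p)) if i != k)
        if t_k < others[c - 1]:  # fewer than c other clocks fired first
            hits += 1
    return hits / trials

# Uniform case: with n = 10 keys, x_k lands in each of the 10 arrival
# positions with equal probability, so P(among first c = 5) should be ~0.5.
```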