Fixing Redis-Benchmark Again


It’s been a productive last few days thinking about caches (theory, more theory). I haven’t written any code though, so let’s change that. Today I’m going to make a *tiny* change to `redis-benchmark` that will allow me to benchmark with Pareto (or Zipf) distributed traffic. Because web traffic is far more likely to be Pareto distributed (see: FIFO Queues Are All You Need, Table 1) than uniform, testing a cache under this alternative distribution more closely mimics real request patterns. This was a quick experiment and the change (in its entirety!) is below.

**N.B** — The `redis-benchmark` defaults (1) maximize entropy, (2) produce larger DBs faster, and (3) maximize cache evictions. Recall that $U(0, r)$ is the maximum entropy distribution with support on $[0, r]$. Observations (2) and (3) follow from the fact that each key has probability $p(x_k) \sim 1 - e^{-d/r}$ of occurring in any interval of $d$ operations. If this was intentional, I can see why they’d do this. If not, path of least resistance, whatever…

**N.B** — Multiple calls to `pow` are a performance hazard here. Anecdotally, on `localhost` I was hitting 120,000–130,000 rps with either uniform or Pareto distributed traffic. It does make a tiny difference, but it won’t overshadow the main point.

@@ -104,6 +104,7 @@ static struct config {
+    double zipf_shape;
 } config;

@@ -378,11 +379,16 @@ static void randomizeClientKey(client c) { 
     for (i = 0; i < c->randlen; i++) {
         char *p = c->randptr[i]+11;
+
         size_t r = 0;
-        if (config.randomkeys_keyspacelen != 0)
-            r = random() % config.randomkeys_keyspacelen;

+        if (config.randomkeys_keyspacelen != 0) {
+            if (config.zipf_shape != 0) {
+                double z = config.zipf_shape;
+                r = (size_t)floor(
+                    pow(1.0 - drand48() * (1 - 1.0 / pow(config.randomkeys_keyspacelen, z)), -1.0 / z)
+                );
+            } else {
+                r = random() % config.randomkeys_keyspacelen;
+            }
+        }

@@ -1442,6 +1448,8 @@ int parseOptions(int argc, char **argv) {
+        } else if (!strcmp(argv[i],"-z")) {
+            config.zipf_shape = strtod(argv[++i], NULL);

**N.B** — I refer to `config.zipf_shape` as $\alpha$ and `config.randomkeys_keyspacelen` as $K_{\text{max}}$ in the text.

The only meaningful change is in `randomizeClientKey`, where I replace a random key distributed on $U(1, K_{\text{max}})$ with one distributed on $Pa(\alpha)$. I apply the following transformation to $u \sim U(0, 1)$ to produce a Pareto variable with an upper bound at $K_{\text{max}}$.

\begin{equation} k = \left(1 - u\left(1 - \frac{1}{K_{\text{max}}^{\alpha}}\right)\right)^{-1/\alpha} \end{equation}
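This is just the inverse-CDF method applied to a Pareto($\alpha$) with $x_{\min} = 1$ truncated at $K_{\text{max}}$; spelling out the inversion:

```latex
% CDF of a Pareto(alpha), x_min = 1, truncated at K_max
F(k) = \frac{1 - k^{-\alpha}}{1 - K_{\text{max}}^{-\alpha}},
\qquad 1 \le k \le K_{\text{max}}

% set u = F(k) and solve for k
1 - k^{-\alpha} = u\left(1 - K_{\text{max}}^{-\alpha}\right)
\quad\Longrightarrow\quad
k = \left(1 - u\left(1 - \frac{1}{K_{\text{max}}^{\alpha}}\right)\right)^{-1/\alpha}
```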

To test that my changes actually worked, I ran a few small tests. Results are presented below.

π“πžπ¬π­ 𝟏\textbf{Test 1} β€” I set Redis to AOF mode and fsync’d every 1s. I then wrote a small program to read the AOF and count the frequency of SET operations per key. The leftmost figure shows the results using the original benchmark methodology, the other figure shows the Pareto distributed results generated with the following command:

redis-benchmark -z 0.54 -r 1000 -n 500000 -t SET
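The counting program itself isn’t shown; a minimal sketch of what it might look like (mine, not the original — it assumes a plain RESP-encoded AOF and benchmark keys of the form `key:000000000042`):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KEYS 100000

/* Walk a RESP-encoded AOF line by line and tally SET operations per
 * key. Matching the full "SET\r\n" line (not just the prefix) avoids
 * miscounting SETEX, SETRANGE, etc.; a real parser would track RESP
 * argument counts instead. */
static void count_sets(FILE *f, long counts[MAX_KEYS]) {
    char line[256];
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "SET\r", 4) != 0) continue;
        if (!fgets(line, sizeof line, f)) break;  /* "$16" length header */
        if (!fgets(line, sizeof line, f)) break;  /* the key itself */
        long k = strtol(line + 4, NULL, 10);      /* skip "key:" prefix */
        if (k >= 0 && k < MAX_KEYS) counts[k]++;
    }
}
```

Sorting the non-zero counts descending gives the per-key frequency curves.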

Distributions Generated From Uniform & Pareto Benchmarking $(K_{\text{max}} = 1\text{k},\ \#\text{Req} \sim 500\text{k})$

π“πžπ¬π­ 𝟐\textbf{Test 2} β€” I throttled the benchmarking script to ∼3000rps\sim 3000 \text{rps} of mixed πš‚π™΄πšƒ\texttt{SET} and π™Άπ™΄πšƒ\texttt{GET} operations and then scraped πš”πšŽπš’πšœπš™πšŠπšŒπšŽ_πš‘πš’πšπšœ\texttt{keyspace_hits}, πš”πšŽπš’πšœπš™πšŠπšŒπšŽ_πš–πš’πšœπšœπšŽπšœ\texttt{keyspace_misses}, and πš”πšŽπš’πšœ\texttt{keys} from the target DB. The uniform benchmark produced a DB that grew in size logarithmically and a hit-ratio that mirrored that growth 1:1. The Pareto distributed benchmark (below) produced a much smaller DB and achieved a higher hit-ratio much more quickly.

Uniform Benchmarking $(K_{\text{max}} = 100\text{k},\ \#\text{Req} \sim 300\text{k})$
Pareto Benchmarking $(K_{\text{max}} = 100\text{k},\ \alpha = 0.54,\ \#\text{Req} \sim 300\text{k})$

This addresses just the first of my two gripes with `redis-benchmark`. My second gripe is that it doesn’t allow for correlated requests. In production, one can expect that operations on the same key closely follow one another. Given a stream of keys $k_0, k_1, \dots, k_n$, we should expect $p(k_i = x_j) \leq p(k_i = x_j \mid k_{i-1} = x_j)$.

This is a petty complaint, and beefing up a client to do more work shouldn’t become the bottleneck that limits the efficacy of your benchmarking tool. With that said, here’s how I might “fix” `redis-benchmark`…

**Fig 1.1** — The CDF of the truncated Pareto. Following my first change, we’d map a value $u \sim U(0, 1)$ to $[1, K_{\text{max}}]$ using the inverse CDF. This figure shows the truncated Pareto with 16 keys and highlights the values of $u$ that correspond to the 3rd key.

**Fig. 1** — Pareto Distributed Keys

Rather than using $u \sim U(0, 1)$, we’ll generate a random walk on $(0, 1)$. For any key, the probability of being chosen is no longer constant; it’s now a function of the position of the walk $(u_{i-1})$.

\begin{equation} p_k = \int_k^{k+1} f(x) \, dx \;\to\; \int_{F(k)}^{F(k+1)} g(u_{i-1}, x) \, dx \end{equation}

Where $f(x)$ is the pdf of the Pareto, and $g(u_{i-1}, x)$ is the distribution for the $i^{th}$ step of the walk. In practice, $g$ can be any easy-to-sample distribution.

**Fig 1.2** — A random walk on $(0, 1)$ where each step is Laplace distributed with $g(x) = \operatorname{Lap}(u_{i-1}, 0.0625)$. Any key is selected with approximately the same frequency as with the original method, but the selection probability for $k$ is maximized when the walk is near $k$.

**Fig 1.3** — The Laplace pdf. The probability that the highlighted key is selected is the probability that the walk steps into the relevant range.

\begin{equation} \int_{F(k)}^{F(k+1)} \operatorname{Lap}(0.3, 0.0625) \, dx \end{equation}

The Laplace is a convenient kernel choice because values are easily generated via the inverse transform (either directly or as the difference of two exponential variables).

**Fig. 2** — Laplace Random Walk over a Keyspace

This is an easy change that should require only about 10 LOC added to `redis-benchmark`. The next time I need it I’ll try to squeeze this in and see if it gives the desired results.