Note On Comparing Incomparable Performance Changes
Earlier this week a colleague told me that after he made a change, the
p50 latency of a service was down but the p99 was unchanged. He
concluded that this was a successful change: although the tail was
unaffected, the median better determines how much work we can actually
do. I questioned this claim, and we spent a few minutes hand-waving
before declaring it a win.
— To avoid muddying the waters, we’ll ignore e.g. resource usage, code
complexity, and the dozen other metrics that you might reasonably use to
evaluate a code change.
Let’s think this through. Consider two non-negative probability
distributions X and Y. We say that X ⪯ Y if P(X > t) ≤ P(Y > t) for all
t ≥ 0. If neither X ⪯ Y nor Y ⪯ X,
then we say that the distributions are incomparable. Applying this
terminology to software, we say a change to a function is “incomparable”
if it neither increases nor decreases the execution time across the
entire distribution. However, bucketing all changes into three classes
is probably too coarse. So what do we make of changes like this? First,
it is a good change and it is a win, but I had two
additional thoughts:
- If we made up a metric that takes into account all percentiles
equally, we’d arrive at something similar to the mean, ∫₀¹ Q(p) dp,
where Q is the quantile function.
If we wanted to impose our own preferences, we might integrate with
respect to some importance measure w, i.e. ∫₀¹ Q(p) w(p) dp,
but this is likely overkill. This result shouldn’t be so surprising:
much of queuing theory (by Little’s Law) can be boiled down to
describing the mean service/arrival times and their coefficients of
variation.
- For a variety of practical reasons (see: The Tail At
Scale), we tend to care more about the behavior at the tail.
Provided that the means are roughly equal, we need to consider that real
systems have retries, timeouts, etc. We can improve the health of the
system by avoiding these mechanisms. Given a timeout T, we do well to
minimize the retry percentage, i.e. to minimize P(X > T).
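The dominance check is easy to run on real measurements. Here is a minimal Python sketch; the lognormal samples and their parameters are made up for illustration (in practice `before` and `after` would be measured latencies), and it classifies two samples by comparing their empirical survival functions P(X > t) on a grid of thresholds:

```python
import math
from statistics import NormalDist

# Synthetic latency samples (ms): 1000 evenly spaced quantiles of two
# lognormal distributions. Parameters are illustrative, not measured.
ps = [(i + 0.5) / 1000 for i in range(1000)]
before = [math.exp(3.0 + 0.6 * NormalDist().inv_cdf(p)) for p in ps]
after = [math.exp(2.8 + 0.7 * NormalDist().inv_cdf(p)) for p in ps]  # lower median, fatter tail

def survival(sample, t):
    """Empirical survival function P(X > t)."""
    return sum(1 for x in sample if x > t) / len(sample)

def compare(a, b, grid):
    """Classify a vs. b by first-order stochastic dominance on a grid of thresholds."""
    a_le_b = all(survival(a, t) <= survival(b, t) for t in grid)
    b_le_a = all(survival(b, t) <= survival(a, t) for t in grid)
    if a_le_b and b_le_a:
        return "indistinguishable on this grid"
    if a_le_b:
        return "a <= b: faster (or equal) at every threshold"
    if b_le_a:
        return "b <= a: slower (or equal) at every threshold"
    return "incomparable (the survival curves cross)"

grid = range(1, 200)  # thresholds in ms
print(compare(after, before, grid))  # → incomparable (the survival curves cross)
```

In this made-up example the “after” sample has a lower median but a heavier tail, so the survival curves cross and the two distributions are incomparable in exactly the sense above.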
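To make the percentile-weighting idea concrete: a sketch of the weighted quantile integral and of the retry fraction P(X > T), again using a synthetic lognormal sample as a stand-in for measurements (the parameters and the helper names are mine, purely illustrative):

```python
import math
from statistics import NormalDist

# Synthetic stand-in sample: 1000 evenly spaced quantiles of a lognormal
# latency distribution (illustrative parameters, not measurements).
ps = [(i + 0.5) / 1000 for i in range(1000)]
latencies = sorted(math.exp(3.0 + 0.6 * NormalDist().inv_cdf(p)) for p in ps)

def weighted_quantile_metric(sample, weight):
    """Approximate the integral of Q(p) * w(p) dp for p in (0, 1).

    With weight(p) == 1 this is just the sample mean; an increasing
    weight expresses that we care more about the tail percentiles.
    """
    xs = sorted(sample)
    n = len(xs)
    return sum(x * weight((i + 0.5) / n) for i, x in enumerate(xs)) / n

def retry_fraction(sample, timeout):
    """Empirical P(X > T): the fraction of requests that would hit the timeout."""
    return sum(1 for x in sample if x > timeout) / len(sample)

mean = weighted_quantile_metric(latencies, lambda p: 1.0)       # equal weight recovers the mean
tail = weighted_quantile_metric(latencies, lambda p: 5 * p**4)  # weight integrates to 1, favors the tail
print(mean, tail, retry_fraction(latencies, 100.0))
```

The retry fraction is the degenerate case of an importance measure concentrated entirely at the timeout: it ignores everything except how much mass sits beyond T.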