Note On Comparing Incomparable Performance Changes

Earlier this week a colleague told me that after he made `$CHANGE`, the p50 latency of `$FUNCTION` was down but the p99 was unchanged. He concluded that this was a successful change: although the tail was unaffected, the median better determines how much work we can actually do. I questioned this claim, and we spent a few minutes hand-waving before declaring it a win.

**N.B.** — To avoid muddying the waters, we’ll ignore e.g. resource usage, code complexity, and the dozen other metrics that you might reasonably use to evaluate a code change.

Let’s think this through. Consider two non-negative probability distributions $f_0$ and $f_1$ with CDFs $F_0$ and $F_1$. We say that $f_0 \prec f_1$ if $F_0^{-1}(p) \geq F_1^{-1}(p)$ for all $p \in [0, 1]$. If neither $f_0 \prec f_1$ nor $f_1 \prec f_0$ holds, we say the distributions are incomparable. Applying this terminology to software, a change to a function is “incomparable” if it neither increases nor decreases execution time uniformly across the entire distribution. However, bucketing all changes into three classes is probably too coarse. So what do we make of changes like my colleague’s? First, it is a good change and it is a win, but I had two additional thoughts:
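To make the $\prec$ relation concrete, here is a small sketch of how you might test it on empirical latency samples, probing a finite grid of quantiles as a stand-in for the full inverse CDFs. The function names and the quantile grid are my own choices, not anything from `$FUNCTION`’s actual benchmark harness:

```python
import numpy as np

# Probe quantiles; extreme endpoints are avoided because empirical
# quantile estimates there are noisy for finite samples.
PS = np.linspace(0.01, 0.99, 99)

def dominates(x0, x1):
    """f_0 ≺ f_1: every probed quantile of sample x0 is >= that of x1."""
    return bool(np.all(np.quantile(x0, PS) >= np.quantile(x1, PS)))

def classify(x0, x1):
    """Bucket a change from x0 (before) to x1 (after) into the three classes."""
    d01 = dominates(x0, x1)
    d10 = dominates(x1, x0)
    if d01 and d10:
        return "equal"          # identical quantile functions on the grid
    if d01:
        return "f_0 ≺ f_1"      # after is at least as fast at every quantile
    if d10:
        return "f_1 ≺ f_0"      # after is at least as slow at every quantile
    return "incomparable"       # the quantile functions cross
```

A change that improves the median but worsens the tail (or vice versa) lands in the `"incomparable"` bucket, which is exactly the kind of change the three-way classification handles poorly.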

\begin{equation}
\int_{0}^{1} F_{0}^{-1}(p) \, d\omega(p) - \int_{0}^{1} F_{1}^{-1}(p) \, d\omega(p)
\end{equation}
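Equation (1) can be evaluated numerically by discretizing the weight measure $\omega$ as point masses on a grid of probabilities. The sketch below is my own, under that discretization assumption: a point mass at $p = 0.5$ recovers the median gap, a point mass at $p = 0.99$ the p99 gap, and a uniform $\omega$ approximates the difference of means for continuous distributions:

```python
import numpy as np

def weighted_quantile_gap(x0, x1, ps, weights):
    """Approximate ∫ F_0^{-1}(p) dω(p) − ∫ F_1^{-1}(p) dω(p),
    with ω discretized as point masses `weights` at probabilities `ps`."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so ω is a probability measure
    gap = np.quantile(x0, ps) - np.quantile(x1, ps)
    return float(np.dot(gap, w))
```

Choosing $\omega$ is where the real disagreement hides: my colleague’s argument amounts to putting most of $\omega$’s mass near the median, while a tail-focused SLO puts it near $p = 0.99$.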