How I Think About Performance

How I Think About Performance

An idea, or a family of ideas, has been rattling around in my head for a few weeks now. Luckily, they’re much less interesting than I originally thought. I’m liberated in a sense. I have the freedom to articulate them roughly, and in 30 minutes, rather than burning my whole afternoon yapping. Our topic today is Performance!. For what it’s worth, I’d also apply this logic to e.g., system design interviews, AI evals, any system with (1)(1) multiple valid solutions and (2)(2) solutions evaluated on multiple criteria independent of their validity.

𝐍.𝐁.\textbf{N.B.} β€” I am realizing that I’ve written about this idea (implicitly) in a previous post β€” see: A Note On Comparing Incomparable Performance Changes. I suppose the purpose of this post is to think about things like 10% deeper than the glib refrain, β€œengineering is about tradeoffs”.

The core idea is that given a task with goals 𝒒1,𝒒2,…,𝒒n\mathcal{G}_1, \mathcal{G}_2, \ldots, \mathcal{G}_n, we say that a solution Saβ‰ΌSbS_a \preceq S_b if and only if 𝒒𝒾(Sa)≀𝒒𝒾(Sb)\mathcal{G_i}\big(S_a\big) \leq \mathcal{G_i}\big(S_b\big) for all i∈[1,n]i \in [1, n]. Informally, this is a partial order. In engineering software for Performance!, our goals might be to minimize CPU seconds, peak memory utilization, wall time, and LOC, and maximize throughput. That’s it. This is the only real idea I wanted to convey today. To understand a complex problem, consider modeling candidate solutions as a partial order.


As a concrete example, let’s consider some code I wrote earlier this week. It’s a suite of implementations of an inverted index. For simplicity’s sake, I say we have just three goals: minπš‹πšžπš’πš•πš_πšπš’πš–πšŽ\min \texttt{build_time}, minπš‹πš’πšπšŽπšœ_πš™πšŽπš›_𝚍𝚘𝚌\min \texttt{bytes_per_doc}, and maxπš›πšŽπšš_πš™πšŽπš›_𝚜𝚎𝚌\max \texttt{req_per_sec}. When I benchmarked all implementations, I found the following:

impl bytes/doc build (s) query (’000s rps)
btree-str-nosort 1976.4 32.9 0.32
btree-str 1976.4 25.4 1.53
btree-hash 1880.7 22.5 2.18
btreeg-hash 609.3 15.0 4.99
sorted-str 330.6 8.18 18.05
inv-weak 234.9 9.01 29.85
inv-nopool 235.2 3.93 26.95
inv-two-pools 234.1 3.53 29.67
𝐹𝑖𝑔. 1\textit{Fig. 1} β€” Nothing special, regular old table. If you’re interested in the mechanics (?) of this exercise, you can read about how I used an agent to find the optimizations that allowed me to grind the build time down another 60% from πš’πš—πšŸ-πš πšŽπšŠπš”\texttt{inv-weak} to πš’πš—πšŸ-𝚝𝚠𝚘-πš™πš˜πš˜πš•πšœ\texttt{inv-two-pools} here.
𝐹𝑖𝑔. 2\textit{Fig. 2} β€” In this Hasse diagram (see: Partial Order Set), an outbound arrow from a node indicates that the implementation is dominated by, at minimum, the implementation it points to.

In the process of going through dominated solutions, I found that I often get a much deeper understanding of the problem I’m working through. As an aside, this knowledge accumulates, and in the future I’ll be prepared for at least a half-dozen new classes of problem that I may not have otherwise recognized but for the slow, deliberate walk from πš‹πšπš›πšŽπšŽ-πšœπšπš›-πš—πš˜πšœπš˜πš›πš\texttt{btree-str-nosort} (a terrible solution, written quickly, meant only to compile and pass some tests so I could pass an interview round) to πš’πš—πšŸ-𝚝𝚠𝚘-πš™πš˜πš˜πš•πšœ\texttt{inv-two-pools} (which I might consider for β€œproduction”).

𝐍.𝐁.\textbf{N.B.} β€” Tapping the Sign. If you make something 2x faster, maybe you’ve done something smart. If you make it 100x faster, you’ve probably stopped doing something stupid…

N.B. β€” π’ͺ(n+m)\mathcal{O}(n + m) doesn’t dominate π’ͺ(nlogm)\mathcal{O}(n \log m) for all n,mn,m, but is asymptotically better when n∼Θ(m)n \sim \Theta(m) (a typical case…). Furthermore, the merging algorithm can be implemented much more efficiently than the btree probing algorithm.