Modeling Autoscaling Policies


I used an optimal transport map at work! You'd be right to roll your eyes, but it happened! We started with a metric $M = \max_i m_i$ and planned to introduce a change that modified the definition of $m_i$ from $\frac{1}{n} \sum_{j=1}^{n} x_{ij}$ to $x_{in}$. This change meant our metric went from representing the maximum average of $d$ groups to the maximum of the last observation among the $d$ groups. Because the distribution of $x_{ij}$ has higher variance than the old distribution of $m_i$, $M$ would likely increase quite a bit. How could we choose a percentile of the new distribution of $x_{ij}$ that would approximately match the p100 of the old distribution?

$\textbf{N.B.}$ — All $x_{ij}$ values are assumed to be independent and drawn from the same distribution; there's nothing unique about the last observation. Let's also make things more concrete, which may be useful for readers and future me: $n = \#\text{scrapes}$, $d = \#\text{pods}$, and $x_{ij}$ is the underlying metric from pod $i$ returned by scrape $j$.

In general, given two real-valued, continuous distributions $f$ and $g$ with CDFs $F$ and $G$, the optimal transport map from $g$ to $f$ is given by $F^{-1} \circ G \colon \mathbb{R} \to \mathbb{R}$. In our case, we can find our desired percentile by swapping the order of composition to $G \circ F^{-1} \colon [0, 1] \to [0, 1]$, which maps quantile levels of the old distribution to quantile levels of the new one. From there, it was easy to find the corresponding percentile $k$ and rewrite our new metric as $M_k^* = \operatorname{percentile}\big(k, \{m_i\}\big)$.
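As a sketch of that quantile-matching step: if we assume (purely for illustration) that the underlying metric is normal, then the old $m_i$ (an average of $n$ scrapes) has CDF $F$ with scale $\sigma/\sqrt{n}$ and the new $m_i$ (a single scrape) has CDF $G$ with scale $\sigma$, and $k = (G \circ F^{-1})(q)$ is a two-line computation. The values of `n`, `mu`, `sigma`, and `q` below are made up; I use $q = 0.999$ as a stand-in for p100, since $F^{-1}(1)$ diverges for a continuous distribution with unbounded support.

```python
import numpy as np
from scipy.stats import norm

# Illustrative stand-in parameters (not the real system's values).
n = 12                 # scrapes per pod
mu, sigma = 100.0, 15.0

# Old metric: (1/n) * sum_j x_ij ~ N(mu, sigma^2 / n)  => CDF F
# New metric: x_in               ~ N(mu, sigma^2)      => CDF G
q = 0.999  # stand-in for "p100"; F^{-1}(1) is infinite for a normal

old_value = norm.ppf(q, loc=mu, scale=sigma / np.sqrt(n))  # F^{-1}(q)
k = norm.cdf(old_value, loc=mu, scale=sigma)               # (G o F^{-1})(q)
print(k)  # percentile of the new distribution matching level q of the old
```

Note that `k` lands well below `q`: the new per-scrape distribution is wider, so a much lower percentile of it reaches the same raw value as the near-max of the old, tighter distribution.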


๐€๐๐๐ž๐ง๐๐ฎ๐ฆ\textbf{Addendum} โ€” Thought of this while typing this note out. Two quick claims regarding a bound on the expectation of MM and M1*M_1^* (i.e.ย the maximum of the new mim_i).

\begin{equation}
E\big[M - \mu\big] \leq \left(2 \sigma^2 \ln(d)/n \right)^{1/2} \qquad E\big[M_1^* - \mu\big] \leq \left(2 \sigma^2 \ln(d) \right)^{1/2}
\end{equation}

The former is just a result from summing variances (each old $m_i$ averages $n$ i.i.d. observations, so its variance is $\sigma^2/n$); the latter is an application of the maximal inequality for sub-Gaussian variables in Boucheron, Lugosi, and Massart, $\textbf{Sect. 2.5}$.
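A quick Monte Carlo sanity check of both bounds under a normality assumption. The parameters `d`, `n`, `mu`, `sigma`, and `trials` are illustrative stand-ins, not values from the real system.

```python
import numpy as np

# Simulate `trials` independent snapshots of d pods x n scrapes.
rng = np.random.default_rng(0)
d, n = 50, 12
mu, sigma = 100.0, 15.0
trials = 2000

x = rng.normal(mu, sigma, size=(trials, d, n))

M = x.mean(axis=2).max(axis=1)   # max over pods of the n-scrape average
M1 = x[:, :, -1].max(axis=1)     # max over pods of the last scrape only

bound_old = np.sqrt(2 * sigma**2 * np.log(d) / n)
bound_new = np.sqrt(2 * sigma**2 * np.log(d))

print(M.mean() - mu, "<=", bound_old)
print(M1.mean() - mu, "<=", bound_new)
```

Both empirical gaps should come in comfortably under their bounds; the $\sqrt{2\ln d}$ factor is known to be loose for moderate $d$.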

$\textbf{N.B.}$ — This is (somehow) both a hand-wavey and overly formal account of this PR I submitted to Mimir's alerting rules.