Modeling Autoscaling Policies


I used an optimal transport map at work! You'd be right to roll your eyes, but it happened! We started with a metric $M = \max_i m_i$ and planned to introduce a change that modified the definition of $m_i$ from $\frac{1}{n} \sum_{j=1}^{n} x_{ij}$ to $x_{in}$. This change meant our metric went from representing the maximum average of $d$ groups to the maximum of the last observation among the $d$ groups. Because the distribution of $x_{ij}$ has higher variance than the old distribution of $m_i$, $M$ would likely increase quite a bit. How could we choose a percentile of the new distribution of $x_{ij}$ that would approximately match the p100 of the old distribution?

$\textbf{N.B.}$ — All $x_{ij}$ values are assumed to be independent and drawn from the same distribution; there's nothing unique about the last observation. Let's also make things more concrete, which may be useful for readers and future me: $n = \#\text{scrapes}$, $d = \#\text{pods}$, and $x_{ij}$ is the underlying metric from pod $i$ returned by scrape $j$.

In general, given two real-valued, continuous distributions $f$ and $g$ with CDFs $F$ and $G$, the optimal transport map from $g$ to $f$ is given by $F^{-1} \circ G \colon \mathbb{R} \to \mathbb{R}$. In our case, we can find our desired percentile by swapping the order of composition to $G \circ F^{-1} \colon [0, 1] \to [0, 1]$, which maps quantile levels of the old distribution to quantile levels of the new one. From there, it was easy to find the corresponding percentile $k$ and rewrite our new metric as $M_k^* = \operatorname{percentile}\big(k, \{m_i\}\big)$.
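As a sketch of that quantile-matching step: if we assume (purely for illustration) that the underlying metric is normal, then the old $m_i$ (an average of $n$ scrapes) has CDF $F$ with scale $\sigma/\sqrt{n}$ and the new $m_i$ (a single scrape) has CDF $G$ with scale $\sigma$, and $k = (G \circ F^{-1})(q)$ is a two-line computation. The values of `n`, `mu`, `sigma`, and `q` below are made up; I use $q = 0.999$ as a stand-in for p100, since $F^{-1}(1)$ diverges for a continuous distribution with unbounded support.

```python
import numpy as np
from scipy.stats import norm

# Illustrative stand-in parameters (not the real system's values).
n = 12                 # scrapes per pod
mu, sigma = 100.0, 15.0

# Old metric: (1/n) * sum_j x_ij ~ N(mu, sigma^2 / n)  => CDF F
# New metric: x_in               ~ N(mu, sigma^2)      => CDF G
q = 0.999  # stand-in for "p100"; F^{-1}(1) is infinite for a normal

old_value = norm.ppf(q, loc=mu, scale=sigma / np.sqrt(n))  # F^{-1}(q)
k = norm.cdf(old_value, loc=mu, scale=sigma)               # (G o F^{-1})(q)
print(k)  # percentile of the new distribution matching level q of the old
```

Note that `k` lands well below `q`: the new per-scrape distribution is wider, so a much lower percentile of it reaches the same raw value as the near-max of the old, tighter distribution.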


๐€๐๐๐ž๐ง๐๐ฎ๐ฆ\textbf{Addendum} โ€” Thought of this while typing this note out. Two quick claims regarding a bound on the expectation of MM and M1*M_1^* (i.e.ย the maximum of the new mim_i).

\begin{equation}
E\big[M - \mu\big] \leq \left(2 \sigma^2 \ln(d)/n \right)^{1/2} \qquad E\big[M_1^* - \mu\big] \leq \left(2 \sigma^2 \ln(d) \right)^{1/2}
\end{equation}

The former is just a result from summing variances (each old $m_i$ averages $n$ i.i.d. observations, so its variance is $\sigma^2/n$); the latter is an application of the maximal inequality for sub-Gaussian variables in Boucheron, Lugosi, and Massart, $\textbf{Sect. 2.5}$.
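A quick Monte Carlo sanity check of both bounds under a normality assumption. The parameters `d`, `n`, `mu`, `sigma`, and `trials` are illustrative stand-ins, not values from the real system.

```python
import numpy as np

# Simulate `trials` independent snapshots of d pods x n scrapes.
rng = np.random.default_rng(0)
d, n = 50, 12
mu, sigma = 100.0, 15.0
trials = 2000

x = rng.normal(mu, sigma, size=(trials, d, n))

M = x.mean(axis=2).max(axis=1)   # max over pods of the n-scrape average
M1 = x[:, :, -1].max(axis=1)     # max over pods of the last scrape only

bound_old = np.sqrt(2 * sigma**2 * np.log(d) / n)
bound_new = np.sqrt(2 * sigma**2 * np.log(d))

print(M.mean() - mu, "<=", bound_old)
print(M1.mean() - mu, "<=", bound_new)
```

Both empirical gaps should come in comfortably under their bounds; the $\sqrt{2\ln d}$ factor is known to be loose for moderate $d$.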

$\textbf{N.B.}$ — This is (somehow) both a hand-wavey and overly formal account of this PR I submitted to Mimir's alerting rules.