A Few Notes On Pareto Fronts

I was in Madrid recently and skimmed On The Probability Of A Pareto Record on the plane back home. The notes below cover two questions I had about independent Pareto points. Consider $\bm{W}, \bm{V} \in \mathbb{R}^d$; we say $\bm{W} \prec \bm{V}$ if $w_i < v_i$ for $i \in [0, d)$, and we call $\bm{V}$ a Pareto point of $S$ if no $\bm{W} \in S$ satisfies $\bm{V} \prec \bm{W}$.
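As a quick sanity check on the definition, here's a minimal sketch (the helper name `pareto_points` is mine, not from the paper) that extracts the Pareto points of a finite sample:

```python
import numpy as np

def pareto_points(S):
    """Return the points of S not strictly dominated by any other point."""
    S = np.asarray(S)
    # p survives iff no row of S beats it in every coordinate
    keep = [not np.any(np.all(S > p, axis=1)) for p in S]
    return S[keep]

pts = np.array([[1.0, 3.0], [2.0, 2.0], [3.0, 1.0], [1.0, 1.0]])
front = pareto_points(pts)  # [1, 1] is dominated by [2, 2]; the rest survive
```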

๐.๐\textbf{N.B} โ€” Thereโ€™s also a (very handy!) video from a Fields Institute Analysis of Algorithms conference where Fill describes the main results of On The Probability Of A Pareto Record. The bound I give in ๐๐Ÿ\textbf{Q1} is corroborated in ๐‘๐ž๐ฆ๐š๐ซ๐ค ๐Ÿ‘.๐Ÿ\textbf{Remark 3.1} of the paper and briefly mentioned in the video.

๐๐š๐ซ๐ญ ๐Ÿ\textbf{Part 1} โ€” Assuming the following, whatโ€™s the probability that the nthn^{th} point observed is pareto?

  1. Each coordinate of $\bm{V} \in \mathbb{R}^d$ is an i.i.d. exponentially distributed RV.
  2. Points are streamed and added to $S$ one-by-one.

Notice that $p_{n, d}$ depends heavily on the $\ell^1$ norm of $\bm{V}$: a later point dominates $\bm{V}$ with probability $\prod_i e^{-v_i} = e^{-\|\bm{V}\|_1}$. To compute the probability that a point is Pareto, we integrate the product of the density of its $\ell^1$ norm (Erlang with shape $d$ and rate $1$) and the probability that none of the preceding $n-1$ points dominate $\bm{V}$.

$$
\begin{aligned}
p_{n,d} &= \int_{\Omega} e^{-\|\mathbf{x}\|_1} \left(1 - e^{-\|\mathbf{x}\|_1} \right)^{n-1} \, d\mathbf{x} \\
&= \int_{0}^{\infty} \frac{x^{d-1} e^{-x}}{(d-1)!} \left(1 - e^{-x}\right)^{n-1} \, dx
\end{aligned}
$$

To simplify the integral, notice that the second factor, $f(x) = \left(1 - e^{-x}\right)^{n-1}$, is strictly increasing on $\mathbb{R}^+$ and satisfies $\lim_{x \to 0^+} f = 0$ and $\lim_{x \to \infty} f = 1$. It behaves like a smoothed step function, so instead of integrating a product, we'll just integrate the right tail of the Erlang pdf on $[\ln(n\alpha), \infty)$.

๐.๐\textbf{N.B} โ€” I use ฮฑ=(1โˆ’eโˆ’1)โˆ’1โ‰ˆ1.582\alpha = (1 - e^{-1})^{-1} \approx 1.582 because F(nฮฑ)=0.5307F\big(n\alpha) = 0.5307 approximates the median of the second term. I could be a bit more rigorous and integrate from F(โˆ’ln(1โˆ’2โˆ’1/(nโˆ’1)))=0.500F\big(-\ln\big(1-2^{-1/(n-1)})) = 0.500, but for ๐ŸŒˆ๐œ๐จ๐ฆ๐ฉ๐ฎ๐ญ๐š๐ญ๐ข๐จ๐ง๐š๐ฅ ๐ซ๐ž๐š๐ฌ๐จ๐ง๐ฌ๐ŸŒˆ๐ŸŒˆ\textbf{computational reasons}๐ŸŒˆ weโ€™ll be better off with the first approximation.

After zooming through a few mechanical steps, we arrive at an asymptotic bound for $p_{n, d}$ that agrees with **Remark 3.1** in Fill's paper and the main result of Barndorff-Nielsen and Sobel's On the distribution of the number of admissible points in a vector random sample.

$$
p_{n,d} \approx \int_{\ln(n\alpha)}^{\infty} \frac{x^{d-1} e^{-x}}{(d-1)!}\, dx = \frac{1}{n\alpha} \sum_{j=0}^{d-1} \frac{\ln(n\alpha)^{j}}{j!} = \Theta\left(\frac{\ln(n)^{d-1}}{n \, (d-1)!}\right)
$$
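As a sanity check (the quadrature grid and tolerances are my choices, not from the paper), we can integrate the exact product from the derivation above numerically and compare it against the closed-form tail:

```python
import numpy as np
from math import factorial, log, exp

ALPHA = 1 / (1 - exp(-1))  # ~1.582

def p_exact(n, d):
    """Quadrature of the Erlang(d) pdf times (1 - e^-x)^(n-1)."""
    x = np.linspace(1e-9, 60, 600_001)
    y = x ** (d - 1) * np.exp(-x) / factorial(d - 1) * (1 - np.exp(-x)) ** (n - 1)
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)  # trapezoid rule

def p_approx(n, d):
    """Closed-form tail of the Erlang(d) pdf on [ln(n * alpha), inf)."""
    t = log(n * ALPHA)
    return sum(t ** j / factorial(j) for j in range(d)) / (n * ALPHA)
```

For $d = 2$ the exact value is $H_n / n$ (harmonic number over $n$), which the quadrature reproduces; the closed form tracks it up to a constant factor, as a $\Theta$ bound should.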

๐๐š๐ซ๐ญ ๐Ÿ\textbf{Part 2} โ€” Iโ€™ll call the ๐๐ž๐ฉ๐ญ๐ก\textbf{depth} of a point ๐•\bm{V} in a set (SS) the number of successive layers of pareto points that must be removed from SS before ๐•\bm{V} becomes pareto in the remaining subset. Given ๐•๐ง\bm{V_n}, a newly-observed pareto point of SS with โ„“n1=โˆฅ๐•๐งโˆฅ1\ell^1_n = \|\bm{V_n}\|_1, what is the expected depth of ๐•n\bm{V}_n after an additional mm points are observed?

Notice that the probability that a fresh point $\bm{V_j}$ satisfies $\bm{V_n} \prec \bm{V_j}$ is $\prod_{i=0}^{d-1} e^{-v_{n,i}} = e^{-\ell^1_n}$. Over $m$ observations, this implies that the number of points that dominate $\bm{V_n}$ is distributed as $\operatorname{Bin}(m, e^{-\ell^1_n})$.
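A small simulation backs this up (parameters and seed are mine): the empirical count of dominators matches the $\operatorname{Bin}(m, e^{-\ell^1_n})$ mean.

```python
import numpy as np

rng = np.random.default_rng(7)
d, m, trials = 3, 2_000, 500

v_n = rng.exponential(size=d)
p = np.exp(-v_n.sum())  # P(a fresh point strictly dominates v_n)

# Count, per trial, how many of m fresh points dominate v_n
pts = rng.exponential(size=(trials, m, d))
counts = np.all(pts > v_n, axis=2).sum(axis=1)
```

`counts.mean()` should land near `m * p`, with binomial-scale fluctuations.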

Hand-waving a bit, most pairs in this set will be incomparable (i.e. $\bm{V_i} \nsucc \bm{V_j}$ and $\bm{V_i} \nprec \bm{V_j}$), while a small fraction, $2^{-d}$, will satisfy $\bm{V_i} \prec \bm{V_j}$. If we treat the set of points that dominate $\bm{V_n}$ as a random graph with edge probability $2^{-d}$, we get a lower bound that's logarithmic in $m$.

$$
\log_{2^d}\left(m \cdot e^{-\ell^1_n}\right) = \Omega\left(\frac{\ln(m)}{d}\right)
$$

Even better! Recall that random graphs have $\textbf{Diam}(G) = 2$ *w.h.p.* Voilà! Done! However, our real graph has a lot more structure than one with randomly assigned edges. I'd conjecture that the actual depth of $\bm{V_n}$ is proportional to the number of layers in a non-dominated sort of the points dominating $\bm{V_n}$.

๐”๐ฉ๐๐š๐ญ๐ž!\textbf{Update!} โ€” I did what you might call โ€œvibe-boundingโ€ above. Fortunately for me, finding the number of layers in non-dominating sort is a research question on itโ€™s own. When mm is large, this is approximately equal to the length of the longest chain from ๐•n\bm{V}_n to the pareto front. In The Longest Chain Among Random Points In Euclidean Space, the authors show depth is ฮ˜(n1/d)\Theta\big(n^{1/d}) when sampling uniformly from [0,1]d[0, 1]^d. We can map exponential coordinates to [0,1]d[0, 1]^d easily with the CDF, but sampling uniformly from the cube may be more challenging. Letโ€™s define ฮณ:[0,โˆž)dโ†’[0,1]d\gamma \ \colon [0, \infty)^d \to [0, 1]^d as the projection of 1โˆ’eโˆ’๐—1 - e^{-\bm{X}} onto the diagonal of the dd-dimensional unit cube.

ฮณ(๐—)=๐Ÿโ‹…1dโˆ‘i=0dโˆ’11โˆ’eโˆ’xi\begin{equation} \gamma(\bm{X}) = \mathbf{1} \cdot \frac{1}{d}\sum_{i=0}^{d-1} 1 - e^{-x_i} \end{equation}

This transform shrinks $\ell^1_n$ (even when coordinates are mapped back into $\mathbb{R}^{+}$), but it ensures we're considering points in the cube $[v^*, 1]^d$, where $v^*$ is the common coordinate of $\gamma(\bm{V_n})$. We now proceed with the bound proposed in Bollobás and Winkler, using the point $\gamma(\bm{V_n})$ to estimate the size of the dominating set.

```python
import numpy as np

def transform(x):
    # gamma: project 1 - e^{-x} onto the cube diagonal, then map the common
    # coordinate back into R+ through the exponential quantile function
    u = np.mean(1 - np.exp(-x))
    return np.full_like(x, -np.log1p(-u))

def dominating_subset(samples, x):
    # Points of `samples` that strictly dominate x in every coordinate
    return samples[np.all(samples > x, axis=1)]

def non_dominated_sort(pts):
    # Peel off successive pareto fronts; the number of layers is the depth
    layers = []
    while len(pts):
        dominated = np.array([np.any(np.all(pts > p, axis=1)) for p in pts], dtype=bool)
        layers.append(pts[~dominated])
        pts = pts[dominated]
    return layers

d = 5
b = 2 ** -d
m = 2 ** 12
data = []

for x_n in np.random.exponential(1.0, size=(1024, d)):
    samples = np.random.exponential(size=(m, d))

    # Lower bound: function of the L1 norm, assumes a random graph
    l1_norm = np.sum(x_n)
    y0 = 1 - np.log(m) / np.log(b) + l1_norm / np.log(b)

    # Upper bound: function of the transformed L1 norm
    v = transform(x_n)
    y1 = np.exp(1) * (m * np.exp(-np.sum(v))) ** (1 / d)

    # Actual: compute depth via non-dominated sort
    dom = dominating_subset(samples, x_n)
    layers = non_dominated_sort(dom)
    data.append((len(layers), y0, y1))
```