Computing An Approx. Convex Hull From a Stream

Computing An Approx. Convex Hull From a Stream

You want to maintain an approx. convex hull of a stream of points x0,x1,xndx_0, x_1, \ldots x_n \in \mathbb{R}^d. Assume nn is large and unknown, points aren’t necessarily drawn from the same distribution, and the approx. hull can be requested at any time during processing. Here are a few ways that we might do this…

𝐍.𝐁\textbf{N.B} — I don’t know how large qq must be to be to ensure our estimate is ϵ\epsilon-close to the true hull’s volume. I tried to estimate qq that ensures the surface of BdB^{d} is within ϵ\epsilon of a vector w.h.p and came up with a dodgy argument that involved the ratio of the surface area of BdB^{d} and the volume of Bd1B^{d-1}. After approximating Γ((d/21/2)/Γ(d/2+1)\Gamma\Big((d/2 - 1/2)/\Gamma\Big(d/2 + 1) to 2/x\sqrt{2/x}, I’m thinking that it decreases exponentially w. a parameter λ2πdϵ1d\lambda \leq 2\sqrt{\pi d} \cdot \epsilon^{1-d}. This is quite large, high dimensions scare me, and something still felt incomplete so I let this line of thinking go…

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull

def stream_points(d: int, n: int):
    return np.random.normal(0, 1, size=(n, d))

def rand_basis_vect(d: int, m: int):
    A = 2 * (0.5 - np.random.random(size=(m, d)))
    return A / np.linalg.norm(A, ord=2, axis=1).reshape(m, 1)

n_dim, n_rand_vect, n_points = 2, 8, 64 * 1024
L, H = np.zeros(n_rand_vect), np.zeros(n_rand_vect)
A = rand_basis_vect(n_dim, n_rand_vect)
estbuf, truebuf = [], []

for p in stream_points(n_dim, n_points):
    S = np.dot(A, p)
    nL, nH = np.minimum(L, S), np.maximum(H, S)
    if abs(np.linalg.norm(nL - L)) > 0:
        L = nL
        estbuf.append(p)
    if abs(np.linalg.norm(nH - H)) > 0:
        H = nH
        estbuf.append(p)
    truebuf.append(p)

truebuf = np.array(truebuf).reshape(len(truebuf), n_dim)
estbuf = np.array(estbuf).reshape(len(estbuf), n_dim)

hull, esthull = ConvexHull(truebuf), ConvexHull(estbuf)
plt.scatter(truebuf[:, 0], truebuf[:, 1], marker=".", alpha=0.5)
for simplex in hull.simplices:
    plt.plot(truebuf[simplex, 0], truebuf[simplex, 1], "r-")

for simplex in esthull.simplices:
    plt.plot(estbuf[simplex, 0], estbuf[simplex, 1], "k-", alpha=0.5)

plt.scatter(estbuf[esthull.vertices, 0], estbuf[esthull.vertices, 1], color="k", alpha=0.8)
plt.scatter(truebuf[hull.vertices, 0], truebuf[hull.vertices, 1], color="r")
plt.show()

𝐍.𝐁\textbf{N.B} — Wrote this second section (everything below this box) as a “quick follow-up” a few weeks later. It got kinda long and almost required its own post. Instead I decided to just smash the two together…

Recall 𝐈𝐝𝐞𝐚 𝟑\textbf{Idea 3} from above. If we’ve implemented this algorithm, how can we determine if a point included in the approximate convex hull is likely to be in the true convex hull? Here’s a quick hand-waving argument using the positive reals in 2-dimensions:

  1. Assume we know the distribution function of the distances, Fd(x)F_d(x). For illustration’s sake we’ll assume the pdf is sub-Gaussian. We’ll use this property in step (𝟒)\textbf{(4)}.

  2. There exists α\alpha for each xix_i which satisfies Fd(||xi||)=1αF_d(\lvert\lvert x_i \rvert\rvert) = 1 - \alpha. Thus, the probability that a point is a member of the true hull is at least 1α1 - \alpha. Note that this is quite an overestimate of the replacement probability as the probability xix_i is replaced by a “better” point (xjx_j) depends on the direction and magnitude of the replacing point.

  3. We could extend the criteria from (𝟐)\textbf{(2)} to all points and say that the probability we have the correct convex hull is at least i=12kFd(||xi||)\prod_{i=1}^{2k} F_d(\lvert\lvert x_i \rvert\rvert). This is nice and straightforward, but we can get closer…

𝐍.𝐁\textbf{N.B} — A point xjx_j replaces xix_i if it satisfies ||xj||>||xi||cos(θjθi)\lvert\lvert x_j \rvert\rvert \gt \lvert\lvert x_i \rvert\rvert \cdot \operatorname{cos}\big(\theta_j - \theta_i). For example, a point xi=(2,2)x_i = (2, 2) is 𝐚𝐥𝐰𝐚𝐲𝐬\textbf{always} replaced if there’s a point with ||xj||8cos(π4)\lvert\lvert x_j \rvert\rvert \geq \frac{\sqrt{8}}{\cos\left(\frac{\pi}{4}\right)} in the first quadrant. We can integrate over [0,π/2)\big[0, \pi/2 \big) to find the required vector length (and its probability) conditional on the difference in angles:

2π12π0π2dcos(arctan(xi0xi1)v)eu22dudv,d=xi02+xi12\begin{equation} \frac{2}{\pi}\frac{1}{\sqrt{2\pi}}\cdot\int_{0}^{\frac{\pi}{2}}\int_{\frac{d}{\cos\left(\arctan\left(\frac{x_{i0}}{x_{i1}}\right)\ -\ v\ \right)}}^{\infty}e^{\frac{-u^{2}}{2}}du\ dv \qquad, \qquad d = \sqrt{x_{i0}^{2}\ +x_{i1}^{2}} \end{equation}

The equations above assume a sub-Gaussian distribution of distances and radial symmetry. You’d be very lucky to have symmetry and known FdF_d. If you’ve got that - you may as well just use the result in (𝟐)\textbf{(2)}. This will help you (future me) if you ever need to work on this seriously: Some Desmos Equations. Good Luck!