Computing An Approx. Convex Hull From a Stream

Computing An Approx. Convex Hull From a Stream

You want to maintain an approx. convex hull of a stream of points $x_0, x_1, \ldots x_n \in \mathbb{R}^d$ . Assume $n$ is large and unknown, points aren’t necessarily drawn from the same distribution, and the approx. hull can be requested at any time during processing. Here are a few ways that we might do this…

$\textbf{Idea 1}$ — Keep the min/max values for each dimension. These $2d$ values can be used to construct a bounding box of the true hull. This can yield a pretty bad approximation that 1) overestimates the volume and 2) isn’t bounded. Ooof.
$\textbf{Idea 2}$ — Keep the $2d$ points with min/max values in each direction. When the hull is required we can compute it from this subset of points in $\mathcal{O}\Big(d \cdot \text{ln}\Big(d))$ time. This algorithm produces a volume which is no larger than the true hull’s.
$\textbf{Idea 3}$ — This approach is similar to $\textbf{(2)}$ . Select $q$ random unit vectors on the $d$ dimension unit sphere. Instead of keeping $2d$ points, keep the $2q$ points which maximize or minimize $x_i \cdot \hat{\mathbf{v}}_j$ for any one of the unit vectors. For large $q$ this should begin to approach the volume of the true hull. This is simulated in the code (with a “nice” example) below.

$\textbf{N.B}$ — I don’t know how large $q$ must be to be to ensure our estimate is $\epsilon$ -close to the true hull’s volume. I tried to estimate $q$ that ensures the surface of $B^{d}$ is within $\epsilon$ of a vector w.h.p and came up with a dodgy argument that involved the ratio of the surface area of $B^{d}$ and the volume of $B^{d-1}$ . After approximating $\Gamma\Big((d/2 - 1/2)/\Gamma\Big(d/2 + 1)$ to $\sqrt{2/x}$ , I’m thinking that it decreases exponentially w. a parameter $\lambda \leq 2\sqrt{\pi d} \cdot \epsilon^{1-d}$ . This is quite large, high dimensions scare me, and something still felt incomplete so I let this line of thinking go…

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull

def stream_points(d: int, n: int):
    return np.random.normal(0, 1, size=(n, d))

def rand_basis_vect(d: int, m: int):
    A = 2 * (0.5 - np.random.random(size=(m, d)))
    return A / np.linalg.norm(A, ord=2, axis=1).reshape(m, 1)

n_dim, n_rand_vect, n_points = 2, 8, 64 * 1024
L, H = np.zeros(n_rand_vect), np.zeros(n_rand_vect)
A = rand_basis_vect(n_dim, n_rand_vect)
estbuf, truebuf = [], []

for p in stream_points(n_dim, n_points):
    S = np.dot(A, p)
    nL, nH = np.minimum(L, S), np.maximum(H, S)
    if abs(np.linalg.norm(nL - L)) > 0:
        L = nL
        estbuf.append(p)
    if abs(np.linalg.norm(nH - H)) > 0:
        H = nH
        estbuf.append(p)
    truebuf.append(p)

truebuf = np.array(truebuf).reshape(len(truebuf), n_dim)
estbuf = np.array(estbuf).reshape(len(estbuf), n_dim)

hull, esthull = ConvexHull(truebuf), ConvexHull(estbuf)
plt.scatter(truebuf[:, 0], truebuf[:, 1], marker=".", alpha=0.5)
for simplex in hull.simplices:
    plt.plot(truebuf[simplex, 0], truebuf[simplex, 1], "r-")

for simplex in esthull.simplices:
    plt.plot(estbuf[simplex, 0], estbuf[simplex, 1], "k-", alpha=0.5)

plt.scatter(estbuf[esthull.vertices, 0], estbuf[esthull.vertices, 1], color="k", alpha=0.8)
plt.scatter(truebuf[hull.vertices, 0], truebuf[hull.vertices, 1], color="r")
plt.show()

$\textbf{N.B}$ — Wrote this second section (everything below this box) as a “quick follow-up” a few weeks later. It got kinda long and almost required its own post. Instead I decided to just smash the two together…

Recall $\textbf{Idea 3}$ from above. If we’ve implemented this algorithm, how can we determine if a point included in the approximate convex hull is likely to be in the true convex hull? Here’s a quick hand-waving argument using the positive reals in 2-dimensions:

Assume we know the distribution function of the distances, $F_d(x)$ . For illustration’s sake we’ll assume the pdf is sub-Gaussian. We’ll use this property in step $\textbf{(4)}$ .
There exists $\alpha$ for each $x_i$ which satisfies $F_d(\lvert\lvert x_i \rvert\rvert) = 1 - \alpha$ . Thus, the probability that a point is a member of the true hull is at least $1 - \alpha$ . Note that this is quite an overestimate of the replacement probability as the probability $x_i$ is replaced by a “better” point ( $x_j$ ) depends on the direction and magnitude of the replacing point.
We could extend the criteria from $\textbf{(2)}$ to all points and say that the probability we have the correct convex hull is at least $\prod_{i=1}^{2k} F_d(\lvert\lvert x_i \rvert\rvert)$ . This is nice and straightforward, but we can get closer…

$\textbf{N.B}$ — A point $x_j$ replaces $x_i$ if it satisfies $\lvert\lvert x_j \rvert\rvert \gt \lvert\lvert x_i \rvert\rvert \cdot \operatorname{cos}\big(\theta_j - \theta_i)$ . For example, a point $x_i = (2, 2)$ is $\textbf{always}$ replaced if there’s a point with $\lvert\lvert x_j \rvert\rvert \geq \frac{\sqrt{8}}{\cos\left(\frac{\pi}{4}\right)}$ in the first quadrant. We can integrate over $\big[0, \pi/2 \big)$ to find the required vector length (and its probability) conditional on the difference in angles:

$\begin{equation} \frac{2}{\pi}\frac{1}{\sqrt{2\pi}}\cdot\int_{0}^{\frac{\pi}{2}}\int_{\frac{d}{\cos\left(\arctan\left(\frac{x_{i0}}{x_{i1}}\right)\ -\ v\ \right)}}^{\infty}e^{\frac{-u^{2}}{2}}du\ dv \qquad, \qquad d = \sqrt{x_{i0}^{2}\ +x_{i1}^{2}} \end{equation}$

The equations above assume a sub-Gaussian distribution of distances and radial symmetry. You’d be very lucky to have symmetry and known $F_d$ . If you’ve got that - you may as well just use the result in $\textbf{(2)}$ . This will help you (future me) if you ever need to work on this seriously: Some Desmos Equations. Good Luck!