An Approximate Voronoi Algorithm


Earlier this week, I came up with an algorithm for generating approximate Voronoi regions. It's not optimal in most scenarios, but it has some nice properties.

Consider a set of sites $S \subset \mathbb{R}^2$. We'll store all sites in a standard quadtree; this gives us $\mathcal{O}(\ln n)$ inserts and deletes and $\mathcal{O}(\ln n + k)$ range queries. To serve queries quickly, Bowyer–Watson maintains multiple data structures to store triangulation edges and vertices. By storing only sites, our algorithm saves a good chunk of memory at the cost of needing to compute the Voronoi regions on the fly. Determine for yourself if this is worthwhile…
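To make the storage story concrete, here is a toy point quadtree with just insert and range query — nothing like production-quality (the real implementation uses `github.com/paulmach/orb/quadtree`; the names `node`, `insert`, and `inBound` here are my own illustrative inventions), and it assumes distinct, roughly uniform sites so the $\mathcal{O}(\ln n)$ depth claim holds:

```go
package main

import "fmt"

// Pt is a 2D site. This sketch assumes all inserted sites are distinct.
type Pt struct{ X, Y float64 }

// node is one cell of a minimal point quadtree over [x0,x1)×[y0,y1).
type node struct {
	x0, y0, x1, y1 float64
	pt             *Pt      // payload for a leaf holding one site
	kids           [4]*node // nil until the leaf splits
}

func (n *node) insert(p Pt) {
	if n.kids[0] == nil && n.pt == nil { // empty leaf: take the point
		n.pt = &p
		return
	}
	if n.kids[0] == nil { // occupied leaf: split, push the old point down
		mx, my := (n.x0+n.x1)/2, (n.y0+n.y1)/2
		n.kids = [4]*node{
			{x0: n.x0, y0: n.y0, x1: mx, y1: my},
			{x0: mx, y0: n.y0, x1: n.x1, y1: my},
			{x0: n.x0, y0: my, x1: mx, y1: n.y1},
			{x0: mx, y0: my, x1: n.x1, y1: n.y1},
		}
		old := *n.pt
		n.pt = nil
		n.child(old).insert(old)
	}
	n.child(p).insert(p)
}

// child picks the quadrant containing p.
func (n *node) child(p Pt) *node {
	mx, my := (n.x0+n.x1)/2, (n.y0+n.y1)/2
	i := 0
	if p.X >= mx {
		i |= 1
	}
	if p.Y >= my {
		i |= 2
	}
	return n.kids[i]
}

// inBound appends every stored site inside the query rectangle to out,
// pruning subtrees whose cells are disjoint from the query.
func (n *node) inBound(qx0, qy0, qx1, qy1 float64, out *[]Pt) {
	if n.x1 < qx0 || qx1 < n.x0 || n.y1 < qy0 || qy1 < n.y0 {
		return
	}
	if n.pt != nil && qx0 <= n.pt.X && n.pt.X <= qx1 && qy0 <= n.pt.Y && n.pt.Y <= qy1 {
		*out = append(*out, *n.pt)
	}
	for _, k := range n.kids {
		if k != nil {
			k.inBound(qx0, qy0, qx1, qy1, out)
		}
	}
}

func main() {
	root := &node{x1: 1, y1: 1}
	for _, p := range []Pt{{0.1, 0.1}, {0.9, 0.9}, {0.2, 0.8}, {0.8, 0.2}} {
		root.insert(p)
	}
	var hits []Pt
	root.inBound(0, 0, 0.5, 0.5, &hits)
	fmt.Println(hits) // only the site in the lower-left quadrant
}
```

The point is that the *only* state is the sites themselves — no triangulation edges, no vertex records — which is where the memory savings later in the post come from.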

For $S$ in a metric space $(\Omega, d)$, the Voronoi region for any site $s_i$ can be computed with a series of intersections. For simplicity, we will proceed using $\mathbb{R}^2$ and Euclidean distance. With these assumptions, for each pair of sites $s_i$ and $s_j$, we can partition $\Omega$ into two half-spaces $\Omega_{ij}$ and $\Omega_{ji}$, split along the perpendicular bisector of the segment $\overline{s_i s_j}$. We call $\Omega_{ij}$ the side closer to $s_i$ than to $s_j$.
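Deciding which half-space a point falls in never needs the bisector line explicitly — comparing squared distances to the two sites is enough. A minimal sketch (the name `inHalfSpace` is mine, not from the post's code):

```go
package main

import "fmt"

// Point is a 2D location.
type Point struct{ X, Y float64 }

// inHalfSpace reports whether p lies in Ω_ij, the half-plane of points at
// least as close to si as to sj. Comparing squared distances avoids the
// square root and is exact for this decision.
func inHalfSpace(p, si, sj Point) bool {
	di := (p.X-si.X)*(p.X-si.X) + (p.Y-si.Y)*(p.Y-si.Y)
	dj := (p.X-sj.X)*(p.X-sj.X) + (p.Y-sj.Y)*(p.Y-sj.Y)
	return di <= dj
}

func main() {
	si, sj := Point{0, 0}, Point{1, 0} // bisector is the line x = 0.5
	fmt.Println(inHalfSpace(Point{0.2, 0.7}, si, sj)) // true: closer to si
	fmt.Println(inHalfSpace(Point{0.8, 0.7}, si, sj)) // false: closer to sj
}
```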

\begin{equation} \text{Vor}(S, s_i) = \bigcap\limits_{s_k \in S,\, k \neq i} \Omega_{ik} \end{equation}
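The intersection in the equation above can be realized as iterative polygon clipping: start from the bounding region and clip it against $\Omega_{ik}$ for each other site. The signed function $\lvert p - s_j \rvert^2 - \lvert p - s_i \rvert^2$ is linear in $p$, so a Sutherland–Hodgman-style clip can find edge crossings by linear interpolation. A sketch under those assumptions (function names are mine, not the post's):

```go
package main

import "fmt"

// P is a 2D point.
type P struct{ X, Y float64 }

// bisectorVal is ≥ 0 exactly when p is at least as close to si as to sj
// (the Ω_ij side). It equals |p−sj|² − |p−si|², which is linear in p.
func bisectorVal(p, si, sj P) float64 {
	return (p.X-sj.X)*(p.X-sj.X) + (p.Y-sj.Y)*(p.Y-sj.Y) -
		(p.X-si.X)*(p.X-si.X) - (p.Y-si.Y)*(p.Y-si.Y)
}

// clipHalfPlane clips a convex polygon against Ω_ij, one half-plane at a
// time. Because bisectorVal is linear along an edge, the crossing point
// is found by interpolating between the endpoint values.
func clipHalfPlane(poly []P, si, sj P) []P {
	var out []P
	for i := range poly {
		a, b := poly[i], poly[(i+1)%len(poly)]
		fa, fb := bisectorVal(a, si, sj), bisectorVal(b, si, sj)
		if fa >= 0 {
			out = append(out, a)
		}
		if (fa >= 0) != (fb >= 0) { // edge crosses the bisector
			t := fa / (fa - fb)
			out = append(out, P{a.X + t*(b.X-a.X), a.Y + t*(b.Y-a.Y)})
		}
	}
	return out
}

// voronoiCell intersects the bounding polygon with Ω_ik for every other
// site — a direct, O(n) realization of the equation above.
func voronoiCell(bound []P, sites []P, i int) []P {
	cell := bound
	for k, s := range sites {
		if k != i && len(cell) > 0 {
			cell = clipHalfPlane(cell, sites[i], s)
		}
	}
	return cell
}

func main() {
	unit := []P{{0, 0}, {1, 0}, {1, 1}, {0, 1}}
	sites := []P{{0.25, 0.5}, {0.75, 0.5}}
	fmt.Println(voronoiCell(unit, sites, 0)) // left half of the square: x ≤ 0.5
}
```

The approximation in the rest of the post swaps the "every other site" loop for a small candidate set, but the clipping core stays the same.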

However, it is not necessary to compute $\Omega_{ij}$ for each of the $n-1$ other sites. The only relevant sites are the direct “neighbors” of $s_i$. It has been shown that in $\mathbb{R}^2$, regardless of $\lvert S \rvert$, the number of “neighbors” for a site is $\mathcal{O}(1)$ (see: notes on planar graphs). To return accurate Voronoi regions, it's sufficient to compute $\text{Vor}(S', s_i)$ on the set of neighbors, $S' \subseteq S$. Including an extra non-neighbor in $S'$ doesn't affect the resulting region, while excluding a neighbor from $S'$ results in a region that is a superset of the true region. The problem can now be reframed as “how can we select an $S'$ that includes all ‘neighbors’ with high probability?”.

N.B. — We don't actually know how the distribution of “neighbors” evolves as $d$ increases. Some simulation-based work has shown values for $d=3$; beyond that, information is very sparse. See simulation results - fig. 10.

I claim that if we define $S'$ as the $c\ln(n)$ sites nearest to $s_i$, we can produce an accurate Voronoi region w.h.p. Inserting a new site into the tree is $\mathcal{O}(\ln n)$, and computing $\text{Vor}(S', s_i)$ requires an $\mathcal{O}(\ln n + c\ln n)$ range query. Overall, this operation is $\mathcal{O}(c\ln n)$, and we can always recompute the region with another $\mathcal{O}(c\ln n)$ call to reduce overlap.
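Selecting $S'$ is just a k-nearest-neighbors query with $k = \lceil c\ln n \rceil$. A brute-force sketch of the selection step (the real implementation would serve this from the quadtree's range query rather than a full sort; `nearestSubset` is a hypothetical name):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Site is a 2D site.
type Site struct{ X, Y float64 }

// nearestSubset returns the ⌈c·ln(n)⌉ sites nearest to s (excluding s
// itself if present) — the candidate set S' the cell is clipped against.
func nearestSubset(sites []Site, s Site, c float64) []Site {
	m := int(math.Ceil(c * math.Log(float64(len(sites)))))
	cand := make([]Site, 0, len(sites))
	for _, t := range sites {
		if t != s {
			cand = append(cand, t)
		}
	}
	// Sort candidates by squared distance to s; no sqrt needed for ordering.
	sort.Slice(cand, func(i, j int) bool {
		di := (cand[i].X-s.X)*(cand[i].X-s.X) + (cand[i].Y-s.Y)*(cand[i].Y-s.Y)
		dj := (cand[j].X-s.X)*(cand[j].X-s.X) + (cand[j].Y-s.Y)*(cand[j].Y-s.Y)
		return di < dj
	})
	if m > len(cand) {
		m = len(cand)
	}
	return cand[:m]
}

func main() {
	// A 10×10 grid of sites in the unit square.
	var sites []Site
	for i := 0; i < 10; i++ {
		for j := 0; j < 10; j++ {
			sites = append(sites, Site{float64(i) / 10, float64(j) / 10})
		}
	}
	near := nearestSubset(sites, Site{0.5, 0.5}, 3)
	fmt.Println(len(near)) // ⌈3·ln(100)⌉ = 14 candidates
}
```

With $c = 3$ (matching the `neighbors=3ln(n)` label in the benchmarks below), a set of 100 sites yields 14 candidates — comfortably more than the $\mathcal{O}(1)$ expected true neighbors, which is exactly the slack that makes missing one unlikely.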

N.B. — If you read the code, you will notice that in the main text I've omitted the complexity of the iterative clipping step. In theory, this is $\mathcal{O}(c^2 \ln^2 n)$, as each clip could add an edge. I'm hand-waving a bit here, but because we expect a constant number of “neighbors”, we also expect a constant number of vertices at each step. I believe it is safe to treat this operation as $\mathcal{O}(c\ln n)$. Ask yourself: do we care about the asymptotic complexity, or do we care about the constants? This isn't a pretty algorithm, so I assume you're here for the constants…

The cost of a delete depends on how important it is to preserve $\bigcup_{s_i \in S} \text{Vor}(s_i) = \Omega$. I have mixed thoughts on this part:

N.B. — We must be a bit careful with the asymptotic complexities. If the density of sites around the cell is increasing, then a delete operation can involve many sites.

I benchmarked my implementation (with a slight assist from *️⃣) on datasets of 16M and 32M sites and found we can handle up to ~150,000 operations/s. This is quite slow, but perhaps it's good enough to keep sites in memory and get just-OK performance on region generation.

Benchmark_Voronoi/parallel-size=16M_neighbors=3ln(n)-8            180944              6843 ns/op
Benchmark_Voronoi/baseline-size=16M_neighbors=3ln(n)-8             52060             23019 ns/op
Benchmark_Voronoi/parallel-size=32M_neighbors=3ln(n)-8             73656             26218 ns/op
Benchmark_Voronoi/baseline-size=32M_neighbors=3ln(n)-8             24957             50666 ns/op

I ran the following program with 32M points and found it used just under 800MB. This is actually quite nice compared to the ~3.2GB required to store sites and triangulations. Personally, if I were running some sort of large-scale logistics operation that required tracking 32M entities, I'd opt for the Bowyer–Watson algorithm and a cloud bill that's $20/month higher, but this is just my preference…

Showing nodes accounting for 796.53MB, 100% of 796.53MB total
      flat  flat%   sum%        cum   cum%
  528.02MB 66.29% 66.29%   528.02MB 66.29%  github.com/paulmach/orb/quadtree.(*Quadtree).add
  268.51MB 33.71%   100%   796.53MB   100%  main.(*VoronoiGraph).Add
         0     0%   100%   528.02MB 66.29%  github.com/paulmach/orb/quadtree.(*Quadtree).Add
         0     0%   100%   796.53MB   100%  main.main
package main

import (
	"math/rand"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof handlers on the default mux
	"time"

	"github.com/paulmach/orb"
)

func main() {
	// Expose pprof so heap usage can be inspected while the program runs.
	go http.ListenAndServe("localhost:6060", nil)

	bl, tr := orb.Point{0, 0}, orb.Point{1, 1}
	vg := NewVoronoiGraph(orb.Bound{Min: bl, Max: tr})

	// Insert 32M uniformly random sites into the unit square.
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	for i := 0; i < 32*1024*1024; i++ {
		_ = vg.Add(rng.Float64(), rng.Float64(), 0)
	}

	// Compute a cell for every site so region generation is exercised too.
	for _, nn := range vg.tree.InBound(nil, orb.Bound{Min: bl, Max: tr}) {
		pt := nn.Point()
		id := idFromCoord(pt[0], pt[1])
		_ = vg.ApproxVoronoiCell(id, 2.0, true, nil)
	}

	// Block forever so the pprof endpoint stays up for profiling.
	select {}
}

Some final notes: