Mastodon
Little Question: Bounding The Size of a Skiplist
I thought of a fun, relatively easy question regarding the growth of
a data structure called a skiplist earlier today. For the
purposes of this post, itβs only important to know that
is called on each insert operation.
Iβm working on something big (βbigβ) that lead me to doing an
analysis of skiplists. See W. Pughβs original papers on the data
structure for more background. See: Skip
Lists: A Probabilistic Alternative to Balanced Trees and Concurrent
Maintenance of Skip Lists.
randLevel()
lvl := 1
// random() that returns a random value in [0...1)
while random() < p and lvl < MaxLevel do
lvl := lvl + 1
return lvl
The
method returns a
on
.
If
exceeds the current
,
we increment
(equiv. to setting
to
).
Thus, it takes at least
inserts to reach a
of
.
Given
,
what is the expected number of rounds to reach level
?
What is an upper bound to reach level
that holds with high probability?
As is often the case, Knuth has this one covered. See Concrete
Math, p.Β 32.
The first part is quite easy if you recognize this is a geometric
series. When using
as recommended in the original paper we get
.
In general, we have the following:
The bound in main text is correct but pretty loose. I thought it would
be better when I wrote it out, oh well. We do a little better by just
using Chebyshevβs (below) or trying harder and finding a Chernoff
Bound.
Using
,
,
,
Chebyshev gives
vs.Β
for method from the main text. Not great!
Now to find an upper bound. Notice that the output of
is geometrically distributed with parameter
.
Weβll use the fact that
to express
as the sum of
independent exponential RVs
.
For any probability
,
we can find
such that
.
Once weβve found
,
it must also be true that
.
We can reference the exponential CDF and see that the following holds
for all
:
Setting
and summing over the individual bounds, we arrive at a bound for the
sum. This may be tuned to the desired failure probability (e.g.Β for
failure probability
,
set
).
Addendum (6/19/2024) β Two more small results from letting this
rattle around in my head over the past few days. Iβm going to do these
very quickly.
- What is the distribution of the number of comparisons made in a
skiplist
?
Inutuition tells me it should be
.
Iβm unsure if this was analyzed in Pughβs follow-up papers, in the
original he gives
,
has mean
.
I may be missing some nuance, but this seems correctβ¦
Still not such a nice solution, but Chernoff does give a
better bound. Using
,
,
,
Chernoff gives
vs.Β
for method from the main textβ¦
Letβs go back and actually get this Chernoff Bound for the first
question posed on this post. For simplicity, weβll use the exponential
distribution, but keep the original per-level probabilities,
,
notice that
.
Weβll act as if this change is de minimis, I think it is. With
that said, once we get a derivative of this easier-to-work-with
expression, we do in fact get a better bound than before!
Noticing that the first term is never negative, we can focus on the
second. From here, Wolfram gives an analytical answer for
which we can plug back into
to get a bound on
.