† Not a particularly novel idea. The paper-appreciators may be reminded of Achlioptas' 2001 paper Database-Friendly Random Projections, where a form of the Johnson-Lindenstrauss Lemma using only basic operations is used to embed values in a general-purpose RDBMS.
Earlier this week I had an idea about quantization of vector embeddings. In theory, a vector of float64 values of length n can be re-embedded into a binary vector of length n by taking the sign bit of each value in the original representation. For example, a vector of 1536 float64 values uses 12,288 bytes. We can pack its 1536 sign bits into a vector of 24 uint64 words (192 bytes) and reduce memory usage by a factor of 64.
I also suspected that distance calculations on binary vectors would be significantly faster than on the original vectors. The benchmarks below show that quantization yields a roughly 130x performance improvement in a sequential scan of 100k vectors.
† I am being a bit (no pun intended) unfair in my evaluation here; most models produce embeddings with 32-bit values.
```go
func BenchmarkQueryOperation_Quantized(b *testing.B) {
	dataset := generateDataset(100000, 1536, datasetSeed) // randomized dataset (1536 fp64)
	queryset := generateDataset(1000, 1536, querysetSeed) // randomized query vecs (1536 fp64)

	// NB: the baseline test `BenchmarkQueryOperation` inits a `DB` rather than a `QuantizedDB` here...
	db := QuantizedDB{data: make([][]uint64, 100000)}
	for id, vector := range dataset {
		db.Store(id, vector)
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		db.QueryTop(queryset[i%1000])
	}
}
```

```
$ go test -bench BenchmarkQuery -benchtime 10s -benchmem
BenchmarkQueryOperation-8                     60  191096522 ns/op    0 B/op  0 allocs/op
BenchmarkQueryOperation_Quantized-8         8859    1424105 ns/op  192 B/op  1 allocs/op
```

† Preserving recall well is a massive concern. Anecdotally, the two implementations agree on the single nearest neighbor at a level well above chance (e.g. given a random vector and a dataset of 10,000, the quantized DB and the original implementation return the same vector in around 4-6% of queries). I still have work to do on this front, but this feels promising…
Provided this method preserves recall well, the practical implications are quite significant. Consider the Cohere Wikipedia Embeddings dataset, which contains vectors of (). Quantizing these embeddings brings us down to and makes it possible to store the entire dataset on an inexpensive VM. Although one really should add an index to a vector store, full scans of the quantized Wikipedia data come in under (or assuming an index that scans vectors).
Finally, I'd like to justify my choice of distance function for the non-quantized and quantized cases: cosine distance for the former, Hamming distance for the latter. If we were to blindly apply the cosine distance to binary vectors we'd face (at least) two limitations. By addressing these two concerns, we accidentally stumble into the Hamming distance.
† Let's do some yak-shaving. Hamming distance eliminates the need for this exercise, but let's sketch out bounds for the denominator for fun. Assume n-bit vectors with bits set or unset with equal probability. Using the normal approximation of the binomial, we get counts of set bits k_a, k_b ~ N(n/2, n/4) and a denominator of sqrt(k_a * k_b). Because the counts are independent, the distribution of the product k_a * k_b can be computed relatively easily. Finally, we can integrate (or apply Markov's inequality) to bound the resulting product.
Quantization as described above does not preserve unit norm. We could just ignore the denominator (i.e. minimize the negative dot product) if we could show that the denominator, suitably rescaled, is near one with high probability. This is probably a fun exercise, but I'll refrain for now.
We lose even more information because the dot product of our vectors (the sum of a_i * b_i over all bits) treats the pairs (0,0), (0,1), and (1,0) as equivalent (i.e. no contribution). We want both (0,0) and (1,1) to contribute positively to "closeness", not just (1,1).
Hamming distance counts the positions where a_i differs from b_i (a XOR b) rather than summing a_i * b_i in the numerator, which preserves that information and removes the need for a normalizing denominator.
```go
// DB is an "in-memory vector DB"
type DB struct{ data [][]float64 }

// Store sets a normed vector as the `id`th entry in the DB
func (db *DB) Store(id int, vec []float64) (int, error) {
	db.data[id] = normalize(vec)
	return id, nil
}

// QueryTop returns the vector in the DB nearest to a query vector
func (db *DB) QueryTop(queryVec []float64) (int, []float64, error) {
	var minDist, minId = math.Inf(1), 0
	for id, vec := range db.data {
		if d := unitVecCosDist(queryVec, vec); d < minDist {
			minDist, minId = d, id
		}
	}
	return minId, db.data[minId], nil
}

// unitVecCosDist dots vectors `a` and `b`; for unit vectors, this is cosine dist
func unitVecCosDist(a, b []float64) float64 {
	var dot float64
	for i := range a {
		dot += a[i] * b[i]
	}
	return 1 - dot
}

// normalize scales a vector to a unit vector with the same direction
func normalize(vec []float64) []float64 {
	unitVec := make([]float64, len(vec))
	var s float64
	for _, d := range vec {
		s += d * d
	}
	magnitude := math.Sqrt(s)
	for i, d := range vec {
		unitVec[i] = d / magnitude
	}
	return unitVec
}
```

```go
// QuantizedDB is an in-memory vector DB w. logic for quantizing vectors to 1 bit
type QuantizedDB struct{ data [][]uint64 }

// Store sets a quantized vector as the `id`th entry in the DB
func (db *QuantizedDB) Store(id int, vec []float64) (int, error) {
	db.data[id] = quantize(vec)
	return id, nil
}

// QueryTop returns the vector in the DB nearest to a quantized query
// vector; it minimizes Hamming distance.
func (db *QuantizedDB) QueryTop(vec []float64) (int, []uint64, error) {
	var minDist, minId = math.MaxInt, 0
	var quantizedVec = quantize(vec) // allocates 192B
	for id, vec := range db.data {
		if d := hammingDist(quantizedVec, vec); d < minDist {
			minDist, minId = d, id
		}
	}
	return minId, db.data[minId], nil
}

// hammingDist calculates the Hamming distance between 2 binary vectors
// by counting the bits set in `a XOR b`
func hammingDist(a, b []uint64) int {
	var count int
	for i := range a {
		count += bits.OnesCount64(a[i] ^ b[i])
	}
	return count
}

// sgn returns the sign bit: maps (-inf, 0) -> 1 (including -0), [0, +inf) -> 0
func sgn(f float64) uint64 { return math.Float64bits(f) >> 63 }

// quantize takes a vector of float64, quantizes each 64-bit value to 1 bit,
// and stores the quantized bits in a packed []uint64; len(vec) must be a
// multiple of 64
func quantize(vec []float64) []uint64 {
	quantizedVec := make([]uint64, len(vec)/64)
	for j := range quantizedVec {
		for i := 0; i <= 63; i++ {
			quantizedVec[j] |= sgn(vec[j*64+i]) << (63 - i)
		}
	}
	return quantizedVec
}
```

Addendum (06/18/2024): If you wanted to be hardcore you could use SIMD instructions in `hammingDist` and get another nice speedup if your hardware supports it.