0% found this document useful (0 votes)
21 views14 pages

Ribbon Filter: Practically Smaller Than Bloom and Xor: Peter C. Dillinger Stefan Walzer

Uploaded by

5mmg6zggv4
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
21 views14 pages

Ribbon Filter: Practically Smaller Than Bloom and Xor: Peter C. Dillinger Stefan Walzer

Uploaded by

5mmg6zggv4
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Ribbon filter: practically smaller than Bloom and Xor1

Peter C. Dillinger Stefan Walzer


Facebook, Inc. University of Cologne
Seattle, Washington, USA Cologne, Germany
peterd@[Link] walzer@[Link]

ABSTRACT themselves2 . Thus, the Bloom filter filters out almost all specific
Filter data structures over-approximate a set of hashable keys, i.e. set key queries3 to data files that would find no relevant data. False
membership queries may incorrectly come out positive. A filter with negative (FN) queries would be incorrect for this application and
false positive rate 𝑓 ∈ (0, 1] is known to require ≥ log2 (1/𝑓 ) bits must never occur.
Blocked Bloom filters [44, 54] are a popular Bloom variant be-
arXiv:2103.02515v2 [[Link]] 8 Mar 2021

per key. At least for larger 𝑓 ≥ 2−4 , existing practical filters require
a space overhead of at least 20% with respect to this information- cause they are extremely fast. We do not expect to improve upon
theoretic bound. this solution4 for short-lived applications such as database joins
We introduce the Ribbon filter: a new filter for static sets with or the smallest levels of an LSM-tree. However, Bloom filters use
a broad range of configurable space overheads and false positive at least 44% more space (“space overhead”) than the information-
rates with competitive speed over that range, especially for larger theoretic lower bound of 𝜆 = log2 (1/𝑓 ) bits per key for a hashed
𝑓 ≥ 2−7 . In many cases, Ribbon is faster than existing filters for filter with FP rate 𝑓 [13, Section 2.2]. Blocked Bloom can exceed
the same space overhead, or can achieve space overhead below 10% 50% space overhead for small 𝑓 .
with some additional CPU time. An experimental Ribbon design In this work we focus on saving space in static filters, and opti-
with load balancing can even achieve space overheads below 1%. mizing the CPU time required for saving space. Our presentation
A Ribbon filter resembles an Xor filter modified to maximize and validation are kept general, but an intended application is opti-
locality and is constructed by solving a band-like linear system mizing accesses to the largest levels of an LSM-tree, where it should
over Boolean variables. In previous work, Dietzfelbinger and Walzer be worth CPU time to save space in long-lived memory-resident
describe this linear system and an efficient Gaussian solver. We structures5 . In [18] it is shown that a relatively high FP rate for
present and analyze a faster, more adaptable solving process we call these levels is best for overall efficiency. However, the space sav-
“Rapid Incremental Boolean Banding ON the fly,” which resembles ings offered by existing practical Bloom filter alternatives is limited
hash table construction. We also present and analyze an attractive (dotted line in Figure 1 (a)), especially for higher FP rates, 𝜆 ≤ 5.
Ribbon variant based on making the linear system homogeneous, Bloom filter and alternatives. We can categorize hashed filters
and describe several more practical enhancements. by the logical structure of a query:
OR probing: Cuckoo filters [11, 32, 34, 35], Quotient filters [5,
15, 30, 51], and most others return true for a query if any
one of several probed locations is a hashed data match, much
1 INTRODUCTION like hash table look-ups. This design is great for supporting
dynamic add and delete, but all known instances use (1 +
Background and motivation. The primary motivation for this 𝜀)𝜆 + 𝜇 bits per key where 𝜇 > 1.44, or often 𝜇 = 3 for speed6 .
work is optimizing data retrieval, especially in systems aggregating Even with 𝜀 ≈ 0.05, the space overhead is large for small 𝜆.
immutable data resources. In the example of LSM-tree storage [50], AND probing: A Bloom filter query returns true iff all probed
persisted key-value data is primarily split among immutable data locations (bits) are a match (set to 1).
files. A crucial strategy for reducing I/O in key look-ups is filtering
accesses using an in-memory data structure. In a common configu- 2 Learned filters [55] or tries [59] can take advantage of regularities in the key set. A
ration, each data file has an associated Bloom filter [9] representing space-efficient hash table [15] can take advantage of a densely covered key space. We
focus on the general case.
the set of keys with associated data in that file. The Bloom filter has 3 Static filters can also support range queries, either through prefix Bloom [48] or more
some false positive (FP) rate, which is the probability that querying sophisticated schemes [47].
a key not added returns true (positive). For example, configuring 4 In Section 7 we mention and use another blocked Bloom implementation with new

the Bloom filter to use 10 bits of space per added key yields an FP trade-offs [24].
5 In some large-scale applications using RocksDB [31] for LSM-tree storage, we observe
rate of just under 1%, regardless of the size or structure of the keys roughly 10% of memory and roughly 1% of CPU used in blocked Bloom filters. The
size-weighted average age of a live filter is about three days. An experimental Ribbon
filter option was added to RocksDB in version 6.15.0 (November 2020).
This work is licensed under the Creative Commons BY-NC-ND 4.0 International 6 𝜇 > 1.44 can be explained by these structures approximating a minimal perfect
License. Visit [Link] to view a copy of hash [36], a near-strict ordering of keys based on the location of their matching entry
this license. Copyright is held by the owner/author(s). in the structure, on top of 𝜆 payload bits per key. This is an observation about existing
1 Title inspired by [34, 35, 39]. structures, not necessarily a fundamental limitation.

1
Peter C. Dillinger and Stefan Walzer

XOR probing: Xor filters [10, 14, 20, 37, 39] return true for a 𝑓 fastest filter
query iff the bitwise exclusive-or (XOR) of all probed loca-
2−16
tions is a hashed data match. XOR probing is only known to

Bloom

Xor+
Xor
work with static filters. 2−14

Cuckoo
Structures using XOR probing are the most promising for space
2−12
efficient static filters. They are constructed by solving a linear sys-
tem with one constraint per key ensuring that querying the key 2−10
returns true. Standard Xor filters use a fast solving process called Balanced
Ribbon

Blocked Bloom
peeling that limits their space efficiency to ≥ 1.22𝜆 bits per key7 , 2−8
though a variant with fast compression, the Xor+ filter [39], uses
roughly 1.08𝜆 + 0.5 bits per key, which is an improvement for larger 2−6
𝜆. Using structured Gaussian elimination instead of peeling [21, 37] 2−4 Homogeneous
offers better space efficiency, but construction times are considered
impractical for many applications. 2−2
Ribbon
%space
Core contribution. We introduce a faster, simplified, and more overhead
100 50 30 16 8 4 2 1
adaptable Gaussian elimination algorithm (Ribbon) for the static
𝑓 query times 𝑓 construction times
function data structure from [22]. Based on Ribbon, we develop a 2−16 2−16
100 ns 300 ns
family of practical and highly space-efficient XOR-probed filters. 2−14 70 ns 2−14 150 ns
2−12 50 ns 2−12 80 ns
Results and comparison. Figure 1 (a) summarizes extensive bench- 2−10 40 ns 2−10
30 ns 40 ns
marking data by indicating which structure is fastest for satisfying 2−8
20 ns
2−8 20 ns
various space and FP rate requirements for a static filter. For “fastest” 2−6 15 ns 2−6 10 ns
2−4 10 ns 2−4 5 ns
we consider the sum of the construction time per key and three 2−2 % space 2−2 % space
overhead overhead
query times (measured for 𝑥 ∈ 𝑆, 𝑥 ∉ 𝑆 and a mixed data set).8 100 50 30 16 8 4 2 1 100 50 30 16 8 4 2 1
Although we compare Ribbon with many approaches imple-
mented in the fastfilter benchmark library [38], only variants of Figure 1: (a) Fastest filter with the given combination of
Bloom, Cuckoo, Xor, and Ribbon emerge as winners. Specifically, space overhead and false positive rate, considering a mix of
the color at point (𝑥, 𝑦) indicates the fastest filter with space over- construction and query times. (b) Construction and query
head at most 𝑥 and FP rate 𝑓 ∈ [𝑦/2, 𝑦 ·2] for 𝑛 = 107 keys. Diagonal times for fastest approaches in (a).
shading indicates different winners for 𝑛 = 106 and 𝑛 = 108 . The
timings for the winning approach are also shown in Figure 1 (b).
We observe the following.
small Ribbon structures (“smash”). These features go into the
• Ribbon wins to the right of the dotted line because none of
Standard Ribbon filter, which in practice has increasing space
the competing approaches achieve space overhead this low.
overhead or running time as the number of keys increases.9
• Ribbon wins in some territory previously occupied by Xor
Section 4. We present the Homogeneous Ribbon filter, which
and Xor+ filters, mostly for 𝑓 > 2−8 (𝜆 < 8) from Xor and
shares many desirable properties with blocked Bloom filters:
𝑓 > 2−12 (𝜆 < 12) for Xor+. In these cases, Ribbon-style
construction success is guaranteed, and scaling to any num-
Gaussian elimination is faster than peeling.
ber of keys is efficient. Homogeneous Ribbon does not build
• Blocked Bloom filters are still the fastest whenever appli-
on static functions in the standard way, which simplifies
cable, though Cuckoo and Xor take some of the remaining
implementation but complicates analysis.
>44% territory (and nearby) for small FP rates.
Section 5. We describe some practical enhancements and is-
Outline. The paper is structured as follows. sues for Ribbon filters, including (1) efficiently utilizing any
Section 2. We briefly review data structures for static func- amount of memory for any number of keys, (2) laying out
tions and how they give rise to filters. data for efficient queries, (3) efficiently satisfying hashing
Section 3. We describe and analyze the new Ribbon construc- requirements, and (4) scaling Standard Ribbon with data
tion algorithm, which preserves asymptotic guarantees from sharding.
[22]. We also show how to improve the space efficiency of Section 6. We present Balanced Ribbon, an experimental ex-
tension of Standard Ribbon that uses a greedy load balancing
7A new spatially-coupled construction for Xor filters [57] promises lower space over- scheme within a contiguous ribbon. This scales the extreme
heads and in some cases slightly improved construction time with peeling. Simulations space efficiency of small Standard Ribbon filters to very large
indicate a number of keys on the order of 106 or more is needed for most of the benefit,
limiting the known generality of the approach. Configuration in practice is not well 𝑛, such as only 1.005𝜆 + 0.008 bits per key with practical con-
understood. struction and query times.
8 The “right” weighing of construction and query time clearly depends on the use case.
Section 7. We present more experimental validation.
Because LSM-trees are especially useful for write-heavy workloads requiring good
read latency, we find this a reasonable ratio for at least that use case. If a 4KB filter
memory page has a lifetime as long as two weeks and at least one negative (useful)
query per added key (𝑛 roughly 212 ) is seen, that satisfies the current rule of five 9 If
the processor word size is assumed to be Ω (log 𝑛) , query time and space overhead
minutes for caching SSD storage in RAM [2]. can be kept constant. We make no such assumption here.
2
Ribbon filter: practically smaller than Bloom and Xor

2 FROM STATIC FUNCTIONS TO FILTERS


Approximate membership queries and filters. An approximate
membership query filter – filter for short – represents a set 𝑆 ⊆ U 𝑤
from some universe U. A membership query with 𝑥 ∈ 𝑆 must 𝑛
return true, while a query with 𝑥 ∈ U \ 𝑆 may return true with
probability at most 𝑓 where 𝑓 > 0 is the false positive (FP) rate.
Static functions. A static function is a data structure (sometimes
called “retrieval data structure”) representing a function 𝑏 : 𝑆 →
{0, 1}𝑟 for some set 𝑆 ⊆ U of keys. A query for 𝑥 ∈ 𝑆 must return 𝑚 = (1 + 𝜖)𝑛
𝑏 (𝑥) but a query for 𝑥 ∈ U \ 𝑆 may return any value from {0, 1}𝑟 .
Membership queries (“is 𝑥 ∈ 𝑆?”) are not supported. Figure 2: Typical shape of the random matrix with rows
Static Functions from Linear Systems. A well-known way for ®
(𝒉(𝒙)) 𝒙 ∈𝑺 sorted by starting positions. The shaded “ribbon”
constructing static functions [3, 10, 14, 20, 37, 53] uses a hash func- region contains random bits. Gaussian elimination never
tion to associate each key 𝑥 ∈ U with a set ℎ(𝑥) ⊆ [𝑚] for causes any fill-in outside of the ribbon.
some 𝑚 ≥ 𝑛 = |𝑆 |. By ℎ(𝑥)® ∈ {0, 1}𝑚 we denote the character-
istic (row) vector of ℎ(𝑥). If (ℎ(𝑥))
® 𝑥 ∈𝑆 are linearly independent
in the vector space {0, 1}𝑚 over the two-element field then the Ribbon filters also retrieve fingerprints using Equation (1)—just
system (ℎ(𝑥)
® · 𝑍 = 𝑏 (𝑥))𝑥 ∈𝑆 of linear equations has a solution with a different choice of ℎ—they can be seen as (non-standard) Xor
𝑍 ∈ {0, 1}𝑚×𝑟 . The static function is then given by ℎ and 𝑍 . Most filters.
memory is used for the 𝑚𝑟 bits of 𝑍 , which takes 𝑚 The sgauss construction. For a parameter 𝑤 ∈ N that we call
𝑛 𝑟 bits per key.
A query for 𝑥 ∈ U returns the ribbon width, the vector ℎ(𝑥)
® ∈ {0, 1}𝑚 is given by a random
starting position 𝑠 (𝑥) ∈ [𝑚 − 𝑤 − 1] and a random coefficient vector
query(𝑥) := ℎ(𝑥)
® (1)
Ê
·𝑍 = 𝑍𝑖 . 𝑐 (𝑥) ∈ {0, 1} 𝑤 as ℎ(𝑥)
® = 0𝑠−1𝑐 (𝑥)0𝑚−𝑠−𝑤+1 . Note that even though
𝑖 ∈ℎ (𝑥)
𝑚-bit vectors like ℎ(𝑥) are used to simplify mathematical discussion,
®
where 𝑍𝑖 denotes the 𝑖-th row of 𝑍 . Since queries involve |ℎ(𝑥)| · 𝑟 such vectors can be represented using log(𝑚) + 𝑤 bits.
bits from 𝑍 fast query times require sparse ℎ(𝑥).
® The matrix with rows (ℎ(𝑥))
® 𝑥 ∈𝑆 sorted by 𝑠 (𝑥) has all of its
Several constructions choose ℎ such that ℎ(𝑥)® contains exactly 1-entries in a “ribbon” of width 𝑤 that randomly passes through
three 1-bits in random positions [10, 37]. In this case 𝑚 𝑛 must not
the matrix from the top left to the bottom right, as in Figure 2. The
exceed the corresponding XORSAT threshold [19, 52] 𝑐 3∗ ≈ 0.92. If authors of [22] showed that a solution 𝑍 ∈ {0, 1}𝑚×𝑟 to (ℎ(𝑥)
® ·𝑍 =
a greedy algorithm is used for solving the linear system, then 𝑚 𝑛
𝑏 (𝑥))𝑥 ∈𝑆 can be computed quickly:
must not exceed the peeling threshold 𝑐 3Δ ≈ 0.82. log 𝑛
The space usage is roughly (1+𝜀)𝑟 bits per key when 𝑚 = (1+𝜀)𝑛. Theorem 3.1 ([22, Thm 2]). For any constant 0 < 𝜀 < 12 , 𝑤 = 𝜀
and 𝑚𝑛 = 1 − 𝜀, with high probability the linear system (ℎ(𝑥)® ·𝑍 =
The first paper to achieve 𝜀 = 𝑜 (1) is [53]. Even 𝜀 = O (log 𝑛/𝑛) is
possible, albeit with mediocre construction time [21]. A recent more 𝑏 (𝑥))𝑥 ∈𝑆 is solvable for any 𝑟 ∈ N and any 𝑏 : 𝑆 → {0, 1} . Moreover,
𝑟
®
after sorting (ℎ(𝑥)) 𝑥 ∈𝑆 by 𝑠 (𝑥), Gaussian elimination can compute
practical contribution that (more humbly) aims for small constant
𝜀 > 0 [22] will be the starting point of our own construction. a solution 𝑍 in expected time O (𝑛/𝜀 2 ).
Xor filters. There is a straightforward way to obtain a filter with Boolean banding on the fly. For Ribbon we start with the same
FP rate 2−𝑟 from an 𝑟 -bit static function as pointed out in [20, hash function ℎ® as in sgauss. For slightly improved presentation,
Observation 1]. Simply pick a random fingerprint function 𝑏 : U → execution speed, and chance of construction success, we force coef-
{0, 1}𝑟 (a hash function) and store its restriction 𝑏𝑆 : 𝑆 → {0, 1}𝑟 as ficient vectors 𝑐 (𝑥) to start with 1.11
a static function. Then for any 𝑥 ∈ 𝑆 the static function reproduces The main difference lies in how we solve the linear system. The
𝑏𝑆 (𝑥) = 𝑏 (𝑥) while for 𝑥 ∈ U \ 𝑆 the returned value will match insertion phase maintains a reduced system 𝑀 of linear equations
𝑏 (𝑥) only with probability 2−𝑟 (because 𝑏 (𝑥) is random and plays using on-the-fly Gaussian elimination [8]. This system is of the form
no role in the construction of the static function). shown in Figure 3 and has 𝑚 rows. Each row 𝑖 is represented by a
Such filters inherit the performance of the underlying static 𝑤-bit vector 𝑐𝑖 ∈ {0, 1} 𝑤 and 𝑏𝑖 ∈ {0, 1}𝑟 . Logically, the 𝑖-th row is
function, giving them the potential to be “Faster and Smaller Than either empty (𝑐𝑖 = 0𝑤 ) or specifies a linear equation 𝑐𝑖 ·𝑍 [𝑖,𝑖+𝑤) = 𝑏𝑖
Bloom and Cuckoo Filters” as claimed in [39], when dynamic inser- where 𝑐𝑖 starts with a 1. With 𝑍 [𝑖,𝑖+𝑤) ∈ {0, 1} 𝑤×𝑟 we refer to rows
tions and deletions are not required. A standard construction with 𝑖, . . . , 𝑖 + 𝑤 − 1 of 𝑍 . We ensure 𝑐𝑖 · 𝑍 [𝑖,𝑖+𝑤) is well-defined even
ℎ(𝑥) being a fully random set of size 3 is appropriately named Xor when 𝑖 + 𝑤 − 1 > 𝑚 with the invariant that 𝑐𝑖 values never select
filter [39]. “out of bounds” rows of 𝑍 .
We consider the equations (ℎ(𝑥)® · 𝑍 = 𝑏 (𝑥))𝑥 ∈𝑆 one by one, in
3 RIBBON RETRIEVAL AND RIBBON FILTERS arbitrary order, and try to integrate each into 𝑀 using Algorithm 1,
By enriching the sgauss static function10 from [22], we obtain the which we explain now. A key’s equation may be modified several
Ribbon static function which can be used as a Ribbon filter. Since
11 Inasymptotic considerations this change is inconsequential (and mildly annoying).
10 Strictly
speaking, sgauss is the name of the construction algorithm of the otherwise For better alignment with [22] our theorems still assume that 𝑐 (𝑥) is uniformly
unnamed data structure. distributed in {0, 1} 𝑤 .
3
Peter C. Dillinger and Stefan Walzer

Algorithm 1: Adding a key’s equation to the linear system 𝑀.


1 𝑐2 𝑏2
1 𝑐3 𝑏3 1 𝑖 ← 𝑠 (𝑥)
2 𝑐 ← 𝑐 (𝑥)
1 𝑐5 𝑏5
𝑚 ·𝑍 = 3 𝑏 ← 𝑏 (𝑥)
1 𝑐7 𝑏7 4 loop
5 if 𝑀.𝑐 [𝑖] = 0 then // row 𝑖 of 𝑀 is empty
1 𝑐9 𝑏9 6 𝑀.𝑐 [𝑖] ← 𝑐
7 𝑀.𝑏 [𝑖] ← 𝑏
8 return success (inserted)
𝑚
9 𝑐 ← 𝑐 ⊕ 𝑀.𝑐 [𝑖]
Figure 3: Shape of the linear system 𝑴 central to Boolean 10 𝑏 ← 𝑏 ⊕ 𝑀.𝑏 [𝑖]
banding on the fly. 11 if 𝑐 = 0 then
12 if 𝑏 = 0 then return success (redundant)
13 else return failure (inconsistent)
times before it can be added to 𝑀, but a loop invariant is that its
14 𝑗 ← findFirstSet(𝑐) // a.k.a. BitScanForward
form is
15 𝑖 ←𝑖+𝑗
𝑐 · 𝑍 [𝑖,𝑖+𝑤) = 𝑏 for 𝑖 ∈ [𝑚], 𝑐 ∈ 1 ◦ {0, 1} 𝑤−1 , 𝑏 ∈ {0, 1}𝑟 . (2) 16 𝑐 ← 𝑐 >> 𝑗 // logical shift last toward first
The initial equation ℎ(𝑥)
® · 𝑍 = 𝑏 (𝑥) of key 𝑥 ∈ 𝑆 has this form with
𝑖 = 𝑠 (𝑥), 𝑐 = 𝑐 (𝑥) and 𝑏 = 𝑏 (𝑥). We proceed it as follows.
Case 1: In the simplest case, row 𝑖 of 𝑀 is empty and we can undo a set of most recent successful insertions: Simply remove the
incorporate Equation (2) as the new 𝑖-th row of 𝑀. rows from 𝑀 that were added. These properties are not shared by
Case 2: Otherwise row 𝑖 of 𝑀 is already occupied by an equa- sgauss and will be exploited in Section 6.
tion 𝑐𝑖 · 𝑍 [𝑖,𝑖+𝑤) = 𝑏𝑖 . We compute the summed equation Analysis. All performance guarantees for the construction algo-
rithm carry over from sgauss as follows.
𝑐 ′ · 𝑍 [𝑖,𝑖+𝑤) = 𝑏 ′ with 𝑐 ′ = 𝑐 ⊕ 𝑐𝑖 and 𝑏 ′ = 𝑏 ⊕ 𝑏𝑖 , (3)
which, in the presence of row 𝑖 of 𝑀, puts the same constraint Theorem 3.2. Let 𝑆 ⊆ U be an arbitrary key set.
on 𝑍 as Equation (2). Both 𝑐 and 𝑐𝑖 start with 1, so 𝑐 ′ starts (i) If an sgauss construction succeeds for 𝑆 then so does the Ribbon
with 0. We consider the following sub-cases. construction.
Case 2.1: 𝑐 ′ = 0𝑤 and 𝑏 ′ = 0𝑟 . The equation is void and can (ii) If both constructions succeed on 𝑆, expected running times coin-
be ignored. This case is reached when the key’s original cide up to constant factors.
equation is implied by equations previously added to 𝑀. Proof.
Case 2.2: 𝑐 ′ = 0𝑤 and 𝑏 ′ ≠ 0𝑟 . The equation is unsatisfiable. (i) This is unsurprising as both approaches attempt to solve the
This case is reached when the key’s original equation is same linear system.12 A superficial difference concerns redun-
inconsistent with equations previously added to 𝑀. dant equations. In Algorithm 1 it is natural to ignore them.
Case 2.3: 𝑐 ′ starts with 𝑗 > 0 zeroes followed by a 1. Then sgauss treats them as failures, to avoid special cases during
Equation (3) can be rewritten as 𝑐 ′′ · 𝑍 [𝑖 ′,𝑖 ′ +𝑤) = 𝑏 ′ where back-substitution.
𝑖 ′ = 𝑖 + 𝑗 and 𝑐 ′′ is obtained from 𝑐 ′ by discarding the 𝑗 (ii) Consider a set 𝑆 on which sgauss succeeds, i.e. 𝑆 gives rise
leading zeroes of 𝑐 and appending 𝑗 trailing zeroes. to a solvable system without redundant equations. The back-
Note that in the bit-shift of Algorithm 1 the roles of “lead- substitution phases are identical in both algorithms. The anal-
ing” and “trailing” may seem reversed because the least- ysis of the Ribbon insertion phase hinges on counting row
significant “first” bit of a word is conventionally thought additions. Each key 𝑥 ∈ 𝑆 has a starting position 𝑠 (𝑥) and
of as the “right-most” bit. causes some row 𝑖 (𝑥) of 𝑀 to be filled. The number of row
Termination is guaranteed since 𝑖 increases with each loop iteration. additions for the insertion is clearly at most the displacement
Once equations for all keys are successfully inserted, we obtain 𝑖 (𝑥) − 𝑠 (𝑥) of 𝑥. Summing over all keys yields
a solution 𝑍 to 𝑀 in the back substitution phase. The rows of 𝑍 are ∑︁ ∑︁ ∑︁
𝐷= 𝑖 (𝑥) − 𝑠 (𝑥) = 𝑖− 𝑠 (𝑥)
obtained from bottom to top. If row 𝑖 of 𝑀 contains an equation
𝑥 ∈𝑆 𝑖 ∈𝑃 𝑥 ∈𝑆
then this equation uniquely determines row 𝑖 of 𝑍 in terms of later
rows of 𝑍 . If row 𝑖 of 𝑀 is empty, then row 𝑖 of 𝑍 can be initialized where 𝑃 = {𝑖 (𝑥) | 𝑥 ∈ 𝑆 } is the set of row-indices of 𝑀 that
arbitrarily. end up being occupied. A crucial observation is that even
though the values 𝑖 (𝑥) depend on the insertion order of the
“On-the-fly” and “incremental.” The insertion phase of ribbon
keys, the set 𝑃 does not. Indeed, for any 𝑗 ∈ [𝑚] the value
is on-the-fly in the sense that for a sequence 𝑆 = (𝑥 1, 𝑥 2, 𝑥 3, . . . ) of
|𝑃 ∩ {1, . . . 𝑗 }| is the rank of the sub-matrix of 𝑀 formed by its
keys we can easily determine the longest prefix (𝑥 1, . . . , 𝑥𝑛 ) of 𝑆
first 𝑗 columns, which is invariant under row operations. So
for which construction succeeds: Simply insert keys until the first
failure. The insertion phase is incremental because we can easily 12 See Footnote 11.
4
Ribbon filter: practically smaller than Bloom and Xor


no matter in what order the keys of 𝑆 are inserted, we always For a smash value of ℓ = 𝑤/2 and a ribbon width of 𝑤 = 𝜔 ( 𝑛)
observe the same sets 𝑃 ∩ {1, . . . 𝑗 } and hence the same set 𝑃 the matrix diagonal is firmly within the ribbon, see Figure 4 (b). It
and the same value 𝐷. The number of row additions of sgauss is not hard to prove that such a matrix is asymptotically as likely
is bounded by 𝐷 by a similar argument (see [22, Lemma 3]). to be regular as a fully random 𝑛 × 𝑛 matrix. That probability is
The analysis in [22] proceeds by bounding 𝐷 in expectation 𝑐 2 ≈ 0.289; see [16].
and hence carries over to our case. □ In Section 5.4 we present empirical findings showing that non-
zero smash values also benefit success probabilities in practically
Efficiency. While sgauss and Ribbon are tied in O-notation, Ribbon log 𝑛
more relevant cases with 𝜀 > 0 and 𝑤 = O ( 𝜀 ).
improves upon sgauss in constant factors for the following reasons:
• There is no need to pre-sort the keys by 𝑠 (𝑥). 4 HOMOGENEOUS RIBBON FILTERS
• sgauss requires explicitly storing a pivot position for each Recall that the idea underlying Ribbon filters is to pick hash func-
row. This is because sgauss does not compute an echelon tions ℎ® : U → {0, 1}𝑚 , 𝑏 : U → {0, 1}𝑟 and find 𝑍 ∈ {0, 1}𝑚×𝑟
form but only ensures that in each row the left-most 1-entry such that all 𝑥 ∈ 𝑆 satisfy ℎ(𝑥)
® · 𝑍 = 𝑏 (𝑥), while most 𝑥 ∈ U \ 𝑆
— the pivot — is the bottom-most 1-entry of its column. will not.
• sgauss performs roughly 𝐷 elimination steps that, depend- We now examine what happens when we get rid of the fin-
ing on some bit, turn out to be xor-operations or no-ops. gerprint function 𝑏 effectively setting 𝑏 (𝑥) = 0 for all 𝑥 ∈ U. A
Ribbon on the other hand performs roughly 𝐷/2 bit shifts filter is then given by a solution 𝑍 to the homogeneous system
and 𝐷/2 (unconditional) xor operations. Though the details ®
(ℎ(𝑥) · 𝑍 = 0𝑟 )𝑥 ∈𝑆 . The FP rate for 𝑍 is 𝑓𝑍 = Pr𝑎∼𝐻 [𝑎 · 𝑍 = 0𝑟 ]
are complicated, intuition on branching complexity seems where 𝐻 is the distribution of ℎ(𝑥)® for 𝑥 ∈ U. An immediate is-
to favour ribbon. sue with the idea is that 𝑍 = 0𝑚×𝑟 is a solution giving 𝑓𝑍 = 1.
A solution 𝑍 chosen uniformly at random from all solutions fares
3.1 Ribbon with Smash better, however. To obtain one, all free variables, i.e. the variables
When aiming for high space efficiency, there is an issue with early corresponding to empty rows of 𝑀, are initialized randomly during
and late columns of the linear system. We shall describe the problem back substitution.14 The overall FP rate is then 𝑓 = E[𝑓𝑍 ] where 𝑍
and its solution in an extreme but simple case where perfect space depends on the randomness in (ℎ(𝑥)) ® 𝑥 ∈𝑆 and the free variables.
efficiency, i.e. 𝑚 = 𝑛 is desired. We call the resulting construction Homogeneous Ribbon filter. It
In the absence of redundant equations, the construction of the has two obvious advantages over Standard Ribbon filters:
linear system succeeds only if all slots of 𝑀 can be filled. For the
• Constructions can never fail, regardless of 𝑛, 𝜀 and 𝑤. This
first 𝑖 slots to be filled, it is necessary that |{𝑥 ∈ 𝑆 | 𝑠 (𝑥) ≤ 𝑖}| ≥ 𝑖.
is simply because a homogeneous linear system always has
Figure 4 (a) illustrates that random fluctuations make this unlikely.
at least the trivial solution.
There is a similar problem relating to the last 𝑖 columns13 .
• The absence of fingerprints slightly improves time and space
in construction. (In optimized implementations, query times
(a) (b)
are essentially the same.)
A complication is that the FP rate 𝑓 can be higher than 2−𝑟 . Intu-
itively, if too many equations constrain some part of 𝑍 then that
part will be insufficiently random. For 𝜀𝑤 > 𝐶𝑟 (for some constant
𝐶 > 0) and large 𝑛, this effect is negligible as we shall argue in
Theorem 4.1. This still leaves us with two disadvantages, especially
for small 𝑛 and high 𝑟 :
• The product 𝜀𝑤 must be proportional to 𝑟 (for acceptable 𝑓 )
whereas for Standard Ribbon 𝜀𝑤 need only be proportional
to log 𝑛 (for acceptable success probability). We should there-
Figure 4: (a) Consider the diagonal (dashed) in a square Rib- fore not expect an improvement over Standard Ribbon for
bon system. Its beginning and end may lie outside of the of 𝑟 = Ω(log 𝑛).
the (shaded) ribbon area. • For small 𝑛, the variance of 𝑓𝑍 is quite high, meaning small fil-
(b) Our “smash” variant solves this problem. ters will occasionally have significantly more false positives
(e.g. due to random skew in (𝑠 (𝑥))𝑥 ∈𝑆 ) with no obvious way
We can address this issue by artificially inflating the probabilities to detect this during construction. This could be dangerous
Pr[𝑠 (𝑥) = 1] and Pr[𝑠 (𝑥) = 𝑚 − 𝑤 + 1] of the first and last starting for some applications.
position by a factor of ℓ we call the smash value. Such a distribution
for 𝑠 is easy to implement using a uniform distribution on [−ℓ + 4.1 Analysis
2, 𝑚 − 𝑤 + ℓ] and “clamping” the sampled value to [1, 𝑚 − 𝑤 + 1]
We can make a strong case for Homogeneous Ribbon filters by
using min and max functions. Micro-benchmarks show roughly
showing that arbitrarily small space overhead at arbitrarily large
3ns overhead per query for smash, on an Intel Skylake CPU.
13 To
see the symmetry, we could have argued about the rank of the first 𝑖 columns of 14 Our implementation uses trivial pseudo-random assignments: a free variable in row
𝑀 which is at most | {𝑥 ∈ 𝑆 | 𝑠 (𝑥) ≤ 𝑖 } | . 𝑖 is assigned 𝑝𝑖 mod 2𝑟 for some fixed large odd number 𝑝 .
5
Peter C. Dillinger and Stefan Walzer

size 𝑛 is achievable. This neither requires the ribbon width 𝑤 to (b1) 𝑖 is contained in a long overloaded interval.
scale with 𝑛, nor a deviation from the pure construction (e.g. by (b2) 𝑖 ∈ pos(𝑆 ′ ) for a minimal dependent set 𝑆 ′ with long non-
partitioning the key set into small shards). overloaded interval pos(𝑆 ′ ).
By space overhead we mean space
opt − 1 where space is the space (b3) 𝑖 ∈ pos(𝑆 ′ ) for a minimal dependent set 𝑆 ′ with short interval
usage of the filter in bits per key and opt = − log2 (𝑓 ) is the pos(𝑆 ′ ).
information-theoretic lower bound for filters that achieve the same We shall now establish the following
FP rate.
Claim: ∀𝑖 ∈ [𝑚] : Pr[𝑖 is bad] = exp(−Ω(𝜀𝑤)).
Theorem 4.1. There exists 𝐶 ∈ R+ such that for any 𝜀 < 1/2,
any desired FP rate 𝜑 > 0 and 𝑟 the closest integer to − log2 (𝜙) the For each 𝑖 ∈ [𝑚] the contributions from each of the badness con-
following holds. For any 𝑤 ∈ N with 𝜀𝑤 > 𝐶 max(log 𝑤, 𝑟 ) and ditions (b1,b2,b3) can be bounded separately. In all cases we use
log 𝑤
𝑛 ∈ N the Homogeneous Ribbon filter with 𝑛 keys and parameters our assumption 𝜀 ≥ 𝐶 𝑤 . It ensures that exp(−Ω(𝜀𝑤)) is at
𝜀, 𝑤, 𝑟 has 𝑓 ∈ [𝜑/2, 2𝜑] and space overhead at most 2𝜀. most exp(−Ω(log 𝑤)) = 𝑤 −Ω (1) and can “absorb” factors of 𝑤
in the sense that by adapting the constant hidden in Ω we have
Our argument starts with the following simple observation. 𝑤 exp(−Ω(𝜀𝑤)) = exp(−Ω(𝜀𝑤)).
Lemma 4.2. In the context of a Homogeneous Ribbon filters let 𝑝 (b1) A Chernoff bound for sums 𝑋 = 𝑗 𝑋 𝑗 of i.i.d. indicator ran-
Í
®
be the probability that for 𝑦 ∈ U \ 𝑆 the vector ℎ(𝑦) is in the span of dom variables with 𝜇 = E[𝑋 ] is
®
(ℎ(𝑥)) . Then we have
𝑥 ∈𝑆 Pr[𝑋 ≥ (1 + 𝛿)𝜇] ≤ exp(−𝛿 2 𝜇/3). (4)
−𝑟
𝑓 = 𝑝 + (1 − 𝑝)2 .
We use it in a case where 𝐼 is an interval and 𝑋 1, . . . , 𝑋𝑛+1
Proof. First assume there exists 𝑆 ′ ⊆ 𝑆 with ℎ(𝑦)
® ®
= 𝑥 ∈𝑆 ′ ℎ(𝑥) indicate which of the keys in 𝑆 + have a starting position within
Í
which happens with probability 𝑝. In that case 𝐼 . For 𝑛 ≫ 𝑤 we have
® ® ® · 𝑍) = 0 (𝑛 + 1)|𝐼 | 𝑛|𝐼 |
∑︁ ∑︁
ℎ(𝑦) ·𝑍 = ( ℎ(𝑥)) ·𝑍 = (ℎ(𝑥) 𝜇 = E[𝑋 ] ≤ ≈ = |𝐼 |/(1 + 𝜀).
𝑥 ∈𝑆 ′ 𝑥 ∈𝑆 ′ 𝑚 −𝑤 +1 𝑚
and 𝑦 is a false positive. Otherwise, i.e. with probability 1 − 𝑝, an Skipping over some uninteresting details, the probability for 𝐼
attempt to add ℎ(𝑦)
® ·𝑍 = 0 to 𝑀 after all equations for 𝑆 were added to be overloaded is (for 𝑛 ≫ 𝑤)
would have resulted in a (non-redundant) insertion in some row Pr[𝑋 ≥ (1 − 𝜀/2)|𝐼 |] ≤ Pr[𝑋 ≥ (1 + 𝜀/6) |𝐼 |/(1 + 𝜀) ]
𝑖. During back substitution, only one choice for the 𝑖-th row of 𝑍 Eq. 4 −𝜀 2 |𝐼 |
|{z} | {z }

satisfies ℎ(𝑦)
® · 𝑍 = 0. Since the 𝑖-th row was initialized randomly ≤ exp( ). 𝛿 ≥𝜇
(5)
we have Pr[ℎ(𝑦) · 𝑍 = 0 | ℎ(𝑦) ∉ span((ℎ(𝑥)) 108(1 − 𝜀)
𝑥 ∈𝑆) ] = 2 .
® ® ® −𝑟 □
The probability for 𝑖 ∈ [𝑚] to be contained in a long over-
We shall now derive an asymptotic bound on 𝑝 in terms of large loaded interval is bounded by the sum of Equation (5) over all
𝑤 and small 𝜀 (recall 𝜀 = 𝑚−𝑛 𝑛 ). It is too imprecise to estimate 𝑝 lengths |𝐼 | ≥ 𝑤 2 and all |𝐼 | offsets that 𝐼 can have relative to 𝑖.
and 𝑓 in practical settings, which we do empirically in Section 4.2. The result is of order exp(−Ω(𝜀 2𝑤 2 ) and hence small enough.
The main takeaway is rather that 𝑝 does not depend on 𝑛. We may (b2) Consider a long interval 𝐼 that is not overloaded, i.e. |𝐼 | ≥ 𝑤 2
therefore expect Homogeneous Ribbon filters to scale to arbitrary and |𝑆𝐼 | ≤ (1 − 𝜀/2)|𝐼 |. There are at most 2 |𝑆𝐼 | sets 𝑆 ′ of keys
sizes 𝑛 with no increase in 𝑓 even when 𝑤 and 𝜀 are constants. with pos(𝑆 ′ ) = 𝐼 and each is a dependent set with probability
Lemma 4.3. There exists a constant 𝐶 such that for any 𝑤 ∈ N 2−|𝐼 | because each of the |𝐼 | positions of 𝐼 that 𝑆 ′ touches
log 𝑤 imposes one parity condition.
and 𝐶 𝑤 ≤ 𝜀 ≤ 12 we have 𝑝 = exp(−Ω(𝜀𝑤)).
The probability for 𝐼 to support at least one dependent set is
The main ingredient in the proof is that exp(−Ω(𝜀𝑤)) bounds therefore at most 2−|𝐼 | · 2 |𝑆𝐼 | = 2− 2 |𝐼 | = exp(−Ω(𝜀 |𝐼 |)).
𝜀

the number of keys that cannot be (non-redundantly) inserted, Similar as in (b1) for 𝑖 ∈ [𝑚] we can sum this probability over
which follows from [22].15 all admissible lengths |𝐼 | ≥ 𝑤 2 and all offsets that 𝑖 can have
Proof Sketch. We may imagine that 𝑆 ⊆ U and 𝑦 ∈ U \ 𝑆 in 𝐼 to bound the probability that 𝑖 is bad due to (b2).
are obtained from a set 𝑆 + ⊆ U of size 𝑛 + 1 by picking 𝑦 ∈ 𝑆 + at (b3) Let 𝑆 red ⊆ 𝑆 be the set of redundant keys, i.e. keys for which
random and setting 𝑆 = 𝑆 + \ {𝑦}. Then 𝑝 is simply the expected Algorithm 1 returns “success (redundant)”. While 𝑆 red de-
fraction of keys in 𝑆 + that are contained in some dependent set, i.e. pends on the insertion order, the rank defect |𝑆 red | = 𝑛 −
in some 𝑆 ′ ⊆ 𝑆 + with 𝑥 ∈𝑆 ′ ℎ(𝑥)
Í ® = 0𝑚 . Clearly, 𝑥 is contained in a rank((ℎ(𝑥))
® 𝑥 ∈𝑆 ) does not. A central step in [22] implies that
dependent set if and only if it is contained in a minimal dependent E[|𝑆 red |] = 𝑚 · exp(−Ω(𝜀𝑤)).16
set. Such a set 𝑆 ′ always touches a consecutive set of positions, i.e. Now if 𝑖 is bad due to (b3) then 𝑖 ∈ pos(𝑆 ′ ) for some minimal
pos(𝑆 ′ ) := 𝑥 ∈𝑆 ′ [𝑠 (𝑥), 𝑠 (𝑥) + 𝑤 − 1] is an interval.
Ð dependent set 𝑆 ′ with short pos(𝑆 ′ ). At least one key from
We call an interval 𝐼 ⊆ [𝑚] long if |𝐼 | ≥ 𝑤 2 and short otherwise. 𝑆 ′ is redundant (even if all keys from pos(𝑆 ′ ) are inserted
We call it overloaded if 𝑆𝐼 := {𝑥 ∈ 𝑆 + | 𝑠 (𝑥) ∈ 𝐼 } has size |𝑆𝐼 | ≥ first). In particular, 𝑖 is within short distance (< 𝑤 2 ) of the
|𝐼 | · (1 − 𝜀/2). Finally, we call a position 𝑖 ∈ [𝑚] bad if one of the starting position of a redundant key 𝑥. Therefore at most |𝑆 red |·
following is the case: 16 The log 𝑛
bound is only used to show that for 𝑤 = Ω ( 𝜀 ) all insertions succeed with
15 Recall Footnote 11. high probability.
6
Ribbon filter: practically smaller than Bloom and Xor

𝑓 4.2 Space efficiency in practice


2−14 To minimize the space overhead 𝑟 (1 + 𝜀)/log 𝑓 −1 − 1 for chosen
values of 𝑟 and 𝑤 in a Homogeneous Ribbon filter, 𝜀 must be neither
2−12

Balanced 𝑤 = 32

Balanced 𝑤 = 64
too large or too small. (Small 𝜀 causes 𝑓 to explode due to densely-
Blocked

Xor, etc.

2−10 Xor+ packed constraints.) To choose 𝜀, we turn to simulations on random


data, building large structures and testing the FP rate. Using 𝑚 =
2−8
3 · 107 (among others), 𝑤 ∈ {16, 32, 64, 128}, and 𝑟 ∈ [1, 16], we see
Homo
Homo
Hom
Ho

2−6 a pretty clear pattern in where the space overhead is minimized for
mo Bloom

og. 𝑤

any large 𝑛:
g. 𝑤 =
g. 𝑤 =
g.

2−4 4 + 𝑟 /4
𝑤=

(7)
= 32

𝜀≈ .
1
16

64

2−2 𝑤
28

% space
overhead Note how the recommendation 𝜀 > 𝐶 max(log 𝑤, 𝑟 )/𝑤 one might
100 50 30 16 8 4 2 1
derive from Theorem 4.1 (vaguely) agrees. We use (7) in all ex-
periments. In Figure 5 we show combinations of space overhead
Figure 5: Combinations of space overhead and false positive and FP rate to expect from Homogeneous Ribbon filters for 𝑤 ∈
rate achievable with various ribbon widths 𝑤 and large 𝑛. {16, 32, 64, 128} and large 𝑛.17 Balanced Ribbon filters, also in Fig-
(Experimental data for Ribbon configurations use 𝑛 = 106 ure 5, are discussed in Section 6.
but generalize.) For example, consider using 𝑟 = 7 for roughly 1% FP rate and
𝑤 = 64 for reasonable space-vs.-time trade-off. Using 𝜀 ≈ 0.09 from
Equation (7) we observe 𝑓 ≈ 0.81% > 0.78% ≈ 2−7 , so actual space
2𝑤 2 positions are bad due to (b3), which is an exp(−Ω(𝜀𝑤))-
overhead is closer to 10% than the 9% allocated with 𝜀.
fraction of all positions as desired.
With a concentration argument the following variant of the claim 5 MAKING RIBBON PRACTICAL
can be proved. We omit the details.
5.1 Configurability and elasticity
Claim’: ∀𝑥 ∈ 𝑆 + : Pr[𝑠 (𝑥) is bad] = exp(−Ω(𝜀𝑤)).
A useful and perhaps previously unreplicated feature of Bloom
Now assume that a key 𝑥 ∈ 𝑆 + is contained in a minimal dependent filters is the ability to efficiently utilize any amount of space for
set 𝑆 ′ . It follows that all of pos(𝑆 ′ ) is bad. Indeed, either pos(𝑆 ′ ) is a minimizing the FP rate in representing any number of keys. We
short interval (→ b3) or it is long. If it is long, then it is overloaded call this configurability and suggest it is practically important for
(→ b1) or not overloaded (→ b2). In any case 𝑠 (𝑥) ∈ pos(𝑆 ′ ) is bad. space efficiency. Consider an application with little control over
Therefore, the probability 𝑝 for 𝑦 ∈ 𝑆 + to be contained in a the number of keys going into a filter. Even if we use a perfectly
dependent set is at most the probability for 𝑠 (𝑦) to be bad. This is space efficient filter for a specific FP rate, we could be wasting
upper-bounded by exp(−Ω(𝜀𝑤)) using Claim’. □ significant space due to internal fragmentation from an allocator.
A memory allocator like jemalloc [33] averages about 10% internal
We are now ready to prove Theorem 4.1.
fragmentation on arbitrarily sized allocations, space that should
Proof of Theorem 4.1. We shall find that our filter has 𝑓 ≤ ideally be used by the filter to reduce its FP rate 18 .
2−𝑟 (1 + 𝜀 2 ) and hence 𝑓 ∈ [𝜑/2, 2𝜑] with high probability. More specifically, Bloom alternatives such as Cuckoo, Quotient,
The space consumption of 𝑍 ∈ {0, 1}𝑚×𝑟 is space = 𝑚𝑟 and Xor conventionally use cells of some whole number of bits, as
𝑛 = 𝑟 (1+𝜀)
bits per key. To relate this to opt, we need bounds on 𝑝. Ribbon does with 𝑟 . Whole number 𝑟 limits space-efficient choices
By assumption 𝜀𝑤 > 𝐶 log 𝑤, so a large enough choice for 𝐶 of FP rates and bits per key. For example, using only 5-bit cells
permits the use of Lemma 4.3, which guarantees 𝑝 = exp(−Ω(𝜀𝑤)). when 5.5 bits per cell is available adds roughly 10% space overhead
Again using 𝜀𝑤 > 𝐶 max(log 𝑤, 𝑟 ) for large enough 𝐶 gives to our filter. Other than Ribbon, which is tied to the two-element
field, these same Bloom alternatives can use fractional-bit cell sizes.
1
𝑝 ≤ exp(−2 log(𝑤) − 𝑟 ) ≤ 2 𝑒 −𝑟 ≤ 𝜀 2 2−𝑟 . (6) Some configurations can even be made efficient, such as 64/𝑖 bits per
𝑤
cell for whole 𝑖 (an existing Xor10.666 implementation is tested in
Together with Lemma 4.2 we get Section 7), but fine granularity would surely be more CPU intensive.
opt = − log2 (𝑓 )
Lem 4.2
= − log2 (𝑝 + (1 − 𝑝)2−𝑟 ) An alternative way of generalizing to effectively fractional 𝑟 is to
split available space between two structures: one using ⌈𝑟 ⌉ solution
Eq. 6
≥ − log2 (𝑝 + 2−𝑟 ) ≥ − log2 (2−𝑟 (1 + 𝜀 2 )) columns (or bits per cell) and the other using ⌊𝑟 ⌋ for a weighted
average of 𝑟 , chosen to fit available space. This only slightly in-
= 𝑟 − log2 (1 + 𝜀 2 ) ≥ 𝑟 − 𝜀 2 . creases the overall space overhead. For example, using 𝑟 = 5.5
Putting everything together yields yields (non-homogeneous) FP rate of 3/128. The lower bound for
this rate is 𝜆 = 5.415 bits per key, so the approach adds 1.57% to
space 𝑟 (1 + 𝜀) (1 + 𝜀)
= ≤ ≤ 1 + 2𝜀.
opt 𝑟 − 𝜀2 (1 − 𝜀 2 ) 17 Standard Ribbon filters are not shown because the achievable overhead depends
1 on 𝑛 . We do recommend Standard Ribbon (with smash) for small 𝑛 < 104 where
The last step makes use of 𝜀 < 2. □ Homogeneous Ribbon filters have high variance in FP rate.
18 See RocksDB’s optimize_filters_for_memory option [27].

7
Peter C. Dillinger and Stefan Walzer

overall space overhead. This addition is a larger 6.00% for 𝑟 = 1.5, d𝑟 e = 4


or smaller 0.82% for 𝑟 = 10.5 bits per key. Practical concerns with
𝑤 𝑤
splitting into two structures includes (a) essentially doubling many
of the space usage penalties associated with small structures (when b𝑟 c = 3
applicable), and (b) independently seeding hashes or accepting joint
𝑤
b𝑟 c = 3
construction success probability (when applicable).
For Ribbon we recommend a variant of that approach within
𝑍= 𝑤 b𝑟 c = 3
a single structure: using only ⌊𝑟 ⌋ solution bits per row for some 𝑤 d𝑟 e = 4
prefix of rows, and ⌈𝑟 ⌉ bits per row for the rest of rows. The banding
process (Algorithm 1) is unchanged, but small changes are needed 𝑤 d𝑟 e = 4
to back-substitution and query (more details in Section 5.2). Because
of Ribbon’s locality of probes in queries, unlike standard Xor filters,
Figure 6: A solution matrix (left) and 𝑤-bit interleaved
a diminishingly small 𝑤/𝑚 portion of queries cross the boundary
column-major layout (ICML, right) of that matrix, mixing
between ⌊𝑟 ⌋ and ⌈𝑟 ⌉ columns (utilizing only ⌊𝑟 ⌋ in such cases), so
⌊𝑟 ⌋ and ⌈𝑟 ⌉ columns as in Section 5.1. The shaded region
space efficiency is very close to the idealized split approach, and
shows the bits used in a query crossing the boundary be-
probably better in practice: around 1% additional space overhead
tween ⌊𝑟 ⌋ and ⌈𝑟 ⌉.
for common configurations (see e.g. 𝑟 = 7.7 in Section 7).
Although the split approach enables near-continuous configura-
bility for many kinds of filters, the single-structure approach for
Ribbon filters has an advantage we call elasticity, for applications can be fetched in parallel, some testing shows this to be be relatively
like ElasticBF [46]. Like an Xor filter, one can drop entire columns expensive for 𝑟 > 2 for a filter that is not hot in memory cache.
from a Ribbon filter, for a corresponding higher FP rate. With Rib- Our preferred solution layout for Ribbon filters is interleaved
bon filters, we also have the ability to drop part of the last column, column-major layout (ICML), because it has locality very close
so a finished filter can be trimmed down with bit granularity. Gener- to row-major and decoding efficiency very close to column-major.
alizing further, a finished Ribbon filter can be split to several smaller The memory space is divided into conveniently sized ICML words,
structures with independent FP rates19 , with product as small as and grouped into blocks of 𝑟 words. Each block is the column-
the FP rate of the starting structure. Similarly, a finished Ribbon major layout of some contiguous rows of 𝑍 . See Figure 6, which
filter could be physically (re-)partitioned at arbitrary boundaries generalizes this layout to a mixed number of columns for fractional
by duplicating as little as (𝑤 − 1)𝑟 bits at each partition boundary 𝑟.
such that each query accesses only one partition. For ICML word size equal to 𝑤, at most and almost always two
words are combined for reconstructing each result bit. This means
5.2 Solution structure layout that the amount of adjacent memory accessed for a full query is
Here we examine memory layouts for the solution matrix 𝑍 ∈ 2𝑟𝑤 bits, while the ideal minimum is 𝑟𝑤 bits. For example, with
{0, 1}𝑚×𝑟 , which is critical for fast Ribbon filter queries. Prior 𝑟 = 6 and 𝑤 = 64, ICML accesses 768 bits per query, with 384-bit
work [22] only evaluated the 𝑟 = 1 case, where 𝑍 is a Boolean alignment, which translates to an average of accessing 2.25 Intel
(bit) vector. cache lines (512 bits) and essentially 1 page (4KB) per query; with
Xor filters conventionally use row-major layout of the solution only 6-bit “alignment,” row-major would access 1.5 cache lines per
structure, wherein the whole row 𝑖 of 𝑍 immediately precedes the query. A standard Xor filter accesses essentially 3 cache lines and
whole row 𝑖 + 1 in memory. A 𝑤 = 64 Ribbon filter combines nearly as many pages per query.
roughly an order of magnitude more rows, conditionally, than a Ribbon back-substitution is an especially fast, streaming opera-
standard Xor filter combines unconditionally (three rows). In fact, tion for layouts based on column major. We can buffer 𝑤 rows of 𝑍
with number of solution columns commonly 5 ≤ 𝑟 ≤ 15, standard in 𝑟 temporary values of width 𝑤, likely fitting in CPU registers,
Xor filters typically access more columns than rows, while Ribbon and use those buffers for (a) computing the logical previous bit for
filters typically access more rows than columns. Although row- each column, and (b) flushing to our solution structure for each 𝑤
major layout could likely be made efficient for Ribbon in some rows (𝑟𝑤 bits).
special cases using SIMD, we do not find it generally workable for As is well known for Bloom filters, queries can potentially be
a fast and highly-configurable filter. optimized with short-circuiting: returning from a “negative” query
The opposite is column-major layout, in which the entire col- as soon as a probed bit is zero, ensuring the query must return false.
umn 𝑖 of 𝑍 precedes column 𝑖 +1 in memory. Column-major is essen- A similar approach works for Ribbon filters using layouts based
tially ideal for querying a single result bit for a key, as we only have on column-major, returning as soon as a result bit does not match
to access a “contiguous” (usually unaligned) 𝑤 bits, bitwise-AND expectation. Although cache-local Bloom filters are so optimized
with 𝑐 (𝑥), and get the bit parity. The problem with column-major that this approach rarely pays off any longer [44], our Ribbon
is that accessing more result bits is not an adjacent memory access. implementation uses short-circuiting except for compile-time fixed
Although the several memory addresses are easily computed and 𝑟 ≤ 4. The distinction is visible in observed query time ranges in
Section 7.
19 ForHomogeneous Ribbon, the portion of the FP rate from degradation is not We also like the clean configurability of layouts based on column-
independent. major. Parameter 𝑟 should be freely chosen to balance FP rate vs.
8
Ribbon filter: practically smaller than Bloom and Xor

space usage, and that choice is much more free when it does not af- success; almost sure construction success with Standard Ribbon
fect instruction-level data alignment, only alignment in CPU caches requires significantly more space overhead than 5% failure chance,
and pages, which we consider a relatively minor concern. Ribbon- a good space-time trade-off in our judgment.
width ICML is good for using ⌊𝑟 ⌋ columns before ⌈𝑟 ⌉ columns Although Standard Ribbon does not scale infinitely for fixed
(Section 5.1), because upon determining the starting memory lo- ribbon width and space overhead, it is more space-efficient than
cation and (smallest) applicable number of columns, which can be standard Xor (23% overhead) for most practical 𝑛, which can be seen
done without conditional branches, the remaining query code does in Table 1 and Section 7. Unlike many other Gaussian structures,
not have to be aware of mixed numbers of columns; see Figure 6. construction speed is not a significant concern for scaling Ribbon
(⌈𝑟 ⌉ before ⌊𝑟 ⌋ is better for pure column major.) to large 𝑛.
A minor disadvantage of ICML is that the number 𝑚 of solution
rows must be a multiple of the number of bits in an ICML word, Scaling with sharding. There are many standard or obvious ways
which can present a conflict between configurability (accommodat- to construct a large, space-efficient filter from many smaller space-
ing any number of keys) and space efficiency for small 𝑛. efficient filters [37, 54]. Two ways of leveraging Ribbon features
are notable, but not evaluated in detail:
5.3 Practical hashing for Ribbon
• If uniformly hash-partitioning keys into fixed-size data struc-
Hash expansion. A filter structure “consuming” some quantity ture shards, the fractional 𝑟 feature of Ribbon (Section 5.1)
of hash information can operate from a smaller hash H (𝑥) of the can be used to accommodate variance in the number of keys
original key 𝑥 [28, 29]. The practical requirements for Ribbon filters mapped to each shard.
are these: • If determining hash ranges to assign to each shard, on-the-fly
• H (𝑥) values must be large enough to have an insignificant banding allows adding entries or buckets (in sharding hash
baseline FP rate due to full hash collisions, i.e. H (𝑥 1 ) = order, independent of start location order) until one fails and
H (𝑥 2 ) for 𝑥 1 ≠ 𝑥 2 . A 64-bit hash for H should suffice for starts the next shard. The “⟨add till failure⟩” rows in Table 1
almost all non-cryptographic applications, as having 232 correspond to this strategy of adding entries until one fails,
keys in a single filter incurs a baseline FP rate of just 2−32 . so yields good average space efficiency without construction
• H (𝑥) is effectively extended / expanded / remixed to what retries, such as < 1% overhead with 𝑚 = 210 per shard, not
is consumed20 . For Ribbon, it is most important to minimize including sharding metadata.
correlations between the starting location and other hash
consumers. A starting location computed with fastrange [45]
Shard sizes. Section 3.1 describes an inherent unlikelihood of fill-
on H (𝑥) relies primarily on upper bits, so multiplying H (𝑥)
ing all slots in a Ribbon system, even if 𝑚 = 𝑤, and how the likeli-
by a large odd constant (as with Knuth multiplicative hash
hood is similar with 𝑚 = Θ(𝑤 2 ). Because the expected number of
[42]) seems to suffice for removing correlation. Details are
empty slots at first failure to add remains constant even for small
in the reference implementation of Ribbon [25].
𝑚, the median proportion of unoccupied slots at failure decreases
Re-seeding. Some Ribbon designs need the ability to retry con- with 𝑚 before increasing with 𝑚, for a fixed ribbon width 𝑤. For
struction with sufficiently independent hashing to have an indepen- common ribbon widths, the minimum appears to be around 𝑤 2 /4,
dent probability of construction success. Observe that for Ribbon which we suggest is the natural shard size, subject to practical
filters a full hash collision does not interfere with construction suc- adjustment for the application.
cess (it only produces a redundant equation). We find in significant
testing that modifying an unseeded stock hash value with simple Soft sharding. With Ribbon we can apply sharding at a higher
XOR with a pseudorandom seed then multiplication by a large odd abstraction layer than memory space, for potentially better space
constant suffices for independent probability of construction suc- efficiency. In a typical hard sharding, construction optimizes for
cess. See [25] for details. Assuming uniform hashes, an effective each shard either (a) a set of keys, (b) a memory size, (c) a hash seed,
alternative to re-seeding on failed construction is simply to increase or (d) some other configuration parameters, based on the others,
𝑚 by a factor of 𝑤+1 and records the optimized configuration in some metadata. The
𝑤 .
change with soft sharding is that each shard is assigned a contigu-
5.4 Standard Ribbon scalability ous range of Ribbon start locations (from a single Ribbon system)
rather than a contiguous memory space (containing an indepen-
We refer to the non-homogeneous Ribbon construction, including
dent Ribbon system). This should be a pure win for space efficiency,
smash when appropriate, as Standard Ribbon. Construction fails
because the overlap of 𝑤 − 1 probing rows between adjacent soft
with some probability depending on 𝑚, 𝑛 and 𝑤, though we have no
shards allows them to, in effect, borrow some space from each other
formula. Table 1 provides some empirical data points for how much
without expending metadata. (A Standard or Homogeneous Rib-
configured space overhead, 𝜀 = (𝑚 − 𝑛)/𝑛, is required for several
bon filter is a naive soft sharding with no metadata guiding the
failure probabilities that represent different construction time vs.
shard assignments.) With some ordering constraints and temporary
solution space trade-offs. Observe that unlike standard Xor filters,
tracking data, the Ribbon algorithm allows us to backtrack, such
Ribbon does not exhibit sharp threshold behavior in construction
as for changing the hash seed within a shard or key assignments
20 In at
least two cases [1, 23], implementations citing an asymptotic result for efficient to shards. We do not analyze the “soft sharding” design space in
hashing in Bloom filters [40] had practical flaws that previous work [28] warned about. detail, but use the idea for Balanced Ribbon.
9
Peter C. Dillinger and Stefan Walzer

Table 1: Standard Ribbon space overhead 𝜀 from empirical data

Ribbon Failure 𝑤/2 smash 𝑤/4 smash 0 smash Each additional


width probability 𝑚 = 210 𝑚 = 210 𝑚 = 210 𝑚 = 214 𝑚 = 217 𝑚 = 224 doubling of 𝑚
𝑤 = 128 0.5 0.2% 0.1% 1.0% 1.1% 2.2% 4.7% +0.38%
𝑤 = 128 ⟨add till failure⟩ 0.2% 0.2% 1.1% 1.2% 2.3% 4.8% +0.38%
𝑤 = 128 0.05 0.5% 0.5% 2.2% 2.6% 3.7% 5.9% +0.38%
𝑤 = 128 0.001 1.1% 1.2% 4.1% 4.6% 5.8% > 7% +0.38%
𝑤 = 64 0.5 0.3% 0.4% 2.0% 3.8% 6.3% 11.7% +0.83%
𝑤 = 64 ⟨add till failure⟩ 0.8% 0.8% 2.2% 4.1% 6.5% 12.1% +0.83%
𝑤 = 64 0.05 3.7% 2.9% 4.8% 7.0% 9.4% 15.0% +0.83%
𝑤 = 64 0.001 11.4% 7.1% 9.2% 11.5% 13.8% > 19% +0.83%
𝑤 = 32 ⟨add till failure⟩ 5.9% 5.2% 6.3% 13.2% 19.2% 35.2% +2%
𝑤 = 16 ⟨add till failure⟩ 28.9% 27.0% 27.5% 67.3% > 100% ≫ 100% +??%

6 BALANCED RIBBON

𝛼 𝛼
Balanced Ribbon is an experimental design for scaling and space-

100%
optimizing Standard Ribbon; for implementation details see [26].

100%
We intend Balanced Ribbon as an example in the design space
opened up by the new on-the-fly and incremental Gaussian elimina-
𝛼
tion algorithm, and encourage follow-up work to explore, optimize, Level
1 2 3 ℓ=4
and analyze this design space.
Shard
Balanced Ribbon extends Standard Ribbon with soft sharding 1 2 3 4 5 6 7 𝑠=8
and a new balanced allocation scheme tailored to this domain (re-
lated: [4, 6, 7, 17, 49, 58]). Like many other hashing schemes, we
start with the idea that each entry has two possible locations in Figure 7: Bumping behavior between levels of Balanced
the Ribbon, given by two hashes: an earlier primary location and Ribbon. The expected relative quantity of entries with pri-
a later secondary location in a distinct shard. The shard with the mary locations in each level is indicated by where
primary location is constructed before the shard with the secondary are bumped and added in their secondary location . Not
location and accommodates the key if possible. If not, we say the shown: allocation overheads (𝜀), Poisson variances, bucket
key is “bumped” and must be accommodated in its secondary shard, boundaries, and dispersion within each shard.
so shards add “bumped” entries first for best chance of success.
Metadata is constructed to indicate which keys are bumped21 . The
construction is greedy in that a shard tries to accommodate as many
keys as possible, without considering where keys will be bumped to, perfectly, the average number of entries bumped from each level 1
and constructed shards are never revisited. Difficulty arises if using shard will be 𝛼𝑛/𝑠. Because level 2 shards are half as many, they
two uniform hashes, the smaller primary and the larger secondary: receive 2𝛼𝑛/𝑠 entries on average for adding in secondary location.
later shards are dominated by entries in their secondary location. With those bumped entries, level 2 shards are now overloaded to
Organizing shards for bumping. To make this work we orga- relative weight 1 + 𝛼 compared to 1 − 𝛼 for later shards. With this,
nize the soft shards into levels 1..ℓ, with level 𝑖 containing exactly we have a recursive structure to ensure a continuous supply of
⌈2ℓ−𝑖−1 ⌉ shards, so we assume a power of two number of shards entries eligible for bumping down to the last shard. (Like spatial
overall, 𝑠 = 2ℓ−1 . Unlike some multi-level hashing schemes [12, coupling [57] and “always go left” [56], we are making productive
41, 43], an entry’s primary location can be on any level 𝑖, with its use of less randomness.)
secondary location uniformly on level min(𝑖 + 1, ℓ). Because no The last shard (level ℓ) is different but does not need to be com-
secondary locations are in level 1, we overload it with primary plicated. If we configure our allocated space overhead assuming
locations; level 1 shards have relative weight 1 + 𝛼 for primary loca- Standard Ribbon overheads for the last shard, along with tighter
tions and other level shards have weight 1 − 𝛼. See√︁Figure 7. With overheads for the other shards, it seems to work (single shard Bal-
average 𝑛/𝑠 keys per shard, we choose 𝛼 ≈ 3.5/ 𝑛/𝑠 to ensure anced Ribbon ≡ Standard Ribbon). For large number of shards, we
a sufficient supply of entries even for shards with three Poisson observe the Balanced Ribbon final shards either completely over-
standard deviations below the mean number of entries. (We use whelmed with bumped entries (construction failure) or receiving
𝛼 = 1/8 for 𝑛/𝑠 ≈ 1000.) Assuming we allocate our space overhead almost no bumped entries. We believe this is because the overall
variance in utilization of slots in all prior shards is concentrated
21 Thereis no such disambiguation in [21] and a query would combine information into the last shards, and that variance is large relative to a single
from both locations. This significantly complicates the linear system, however. shard. Because the variance is small overall with a large number of
10
Ribbon filter: practically smaller than Bloom and Xor

shards, construction success is more predictable at scale (apparent Table 2: Experimental performance comparisons
threshold behavior).
Space ns/key, 𝑛 = 106 ns/key, 𝑛 = 108
Buckets for bumping. We use another (semi-)independent hash
Configuration ovr % con query con query
to order or group keys within a shard strictly for bumping. Selecting
a threshold on that hash works well for both Ribbon and metadata ↓ FP rate around 1%, Ribbons using 𝑟 = 7 ↓
space efficiency. First, we have chosen our shard size such that
BlockedBloom[44] 52.0 11 14 ± 0 32 37 ± 0
controlling the number of entries going into a shard is much more
BlockedBloom[24] 49.8 21 10 ± 0 72 36 ± 0
important than the particular set of entries. Second, a Θ(log(𝑛/𝑠))
Cuckoo12† 46.2 69 21 ± 0 147 58 ± 0
bit threshold value is small metadata per shard. However, we do
Cuckoo12 40.3 91 20 ± 0 205 58 ± 0
not want to incur the CPU time for sorting entries into so many
Morton12 40.5 87 55 ± 3 106 125 ± 13
buckets per shard.
Xor, 𝑟 = 7‡ 23.0 195 21 ± 0 264 66 ± 1
Instead of a threshold we use Θ(log(𝑛/𝑠)) buckets of geomet-
Xor, 𝑟 = 8 23.0 148 15 ± 0 211 50 ± 0
rically distributed sizes that can be selected independently, using
Xor+, 𝑟 = 8 14.5 171 35 ± 1 299 104 ± 10
the incremental feature of Ribbon to backtrack on failed buckets.
Homog., 𝑤 = 16 52.0 56 40 ± 6 101 88 ± 6
Using some bit tricks to approximate a geometric distribution with
Homog., 𝑤 = 32 20.6 58 38 ± 6 116 85 ± 5
𝑝 = 2−0.5 seems better than 𝑝 = 2−1 , perhaps due to variance in
Standard, 𝑤 = 64 14;20 71 42 ± 5 130 94 ± 7
actual bucket sizes. We prefer 8 buckets per shard for 8 bits of meta-
Homog., 𝑤 = 64 10.1 83 39 ± 7 160 90 ± 6
data per shard, 0.008 bits per key for common shard size. A subtle
Standard, 𝑤 = 128 6;8 166 58 ± 2 235 140 ± 26
part of maximizing space efficiency with independent buckets and
Homog., 𝑤 = 128 5.1 164 53 ± 3 270 145 ± 25
soft sharding is to attempt adding a larger (in expectation) bucket
Balanced, 𝑤 = 32† 15.3 84 47 ± 6 278 104 ± 5
in shard 𝑖 + 1 before attempting to add a smaller (in expectation)
Balanced, 𝑤 = 32 2.5 162 48 ± 5 372 107 ± 5
bucket in shard 𝑖, if the two shards are in the same level. Successfully
Balanced, 𝑤 = 64 0.7 292 49 ± 5 516 111 ± 5
adding the smaller could overflow enough to make it impossible to
Balanced, 𝑤 = 128 0.3 985 66 ± 1 1335 162 ± 29
add the larger; on average, the better greedy choice is trying to add
the larger bucket first. For CPU efficiency, we skip attempts to add Homog.32, 𝑟 = 7.0 20.6 58 38 ± 6 116 85 ± 5
a bucket that is very likely to fail given the number of successful Homog.32, 𝑟 = 7.7 22.7 61 43 ± 4 120 96 ± 7
additions to the shard. Homog.32, 𝑟 = 8.0 22.1 61 39 ± 5 116 86 ± 9

Overall. Balanced Ribbon construction resembles an external sort ↓ FP rate around 10%, Ribbons using 𝑟 = 3 ↓
between CPU cache and main memory. Queries depend on just Xor, 𝑟 = 3‡ 23.1 194 16 ± 0 264 55 ± 0
one bit of sharding metadata, out of a typical 8 bits per shard. Homog., 𝑤 = 16 34.6 46 23 ± 0 91 66 ± 1
Space usage for 𝑤 = 64 Balanced Ribbon is 1.005𝜆 + 0.008 bits per Homog., 𝑤 = 32 16.1 52 21 ± 0 108 55 ± 1
key, relative to the information-theoretic lower bound 𝜆, as shown Standard, 𝑤 = 64 14;20 66 22 ± 0 124 62 ± 0
in Figure 5. We have tested Balanced Ribbon with these space Homog., 𝑤 = 64 8.0 71 21 ± 0 152 72 ± 4
efficiencies up to 4 billion keys; arbitrary scaling might require Standard, 𝑤 = 128 6;8 145 35 ± 0 213 79 ± 1
natural increases in 𝑤 (𝑤 ∼ log 𝑛 and thus 𝑛/𝑠 ∼ log2 𝑛) or might Homog., 𝑤 = 128 4.2 156 33 ± 0 266 79 ± 1
benefit from design changes. Balanced, 𝑤 = 32 2.8 155 29 ± 0 377 68 ± 1
Balanced, 𝑤 = 64 0.8 288 29 ± 0 517 84 ± 11
7 EXPERIMENTS ↓ FP rate around 2−11 , Ribbons using 𝑟 = 11 ↓

Setup. For validation we extend the experimental setup used for Cuckoo16† 36.0 68 20 ± 0 147 59 ± 0
Cuckoo and Xor filters [35, 38, 39], with our code available in our Cuckoo16 30.2 84 20 ± 0 201 59 ± 1
fork on GitHub [26]. Timings are performed on a single-socket CuckooSemiSort 26.6 146 50 ± 0 323 143 ± 4
Intel® Xeon® D-2191 (Skylake DE) with 64GB RAM. Tests are com- Xor, 𝑟 = 12 23.1 175 23 ± 0 251 74 ± 4
piled with GCC 8.4.1, using g++ -O3 -DNDEBUG -march=skylake- Xor, 𝑟 = 10.666‡ 23.0 191 20 ± 0 284 61 ± 0
avx512. The Ribbon code is portable C++ using no processor intrin- Xor+, 𝑟 = 11‡ 12.8 185 38 ± 0 317 107 ± 2
sics but using compiler built-ins for prefetch, count leading/trailing Homog., 𝑤 = 32 28.0 74 51 ± 2 129 118 ± 32
zero bits, and bit parity. Standard, 𝑤 = 64 14;20 78 54 ± 3 139 138 ± 41
Homog., 𝑤 = 64 12.7 89 51 ± 3 163 118 ± 26
Results. First, Figure 1 shows which approach is fastest for various Standard, 𝑤 = 128 6;9 173 71 ± 9 247 164 ± 39
space overheads and various FP rates, and query and construction Homog., 𝑤 = 128 7.3 189 72 ± 9 292 166 ± 43
times for the corresponding overall fastest approach. It uses a mix Balanced, 𝑤 = 32 2.3 169 60 ± 2 394 142 ± 32
of 𝑛 ∈ {106, 107, 108 }; more details in Section 1. To maximize the Balanced, 𝑤 = 64 0.5 301 61 ± 3 536 147 ± 28
coverage of “BlockedBloom” in the figure, we include an AVX2
† Larger space allocated to improve construction time.
SIMD-optimized implementation from RocksDB [24] that sacrifices
‡ Potentially unfavorable bit alignment.
SIMD-optimized construction for enhanced configurability (any
; Standard Ribbon space overhead depends on 𝑛.
number of probes) and minimized FP rate compared to [44]; all
11
Peter C. Dillinger and Stefan Walzer

“BlockedBloom” in this paper use aligned 512-bit blocks. To maxi- as small), only constant size metadata is read to determine which
mize coverage of “Bloom” we include an obvious variant splitting memory to fetch to complete a query. Although the test relies on
probes between two independent blocks. out-of-order execution between queries for memory prefetching,
Second, Table 2 shows detailed timings for some specific config- applications can use explicit prefetching.22
urations with approximate 𝑓 ∈ {2−3, 2−7, 2−11 }. Construction and To minimize sampling noise, the test machine was otherwise
query times are given in nanoseconds per key over at least 5 · 107 idle. This does not match a production environment, but we have
keys, with a range of query times for different ratios of positive not seen a significant difference in relative results when running
vs. negative queries. The timing data for 𝑛 = 106 represents the the tests under load.
core CPU time of each approach with negligible memory access To match existing test code, hash seeds are not configurable.
overheads, while 𝑛 = 108 includes memory access overheads. The All structures are configured for high chance of construction suc-
𝑟 (1+𝜀)
reported space overhead is log (𝑓 −1 ) − 1 where the FP rate 𝑓 is cess (roughly 99% or more). In practice, approaches other than
2
measured by sampling. Standard Ribbon has distinct overheads for BlockedBloom and Homogeneous would incur some small addi-
the different 𝑛. Cuckoo12 and Morton12 use 12 bits per cell [11, 35], tional overheads associated with seeding and retries.
for similar FP rate as 𝑟 = 9 Xor or Ribbon.
Third, Table 1 shows many more space overheads for Standard 8 CONCLUSION
Ribbon, which mostly improve for smaller 𝑛. While we do not Our result changes the narrative around data structures constructed
include timings for these smaller 𝑛, we can infer that Standard with Gaussian elimination vs. with peeling. With our new algorithm
Ribbon, with or without smash, takes territory from other Ribbon and the query structure from [22], Gaussian elimination can be
approaches in Figure 1 (a) for smaller 𝑛. faster than peeling while also opening up better space efficiency.
On this foundation we have built Ribbon filters with the follow-
Observations. When saving space compared to Bloom and Cuckoo
ing efficiently scaling variants.
(incl. Morton), we see in Table 2,
• The Homogeneous Ribbon filter is simple and has a construc-
• Ribbon can achieve much lower space overheads than the tion algorithm that never fails. We provide a full analysis.
practical alternatives. 𝑤 = 64 Balanced Ribbon has less than • The Balanced Ribbon filter leverages the on-the-fly and in-
1% overhead except when 𝑟 < 3 (bottom right of Figure 1). cremental construction algorithm in an experimental load-
• Ribbon mostly wins in construction time, sometimes by a balancing scheme that further reduces space overhead.
large factor. Ribbon construction times are generally only
higher than Xor when saving space compared to Xor. As with Bloom filters, the true practicality of Ribbon also comes
• Xor mostly wins in query time. For smaller 𝑟𝑤, which is from being able to configure it for or adapt it to the application,
proportional to the contiguous memory loaded per query, including dynamic conditions. This includes using any amount of
Ribbon query times are similar to Xor. Ribbon query times space to efficiently represent any number of keys, even by shrinking
increase with 𝑟 and/or 𝑤. the structure after construction. Ribbon filters replicate the smooth
• Ribbon has a clear advantage for configurability (aside from FP-rate-for-space configurability of Bloom filters and extend that
Bloom). Xor and Cuckoo incur a measurable penalty when with configurability between space efficiency and time efficiency.
cells are not aligned to a friendly size, such as 4, 8, 12, or
16 bits, and have limited options for partial-bit granularity. Future work could pursue the following goals.
Ribbon performance is continuous for any integer 𝑟 , and the • Deepen the theoretical understanding of Ribbon filters to
penalty for arbitrary fractional average 𝑟 is small . better guide parameter choices in practice.
• Ribbon can easily trade space efficiency for construction • Manage the variance of FP rate of Homogeneous Ribbon
time efficiency (like Cuckoo), by increasing allocated space filters at small scales.
overhead; see † vs. corresponding non-† configurations. Xor • Further explore the design space surrounding Balanced Rib-
construction time is more fixed. bon for approaches that are better and/or easier to analyze.
• Create a SIMD-optimized implementation of Ribbon queries,
A somewhat surprising result is that 𝑤 = 32 Balanced Ribbon
perhaps using AVX512 POPCNT and/or 8-bit ICML rather
is sometimes faster and more space efficient than other Ribbon
than 𝑤-bit ICML.
variants with 𝑤 = 128. We can reason that the benefits of smaller
• Explore how competitive Ribbon can be as a data structure
ribbon width sometimes exceed the costs associated with balancing.
for static functions, including cases where the number 𝑟 of
We consider 𝑤 = 128 Balanced Ribbon “impractical” because the
solution columns is quite high, e.g. 𝑟 ≥ 32.
execution time is much higher for a tiny benefit in space usage.
Limitations. A notable limitation of this test is that it tests query ACKNOWLEDGMENTS
throughput more than query latency. Like executing a batch of filter We thank Martin Dietzfelbinger for early contributions to this line
queries, the test does not depend on the result of each query for of research, and Peter Sanders for later discussions. We thank others
what to do next. For applications executing a single filter query with at Facebook for their supporting roles, including Jay Zhuang, Siying
unpredictable outcome, main memory latency can be 200-300ns. Dong, Shrikanth Shankar, Affan Dar, and Ramkumar Vadivelu.
However, we expect latency to be similar between the competing
approaches, because aside from Balanced Ribbon sharding meta- 22 RocksDB MultiGet uses batched filter queries with explicit prefetching; RocksDB
data (very small, cachable) and Xor+ compression metadata (not Get uses single filter queries.
12
Ribbon filter: practically smaller than Bloom and Xor

REFERENCES [24] Peter C. Dillinger. 2019. RocksDB FastLocalBloom. [Link]


[1] Paulo Sérgio Almeida. 2020. A Case for Partitioned Bloom Filters. rocksdb/blob/master/util/bloom_impl.h
arXiv:2009.11789 [[Link]] [25] Peter C. Dillinger. 2020. ribbon_*.{cc,h} in RocksDB. [Link]
[2] Raja Appuswamy, Goetz Graefe, Renata Borovica-Gajic, and Anastasia Ailamaki. facebook/rocksdb/
2019. The five-minute rule 30 years later and its impact on the storage hierarchy. [26] Peter C. Dillinger. 2021. Fork of fastfilter_cpp. [Link]
Commun. ACM 62, 11 (2019), 114–120. [Link] fastfilter_cpp
[3] Martin Aumüller, Martin Dietzfelbinger, and Michael Rink. 2009. Experimental [27] Peter C. Dillinger et al. 2021. Bloom filter on RocksDB Wiki. [Link]
Variations of a Theoretically Good Retrieval Data Structure. In Proc. 17th ESA. facebook/rocksdb/wiki/RocksDB-Bloom-Filter
742–751. [Link] [28] Peter C. Dillinger and Panagiotis Manolios. 2004. Bloom Filters in Probabilis-
[4] Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal. 1999. Balanced tic Verification. In Formal Methods in Computer-Aided Design, 5th International
Allocations. SIAM J. Comput. 29, 1 (1999), 180–200. [Link] Conference, FMCAD 2004, Austin, Texas, USA, November 15-17, 2004, Proceedings
S0097539795288490 (Lecture Notes in Computer Science), Alan J. Hu and Andrew K. Martin (Eds.),
[5] Michael A. Bender, Martin Farach-Colton, Rob Johnson, Russell Kraner, Bradley C. Vol. 3312. Springer, 367–381. [Link]
Kuszmaul, Dzejla Medjedovic, Pablo Montes, Pradeep Shetty, Richard P. Spillane, [29] Peter C. Dillinger and Panagiotis Manolios. 2004. Fast and Accurate Bitstate
and Erez Zadok. 2012. Don’t Thrash: How to Cache Your Hash on Flash. Proc. Verification for SPIN. In Model Checking Software, 11th International SPIN Work-
VLDB Endow. 5, 11 (2012), 1627–1637. [Link] shop, Barcelona, Spain, April 1-3, 2004, Proceedings (Lecture Notes in Computer
[6] Petra Berenbrink, Tom Friedetzky, Zengjian Hu, and Russell Martin. 2008. On Science), Susanne Graf and Laurent Mounier (Eds.), Vol. 2989. Springer, 57–75.
Weighted Balls-into-Bins Games. Theor. Comput. Sci. 409, 3 (Dec. 2008), 511–520. [Link]
[Link] [30] Peter C. Dillinger and Panagiotis Manolios. 2009. Fast, All-Purpose State Storage.
[7] Petra Berenbrink, Kamyar Khodamoradi, Thomas Sauerwald, and Alexandre In Model Checking Software, 16th International SPIN Workshop, Grenoble, France,
Stauffer. 2013. Balls-into-Bins with Nearly Optimal Load Distribution. In Proceed- June 26-28, 2009. Proceedings (Lecture Notes in Computer Science), Corina S. Pasare-
ings of the Twenty-Fifth Annual ACM Symposium on Parallelism in Algorithms and anu (Ed.), Vol. 5578. Springer, 12–31. [Link]
Architectures (Montréal, Québec, Canada) (SPAA ’13). Association for Comput- 2_6
ing Machinery, New York, NY, USA, 326–335. [Link] [31] Siying Dong, Andrew Kryczka, Yanqin Jin, and Michael Stumm. 2021. Evolution of
2486191 Development Priorities in Key-value Stores Serving Large-scale Applications: The
[8] Valerio Bioglio, Marco Grangetto, Rossano Gaeta, and Matteo Sereno. 2010. On RocksDB Experience. In 19th USENIX Conference on File and Storage Technologies
the Fly Gaussian Elimination for LT Codes. Communications Letters, IEEE 13 (01 (FAST 21). USENIX Association. [Link]
2010), 953 – 955. [Link] presentation/dong
[9] Burton H. Bloom. 1970. Space/Time Trade-offs in Hash Coding with Allowable [32] David Eppstein. 2016. Cuckoo Filter: Simplification and Analysis. In Proc. 15th
Errors. Commun. ACM 13, 7 (1970), 422–426. [Link] SWAT. 8:1–8:12. [Link]
362692 [33] Jason Evans et al. 2019. jemalloc manual. [Link]
[10] Fabiano Cupertino Botelho, Rasmus Pagh, and Nivio Ziviani. 2013. Practical Version 5.2.1.
Perfect Hashing in Nearly Optimal Space. Inf. Syst. 38, 1 (2013), 108–131. https: [34] Bin Fan, David G. Andersen, and Michael Kaminsky. 2013. Cuckoo Filter: Better
//[Link]/10.1016/[Link].2012.06.002 Than Bloom. ;login: 38, 4 (2013), 5. [Link]
[11] Alex D. Breslow and Nuwan Jayasena. 2020. Morton filters: fast, compressed august-2013-volume-38-number-4/cuckoo-filter-better-bloom
sparse cuckoo filters. VLDB J. 29, 2-3 (2020), 731–754. [Link] [35] Bin Fan, David G. Andersen, Michael Kaminsky, and Michael Mitzenmacher.
s00778-019-00561-0 2014. Cuckoo Filter: Practically Better Than Bloom. In Proc. 10th CoNEXT. 75–88.
[12] Andrei Z. Broder and Anna R. Karlin. 1990. Multilevel Adaptive Hashing. In [Link]
Proceedings of the First Annual ACM-SIAM Symposium on Discrete Algorithms, [36] Michael L. Fredman and János Komlós. 1984. On the Size of Separating Systems
22-24 January 1990, San Francisco, California, USA, David S. Johnson (Ed.). SIAM, and Families of Perfect Hash Functions. SIAM Journal on Algebraic Discrete
43–53. [Link] Methods 5, 1 (1984), 61–68. [Link]
[13] Andrei Z. Broder and Michael Mitzenmacher. 2003. Network Applications of [37] Marco Genuzio, Giuseppe Ottaviano, and Sebastiano Vigna. 2016. Fast Scalable
Bloom Filters: A Survey. Internet Mathematics 1, 4 (2003), 485–509. https: Construction of (Minimal Perfect Hash) Functions. In Proc. 15th SEA. 339–352.
//[Link]/10.1080/15427951.2004.10129096 [Link]
[14] Bernard Chazelle, Joe Kilian, Ronitt Rubinfeld, and Ayellet Tal. 2004. The Bloomier [38] Thomas Mueller Graf and Daniel Lemire. 2019. fastfilter_cpp. [Link]
Filter: An Efficient Data Structure for Static Support Lookup Tables. In Proc. 15th com/FastFilter/fastfilter_cpp
SODA. 30–39. [Link] [39] Thomas Mueller Graf and Daniel Lemire. 2020. Xor Filters: Faster and Smaller
[15] John G. Cleary. 1984. Compact Hash Tables Using Bidirectional Linear Probing. Than Bloom and Cuckoo Filters. ACM J. Exp. Algorithmics 25 (2020), 1–16.
IEEE Trans. Computers 33, 9 (1984), 828–834. [Link] [Link]
1676499 [40] Adam Kirsch and Michael Mitzenmacher. 2008. Less hashing, same performance:
[16] Colin Cooper. 2000. On the rank of random matrices. Random Structures & Building a better Bloom filter. Random Struct. Algorithms 33, 2 (2008), 187–218.
Algorithms 16, 2 (2000), 209–232. [Link] [41] A. Kirsch and M. Mitzenmacher. 2010. The Power of One Move: Hashing Schemes
16:2<209::AID-RSA6>[Link];2-1 for Hardware. IEEE/ACM Trans. Netw. 18 (2010), 1752–1765.
[17] Artur Czumaj, Chris Riley, and Christian Scheideler. 2003. Perfectly Balanced [42] Donald Ervin Knuth. 1998. Sorting and Searching (2nd ed.). Addison-Wesley.
Allocation. In Proc. 6th APPROX - 7th RANDOM. 240–251. [Link] [Link]
1007/978-3-540-45198-3_21 [43] Sailesh Kumar, Jonathan S. Turner, and Patrick Crowley. 2008. Peacock Hashing:
[18] Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal Deterministic and Updatable Hashing for High Performance Networking. In
Navigable Key-Value Store. In Proceedings of the 2017 ACM International Confer- INFOCOM 2008. 27th IEEE International Conference on Computer Communications,
ence on Management of Data (Chicago, Illinois, USA) (SIGMOD ’17). Association Joint Conference of the IEEE Computer and Communications Societies, 13-18 April
for Computing Machinery, New York, NY, USA, 79–94. [Link] 2008, Phoenix, AZ, USA. IEEE, 101–105. [Link]
3035918.3064054 [44] Harald Lang, Thomas Neumann, Alfons Kemper, and Peter A. Boncz. 2019.
[19] Martin Dietzfelbinger, Andreas Goerdt, Michael Mitzenmacher, Andrea Monta- Performance-Optimal Filtering: Bloom overtakes Cuckoo at High-Throughput.
nari, Rasmus Pagh, and Michael Rink. 2010. Tight Thresholds for Cuckoo Hashing Proc. VLDB Endow. 12, 5 (2019), 502–515. [Link]
via XORSAT. In Proc. 37th ICALP (1). 213–225. [Link] 3303757
14165-2_19 [45] Daniel Lemire. 2019. Fast Random Integer Generation in an Interval. ACM Trans.
[20] Martin Dietzfelbinger and Rasmus Pagh. 2008. Succinct Data Structures for Model. Comput. Simul. 29, 1 (2019), 3:1–3:12. [Link]
Retrieval and Approximate Membership (Extended Abstract). In Proc. 35th ICALP [46] Yongkun Li, Chengjin Tian, Fan Guo, Cheng Li, and Yinlong Xu. 2019. Elas-
(1). 385–396. [Link] ticBF: Elastic Bloom Filter with Hotness Awareness for Boosting Read Perfor-
[21] Martin Dietzfelbinger and Stefan Walzer. 2019. Constant-Time Retrieval with mance in Large Key-Value Stores. In 28th USENIX, Dahlia Malkhi and Dan Tsafrir
𝑂 (log 𝑚) Extra Bits. In Proc. 36th STACS. 24:1–24:16. [Link] (Eds.). USENIX Association, 739–752. [Link]
[Link].2019.24 presentation/li-yongkun
[22] Martin Dietzfelbinger and Stefan Walzer. 2019. Efficient Gauss Elimination for [47] Siqiang Luo, Subarna Chatterjee, Rafael Ketsetsidis, Niv Dayan, Wilson Qin,
Near-Quadratic Matrices with One Short Random Block per Row, with Applica- and Stratos Idreos. 2020. Rosetta: A Robust Space-Time Optimized Range Fil-
tions. In Proc. 27th ESA. 39:1–39:18. [Link] ter for Key-Value Stores. In Proceedings of the 2020 International Conference
[23] Peter C. Dillinger. 2018. RocksDB Issue #4120. [Link] on Management of Data, SIGMOD Conference 2020, online conference [Portland,
rocksdb/issues/4120 OR, USA], June 14-19, 2020, David Maier, Rachel Pottinger, AnHai Doan, Wang-
Chiew Tan, Abdussalam Alawini, and Hung Q. Ngo (Eds.). ACM, 2071–2086.
[Link]
13
Peter C. Dillinger and Stefan Walzer

[48] Yoshinori Matsunobu, Siying Dong, and Herman Lee. 2020. MyRocks: LSM-Tree [54] Felix Putze, Peter Sanders, and Johannes Singler. 2009. Cache-, hash-, and space-
Database Storage Engine Serving Facebook’s Social Graph. Proc. VLDB Endow. efficient Bloom filters. ACM Journal of Experimental Algorithmics 14 (2009), 18.
13, 12 (Aug. 2020), 3217–3230. [Link] [Link]
[49] Michael Mitzenmacher. 2001. The Power of Two Choices in Randomized Load [55] Kapil Vaidya, Eric Knorr, Tim Kraska, and Michael Mitzenmacher. 2020. Parti-
Balancing. IEEE Trans. Parallel Distributed Syst. 12, 10 (2001), 1094–1104. https: tioned Learned Bloom Filter. CoRR abs/2006.03176 (2020), 13. arXiv:2006.03176
//[Link]/10.1109/71.963420 [Link]
[50] Patrick E. O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth J. O’Neil. 1996. [56] Berthold Vöcking. 2003. How Asymmetry Helps Load Balancing. J. ACM 50, 4
The Log-Structured Merge-Tree (LSM-Tree). Acta Inf. 33, 4 (1996), 351–385. (2003), 568–589. [Link]
[Link] [57] Stefan Walzer. 2021. Peeling Close to the Orientability Threshold: Spatial Cou-
[51] Anna Pagh, Rasmus Pagh, and S. Srinivasa Rao. 2005. An Optimal Bloom Filter pling in Hashing-Based Data Structures. In Proc. 32nd SODA.
Replacement. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium [58] Udi Wieder. 2017. Hashing, Load Balancing and Multiple Choice. Foundations
on Discrete Algorithms (Vancouver, British Columbia) (SODA ’05). Society for and Trends® in Theoretical Computer Science 12, 3–4 (2017), 275–379. https:
Industrial and Applied Mathematics, USA, 823–829. //[Link]/10.1561/0400000070
[52] Boris Pittel and Gregory B. Sorkin. 2016. The Satisfiability Threshold for 𝑘 - [59] Huanchen Zhang, Hyeontaek Lim, Viktor Leis, David G. Andersen, Michael
XORSAT. Combinatorics, Probability & Computing 25, 2 (2016), 236–268. https: Kaminsky, Kimberly Keeton, and Andrew Pavlo. 2018. SuRF: Practical Range
//[Link]/10.1017/S0963548315000097 Query Filtering with Fast Succinct Tries. In Proceedings of the 2018 International
[53] Ely Porat. 2009. An Optimal Bloom Filter Replacement Based on Matrix Solving. Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA,
In Proc. 4th CSR. 263–273. [Link] June 10-15, 2018, Gautam Das, Christopher M. Jermaine, and Philip A. Bernstein
(Eds.). ACM, 323–336. [Link]

14

Common questions

Powered by AI

Configurability in Ribbon filters refers to their ability to efficiently utilize any amount of space to minimize false positive rates (FP rates) irrespective of the size of the key set. This trait makes Ribbon filters exceptionally adaptable in dynamic environments where the number of keys is unpredictable . Unlike conventional Bloom alternatives that rely on fixed number of whole bits per cell, Ribbon filters can adjust space allocations continuously, thus offering fine-tuned configurability. This includes the flexibility of dropping portions of a filter for a higher FP rate or splitting from a completed filter into smaller structures to accommodate changing requirements . Such elastic properties make Ribbon filters highly applicable in environments with fluctuating demands .

Ribbon filters offer enhanced space efficiency compared to traditional Bloom alternatives due to their smooth FP-rate-for-space configurability and elasticity in space usage. Unlike Bloom filters that use fixed bit sizes per cell, Ribbon filters can handle fractional-bit cell sizes by splitting available space between two structures or within a single structure . This flexibility allows better space efficiency by utilizing any amount of space effectively. Moreover, the configurability extends to trade-offs between space efficiency and time efficiency, making Ribbon filters adaptable to various applications . However, smaller Ribbon widths like w=32 occasionally outperform larger ones like w=128, as seen in specific scenarios with Balanced Ribbon filters, indicating space efficiency can sometimes be better with narrower configurations .

Using fractional-bit cell sizes in filters like Ribbon and Xor allows for a finer allocation of space that closely matches the requirements for minimizing FP rates, thus reducing overall space overhead. This is typically achieved by creating configurations that employ a weighted average of bit allocations per cell or by splitting available space into dual structures, as seen with 64/i bits per cell configurations in Xor filters . For example, adopting a fractional configuration increases space overhead only slightly, around 1% for common setups, while maintaining a low FP rate . The ability to adjust cell sizes fractionally enables a continuum of FP-rate vs. space trade-offs, making these filters highly adaptable to specific spatial constraints .

Ribbon filters can be partitioned to enhance configurability by splitting the finished filter into several smaller structures, each with independent FP rates, while maintaining the product close to the initial FP rate . This partitioning can be done by duplicating minimal necessary bits at each boundary, allowing each partition to function independently or adapt to different requirements . This approach offers the advantage of reducing the filter's footprint to fit exact space constraints while maintaining performance. Partitioning also allows for reallocating space based on real-time demand, optimizing the use of available resources . Such flexibility is particularly beneficial in systems that need to dynamically adjust to varying loads or storage requirements, providing elasticity in application operations .

The trade-off between space efficiency and construction time efficiency in Ribbon filters is crucial for their practical application. Ribbon filters are designed to allow a trade-off similar to what Cuckoo filters offer; they can trade more allocated space for faster construction times . For instance, while a w=32 configuration might offer a balance between speed and space usage, a wider ribbon like w=128 can become impractical due to significantly higher execution time for only marginal space benefits . This trade-off allows Ribbon filters to be configured dynamically based on the specific constraints of space availability and required speed, providing flexibility for applications with varying performance requirements .

Balanced Ribbon filters achieve space overhead reduction through an on-the-fly incremental construction algorithm that balances load across the filter, thereby minimizing unused space during the allocation process . This technique helps distribute the construction effort evenly, much like a balancing scheme, to prevent concentrated areas of high load, which can otherwise increase space overhead. One limitation of using Balanced Ribbon filters is that their construction time can be significantly higher compared to standard Ribbon filters, especially when larger widths are used, such as w=128, which become impractical due to lengthy execution times for minimal space benefits . Furthermore, these filters may also introduce complexities in maintaining configurability under varying conditions, limiting their broader application .

Selecting the optimal epsilon (𝜀) for minimizing space overhead in Homogeneous Ribbon filters involves balancing between too large or too small values, as each extreme can lead to inefficiencies . A small 𝜀 can cause the false positive (FP) rate to escalate due to densely packed constraints, while a too large 𝜀 can waste space due to underutilization. In practice, simulations suggest that 𝜀 should be set around 4 + r/4w to achieve a reasonable space-overhead balance for given parameters like r and w . By optimizing 𝜀, the filter's performance can be fine-tuned to maintain desired FP rates while minimizing space usage, crucial for scalable applications .

The construction algorithm of the Homogeneous Ribbon filter is characterized by its simplicity and guarantee of never failing, a distinction from other filter types which might require retries or iterative checking . This reliability is achieved by employing an algorithmic framework that closely manages the filter's balance in real-time during construction, ensuring constraints are met without exceeding space limits. The design not only ensures completion without issues but also simplifies the process compared to other more complex filters that tackle Gaussian eliminations or iterative constraint management . This contributes to a high level of reliability, making the filter suitable for applications where consistent performance without errors is critical .

Gaussian elimination and peeling methods differ primarily in their approach to resolving dependencies within data structures. Gaussian elimination involves a mathematical approach to solving systems of linear equations, allowing for precise and often seamless construction of complex filter configurations . In contrast, peeling employs a more heuristic method, often iteratively removing layers until constraints are satisfied. The benefit of using Gaussian elimination for Ribbon filters is its potential for greater space efficiency and speed, a significant advantage over peeling methods. The precision of Gaussian elimination minimizes retries and unnecessary allocations, achieving optimal space utilization and faster execution times, which are particularly beneficial for applications requiring high throughput and minimal latency .

Future research directions identified for enhancing the efficiency and understanding of Ribbon filters include deepening the theoretical understanding to better guide parameter choices in practice, managing the variance of false positive rates of Homogeneous Ribbon filters at small scales, and further exploring the design space of Balanced Ribbon filters to improve analytical ease and performance . Additional avenues involve creating SIMD-optimized implementations of Ribbon queries and exploring Ribbon's viability as a data structure for static functions, where the number of solution columns is high, such as r ≥ 32 . These developments aim to refine the configurability, performance, and scalability of Ribbon filters for broader application scenarios.

You might also like