2013
DOI: 10.1002/spe.2198

Optimized succinct data structures for massive data

Abstract: Succinct data structures provide the same functionality as their corresponding traditional data structure in compact space. We improve on functions rank and select, which are the basic building blocks of FM‐indexes and other succinct data structures. First, we present a cache‐optimal, uncompressed bitvector representation that outperforms all existing approaches. Next, we improve, in both space and time, on a recent result by Navarro and Providel on compressed bitvectors. Last, we show techniques to per…
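To make the two operations named in the abstract concrete, here is a minimal sketch of rank and select on an uncompressed bitvector. It is not the paper's cache-optimal layout: the class name `RankSelectBitvector`, the `block_size` parameter, and the binary-search select are assumptions made for illustration only.

```python
class RankSelectBitvector:
    """Naive uncompressed bitvector with rank/select support (illustrative only)."""

    def __init__(self, bits, block_size=64):
        self.bits = list(bits)
        self.block_size = block_size
        # prefix[k] = number of 1 bits in bits[0 : k * block_size]
        self.prefix = [0]
        for start in range(0, len(self.bits), block_size):
            block = self.bits[start:start + block_size]
            self.prefix.append(self.prefix[-1] + sum(block))

    def rank1(self, i):
        """Number of 1 bits in bits[0:i] (position i itself is excluded)."""
        block = i // self.block_size
        # Precomputed count up to the block start, plus a scan inside the block.
        return self.prefix[block] + sum(self.bits[block * self.block_size:i])

    def select1(self, k):
        """Position of the k-th 1 bit (k counts from 1)."""
        if k < 1 or k > self.prefix[-1]:
            raise ValueError("k out of range")
        lo, hi = 0, len(self.bits)
        # Binary search for the smallest p with rank1(p + 1) >= k.
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo
```

With this convention, `rank1(select1(k)) == k - 1` for every valid `k`. Practical implementations replace the in-block scan with popcount instructions and arrange the counters so that a query touches as few cache lines as possible.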

Citations: cited by 99 publications (81 citation statements)
References: 25 publications
“…The CSAs of sdsl can be parameterized with the described traditional sampling method, which uses a bitmap B to mark the sampled suffixes. It has recently been shown [11] that this sampling strategy, when B is represented as sd_vector [27], gives better time/space tradeoffs than a strategy that does not use B but samples every SA[i] with i ≡ 0 mod s. In our first experiment, we compare the traditional sampling using SA_s , SA_s by just ·n/s samples plus a bitmap B of length n/s to mark those samples in SA_s . We opted for = 1/8, so that every SA⁻¹ value can be retrieved in at most 8 steps, and B is represented as an uncompressed bitmap (bit vector).…”
Section: Results (mentioning)
confidence: 99%
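The bitmap-based sampling described in the statement above can be sketched as follows. This is an illustration under stated assumptions, not sdsl code: it assumes the sampled suffixes are those whose text position is a multiple of s, and the helpers `suffix_array`, `build_samples`, `rank1`, and `sampled_value` are hypothetical names. A real compressed suffix array would recover unsampled entries by stepping through the text until it reaches a marked entry; that part is omitted here.

```python
def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_samples(sa, s):
    """Mark entries whose text position is a multiple of s and keep only those.

    Assumption: sampling is by text position (SA[i] % s == 0).
    """
    marks = [1 if pos % s == 0 else 0 for pos in sa]  # bitmap B
    samples = [pos for pos in sa if pos % s == 0]     # sampled SA values, in SA order
    return marks, samples


def rank1(marks, i):
    """Number of 1 bits in marks[0:i]; a linear scan stands in for a rank structure."""
    return sum(marks[:i])


def sampled_value(marks, samples, i):
    """Return SA[i] if entry i is sampled, else None."""
    if not marks[i]:
        return None
    # The number of marked entries before i is the index into the sample array.
    return samples[rank1(marks, i)]
```

For `text = "banana$"` and `s = 2`, the suffix array is `[6, 5, 3, 1, 0, 4, 2]`, the bitmap is `[1, 0, 0, 0, 1, 1, 1]`, and only the four values `[6, 0, 4, 2]` are stored. The bitmap is what makes fast rank support, and in compressed form a representation such as sd_vector, relevant to this sampling scheme.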
“…This is fastest (and simultaneously smallest) FM-index we are aware of. The implementation is part of the SDSL library [9], which contains state-of-the-art implementations of compressed data structures.…”
Section: Handling Long Patterns (mentioning)
confidence: 99%
“…Additionally, there are different practical implementations of rank and select structures [GP14,Vig08,Cla96]. Especially the numbers of cache misses play an important role for the performance of these structures on modern machines.…”
Section: Parallel Rank and Select Structures on Bitvectors (mentioning)
confidence: 99%