Experimental Algorithms
DOI: 10.1007/978-3-540-72845-0_16

Simple Compression Code Supporting Random Access and Fast String Matching

Abstract: Given a sequence S of n symbols over some alphabet Σ, we develop a new compression method that (i) is very simple to implement; (ii) provides O(1) time random access to any symbol of the original sequence; (iii) allows efficient pattern matching over the compressed sequence. Our simplest solution uses at most 2h + o(h) bits of space, where h = n(H_0(S) + 1), and H_0(S) is the zeroth-order empirical entropy of S. We discuss a number of improvements and trade-offs over the basic method. The new metho…
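
To make the space bound in the abstract concrete, the following minimal sketch computes the zeroth-order empirical entropy H_0(S) from its standard definition and then the leading term 2h, with h = n(H_0(S) + 1). The function names are hypothetical and chosen for illustration only; the o(h) term is ignored, and the code does not implement the paper's compression method itself.

```python
import math
from collections import Counter


def zeroth_order_entropy(seq):
    """Return H_0(S) in bits per symbol.

    Standard definition: sum over distinct symbols c of
    (n_c / n) * log2(n / n_c), where n_c is the count of c in S.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return sum((k / n) * math.log2(n / k) for k in counts.values())


def leading_space_term_bits(seq):
    """Return 2h, where h = n * (H_0(S) + 1).

    This is only the leading term of the 2h + o(h) bound quoted in the
    abstract; the lower-order o(h) term is not modelled here.
    """
    n = len(seq)
    h = n * (zeroth_order_entropy(seq) + 1)
    return 2 * h


if __name__ == "__main__":
    s = "aabb"
    print(zeroth_order_entropy(s))     # entropy in bits per symbol
    print(leading_space_term_bits(s))  # leading term of the bound, in bits
```

For a sequence over a single repeated symbol, H_0(S) is 0, so h = n and the leading term is 2n bits; the "+ 1" in h keeps the bound from collapsing to zero for such highly compressible inputs.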

Cited by 19 publications (25 citation statements)
References 22 publications (24 reference statements)
“…The compression scheme from [11] will thus give an expected space bound of n(log e + O((log(λ) + 1)/λ)) + O(λ^2) (the O-notation refers to growing λ), as shown in Appendix A.2. More sophisticated compression schemes for the index table T, which can reduce the effect of the fact that the code length must be an integer while log(σ(i)) is not, will be able to better approximate the optimum n log e.…”
Section: Overall Analysis (mentioning)
confidence: 99%
“…The evaluation time of the PHFs and MPHFs generated by CHD algorithm depends on the compression technique used. For instance, it is possible to generate faster functions using Elias-Fano scheme (see [27]) instead of the one we used for the experiments [11] at the expense of generating functions with a slightly larger description size (we obtained PHFs that require 2.08 bits per key instead of 1.98 bits per key for α = 0.99 and λ = 5).…”
Section: Comparing the CHD and BPZ Algorithms (mentioning)
confidence: 99%