2014 Data Compression Conference
DOI: 10.1109/dcc.2014.78

Lempel-Ziv Parsing in External Memory

Abstract: For decades, computing the LZ factorization (or LZ77 parsing) of a string has been a requisite and computationally intensive step in many diverse applications, including text indexing and data compression. Many algorithms for LZ77 parsing have been discovered over the years; however, despite the increasing need to apply LZ77 to massive data sets, no algorithm to date scales to inputs that exceed the size of internal memory. In this paper we describe the first algorithm for computing the LZ77 parsing in externa…
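To make the object of study concrete, here is a minimal brute-force sketch of greedy LZ77 parsing. It assumes the common variant in which each phrase is either a single character that has not occurred before, or the longest prefix of the remaining text that also starts at some earlier position (the source may overlap the phrase itself). The names `lz77_parse` and `lz77_decode` are hypothetical, not taken from the paper, and the sketch runs entirely in internal memory; it illustrates the output of the parsing, not the external-memory algorithm the abstract describes.

```python
from typing import List, Tuple, Union

# A phrase is either a literal character or a (source position, length) pair.
Phrase = Union[str, Tuple[int, int]]


def lz77_parse(text: str) -> List[Phrase]:
    """Greedy LZ77 parsing by brute force (cubic time in the worst case; illustration only)."""
    phrases: List[Phrase] = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # every earlier start position is a candidate source
            length = 0
            # The source may overlap the phrase being built (j + length may pass i).
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(text[i])  # first occurrence of this character
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases: List[Phrase]) -> str:
    """Invert lz77_parse; copying one character at a time handles overlapping sources."""
    out: List[str] = []
    for p in phrases:
        if isinstance(p, str):
            out.append(p)
        else:
            src, length = p
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)
```

For example, `lz77_parse("abababa")` returns `['a', 'b', (0, 5)]`: after two literals, the last phrase copies five characters starting at position 0, overlapping itself, and `lz77_decode` reconstructs the original string.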

Cited by 22 publications (17 citation statements)
References 27 publications (33 reference statements)
“…Goto and Bannai [2013] utilized SACA-K to design a space-efficient linear-time algorithm for computing LZ77 factorization on constant alphabets. Kärkkäinen et al [2014] proposed algorithms for computing the LZ77 parsing efficiently using the external memory. This suggests a possibility for extending DSA-IS for computing LZ77 factorization in the external memory.…”
Section: Discussion (mentioning, confidence: 99%)
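The statement above refers to suffix-array-based LZ77 factorization. As a hedged illustration of how a suffix array can drive the parse (a common approach, not necessarily the exact algorithm of Goto and Bannai or of this paper), the sketch below finds, for each text position i, the nearest lexicographic neighbours whose suffixes start earlier in the text: the previous and next smaller values (PSV and NSV) around i's rank in the suffix array. The longest earlier match for i must start at one of these two positions, so the parser only compares against them. The suffix array is built by naive sorting rather than SACA-K, and all function names are hypothetical.

```python
from typing import List, Tuple, Union

Phrase = Union[str, Tuple[int, int]]


def suffix_array(text: str) -> List[int]:
    # Naive construction by sorting suffixes; a real implementation would use SA-IS / SACA-K.
    return sorted(range(len(text)), key=lambda i: text[i:])


def psv_nsv(sa: List[int]) -> Tuple[List[int], List[int]]:
    """For each text position p = sa[r], return the text positions at the nearest ranks
    left (PSV) and right (NSV) of r whose suffixes start before p; -1 if none exists."""
    n = len(sa)
    psv, nsv = [-1] * n, [-1] * n
    stack: List[int] = []  # suffix-array values, increasing from bottom to top
    for r in range(n):
        while stack and stack[-1] > sa[r]:
            nsv[stack.pop()] = sa[r]  # sa[r] is the next smaller value for the popped entry
        psv[sa[r]] = stack[-1] if stack else -1
        stack.append(sa[r])
    return psv, nsv


def lcp(text: str, i: int, j: int) -> int:
    """Length of the longest common prefix of text[i:] and text[j:]."""
    k, n = 0, len(text)
    while i + k < n and j + k < n and text[i + k] == text[j + k]:
        k += 1
    return k


def lz77_sa(text: str) -> List[Phrase]:
    psv, nsv = psv_nsv(suffix_array(text))
    phrases: List[Phrase] = []
    i = 0
    while i < len(text):
        best_len, best_src = 0, -1
        for cand in (psv[i], nsv[i]):  # only these two earlier positions need checking
            if cand != -1:
                length = lcp(text, i, cand)
                if length > best_len:
                    best_len, best_src = length, cand
        if best_len == 0:
            phrases.append(text[i])  # character not seen before
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases
```

On "mississippi" this returns `['m', 'i', 's', (2, 1), (1, 4), 'p', (8, 1), (7, 1)]`. The phrase lengths are those of the greedy parse; when several earlier occurrences give an equally long match, the chosen source is whichever neighbour the suffix array supplies. Once the suffix array is available, the parsing loop does linear work overall, because each LCP comparison stops no later than one character past the phrase it produces.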
“…Recently, in an attempt to address this standoff, Ferrada et al [3] described hybrid indexing, an algorithmic technique by which any conventional pattern matching index (including any read aligner) can be made to scale to large, highly compressible collections via means of the Lempel-Ziv (LZ77) parsing [30,14,11], a method from data compression (we give a formal definition shortly). In particular, given an upper bound M on the searchable pattern length, the first step of hybrid indexing is to obtain a filtered string consisting of the concatenation of the M-length substrings to the left and right of each LZ77 phrase boundary.…”
Section: Introduction (mentioning, confidence: 99%)
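The filtering step quoted above can be sketched as follows. This is a simplified reading under stated assumptions, not Ferrada et al.'s implementation: phrase boundaries are supplied as a list of phrase start positions, each boundary b contributes the window text[b-M : b+M] (M characters on each side), overlapping or adjacent windows are merged so that no character is copied twice, and separate windows are joined by a hypothetical separator `$` assumed not to occur in the text. The exact window widths and separator handling in the original work may differ, and `filtered_string` is a hypothetical name.

```python
from typing import List, Tuple


def filtered_string(text: str, boundaries: List[int], M: int,
                    sep: str = "$") -> Tuple[str, List[Tuple[int, int]]]:
    """Concatenate the M-character windows on both sides of each phrase boundary.

    Returns the filtered string and the (start, end) text interval each window covers,
    which an index would need in order to map matches back to positions in `text`.
    """
    intervals: List[Tuple[int, int]] = []
    for b in sorted(boundaries):
        lo, hi = max(0, b - M), min(len(text), b + M)
        if intervals and lo <= intervals[-1][1]:
            # Overlaps or touches the previous window: extend it instead of copying again.
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], hi))
        else:
            intervals.append((lo, hi))
    # `sep` is an assumed separator character that does not occur in `text`.
    return sep.join(text[lo:hi] for lo, hi in intervals), intervals
```

For instance, `"abcd" * 5` parses as the literals a, b, c, d followed by a single 16-character phrase; with boundaries `[1, 2, 3, 4]` and M = 2 the windows merge into the filtered string "abcdab", six characters instead of twenty. The more compressible the input, the fewer boundaries it has and the shorter the filtered string that the conventional index must handle.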
“…It is also used for genome assembly [3,2,1] and extensively for the discovery of repetitive structures in genomic data [22]. Elsewhere, in data compression, it is the index underlying state-of-the-art methods for Lempel-Ziv factorization [20,18]. The key virtue of the index, in these and other applications, is that it stores a string T in a compressed form that also supports fast pattern matching queries.…”
Section: Introduction (mentioning, confidence: 99%)