Shirou Maruyama scite author profile

We present novel variants of fully online LCA (FOLCA), a fully online grammar compression that builds a straight line program (SLP) and directly encodes it into a succinct representation in an online manner. FOLCA enables a direct encoding of an SLP into a succinct representation that is asymptotically equivalent to an information theoretic lower bound for representing an SLP (Maruyama et al., SPIRE'13). The compression of FOLCA takes linear time proportional to the length of an input text and its working space depends only on the size of the SLP, which enables us to apply FOLCA to large-scale repetitive texts. Recent repetitive texts, however, include some noise. For example, current sequencing technology has significant error rates, which embeds noise into genome sequences. For such noisy repetitive texts, FOLCA working in the SLP size consumes a large amount of memory. We present two variants of FOLCA working in constant space by leveraging the idea behind stream mining techniques. Experiments using 100 human genomes corresponding to about 300GB from the 1000 human genomes project revealed the applicability of our method to large-scale, noisy repetitive texts.

show abstract

An Online Algorithm for Lightweight Grammar-Based Compression

Maruyama

Sakamoto

Takeda

2012

Algorithms

View full text Add to dashboard Cite

Grammar-based compression is a well-studied technique to construct a context-free grammar (CFG) deriving a given text uniquely. In this work, we propose an online algorithm for grammar-based compression. Our algorithm guarantees O(log 2 n)-approximation ratio for the minimum grammar size, where n is an input size, and it runs in input linear time and output linear space. In addition, we propose a practical encoding, which transforms a restricted CFG into a more compact representation. Experimental results by comparison with standard compressors demonstrate that our algorithm is especially effective for highly repetitive text.

show abstract

A Space-Saving Approximation Algorithm for Grammar-Based Compression

Sakamoto

Maruyama

Kida³

et al. 2009

IEICE Trans. Inf. & Syst.

View full text Add to dashboard Cite

SUMMARYA space-efficient approximation algorithm for the grammar-based compression problem, which requests for a given string to find a smallest context-free grammar deriving the string, is presented. For the input length n and an optimum CFG size g, the algorithm consumes only O(g log g) space and O(n log * n) time to achieve O((log * n) log n) approximation ratio to the optimum compression, where log * n is the maximum number of logarithms satisfying log log · · · log n > 1. This ratio is thus regarded to almost O(log n), which is the currently best approximation ratio. While g depends on the string, it is known that g = Ω(log n) and g = O n log k n for strings from k-letter alphabet [12].

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Shirou Maruyama

ESP-index: A compressed index based on edit-sensitive parsing

Fully-Online Grammar Compression

Fully Online Grammar Compression in Constant Space

An Online Algorithm for Lightweight Grammar-Based Compression

A Space-Saving Approximation Algorithm for Grammar-Based Compression

Contact Info

Product

Resources

About