A Trie-Based Approach for Compacting Automata

Crochemore, Maxime; Epifanio, Chiara; Grossi, Roberto; Mignosi, Filippo

doi:10.1007/978-3-540-27801-6_11

Cited by 11 publications

(9 citation statements)

References 12 publications

(8 reference statements)

“…Our techniques can also be extended to tries and automata, as discussed in previous work [6]. The results in this paper are an extension of some ideas originally described in [6,8].…”

Section: Conclusion Open Problems and Further Work (mentioning)

confidence: 69%

Linear-size suffix tries

Crochemore

Epifanio

Grossi

et al. 2016

Theoretical Computer Science

Self Cite

Please cite this article in press as: M. Crochemore et al., Linear-size suffix tries, Theoret. Comput. Sci. (2016), http://dx.doi.org/10.1016/j.tcs.2016.04.002

Suffix trees are highly regarded data structures for text indexing and string algorithms [McCreight 76, Weiner 73]. For any given string w of length n = |w|, a suffix tree for w takes O(n) nodes and links. It is often presented as a compacted version of a suffix trie for w, where the latter is the trie (or digital search tree) built on the suffixes of w. Here the compaction process replaces each maximal chain of unary nodes with a single arc. For this, the suffix tree requires that the labels of its arcs are substrings encoded as pointers to w (or equivalent information). By contrast, the arcs of the suffix trie are labeled by single symbols, but there can be Θ(n²) nodes and links for suffix tries in the worst case because of their unary nodes. It is an interesting question whether the suffix trie can be stored using O(n) nodes. We present the linear-size suffix trie, which guarantees O(n) nodes. We use a new technique for reducing the number of unary nodes to O(n) that stems from some results on antidictionaries. For instance, by using the linear-size suffix trie, we are able to check whether a pattern p of length m = |p| occurs in w in O(m log |Σ|) time, and we can find the longest common substring of two strings w₁ and w₂ in O((|w₁| + |w₂|) log |Σ|) time for an alphabet Σ.
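To make the size issue concrete, here is a minimal sketch of the naive suffix trie the abstract starts from. It is not the linear-size structure of the paper. The function names (`build_suffix_trie`, `count_nodes`, `occurs`) and the nested-dictionary representation are assumptions chosen for illustration. Because children are stored in hash maps, a lookup takes expected O(m) time here rather than the O(m log |Σ|) bound quoted above.

```python
def build_suffix_trie(w):
    """Naive suffix trie of w as nested dicts (one dict per node).

    Illustration only: this is the uncompacted trie, which can have
    a quadratic number of nodes, not the linear-size suffix trie.
    """
    root = {}
    for i in range(len(w)):
        node = root
        for ch in w[i:]:
            # Follow the arc labeled ch, creating the child if needed.
            node = node.setdefault(ch, {})
    return root


def count_nodes(root):
    """Count all nodes (root included) without recursion."""
    total, stack = 0, [root]
    while stack:
        node = stack.pop()
        total += 1
        stack.extend(node.values())
    return total


def occurs(root, p):
    """Return True if pattern p is a substring of the indexed string."""
    node = root
    for ch in p:
        if ch not in node:
            return False
        node = node[ch]
    return True


if __name__ == "__main__":
    # Strings of the form a^k b^k show the quadratic growth of the trie.
    for k in (2, 4, 8, 16):
        w = "a" * k + "b" * k
        print(len(w), count_nodes(build_suffix_trie(w)))
    trie = build_suffix_trie("abab")
    print(occurs(trie, "bab"), occurs(trie, "bb"))
```

Each node corresponds to one distinct substring of w (the root being the empty string), so the printed node counts grow roughly with the square of the string length for this family of inputs.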

“…However, these algorithms can deal with only binary ECG data since compression ratios get worse as alphabet size increases since the size of an antidictionary is proportional to alphabet size [15]. To deal with ECG data over a finite alphabet, we can apply the ACDCA to ECG data, however, it is difficult to handle an extremely long data such as ECG since the ACDCA requires computational memory in proportional to the data size.…”

Section: Copyright C 2010 the Institute Of Electronics Information A (mentioning)

confidence: 99%

On-Line Electrocardiogram Lossless Compression Using Antidictionary Codes for a Finite Alphabet

Ota

Morita

2010

IEICE Trans. Inf. & Syst.

SUMMARY: An antidictionary is particularly useful for data compression, and on-line electrocardiogram (ECG) lossless compression algorithms using antidictionaries have been proposed. They work in real time with constant memory and give better compression ratios than traditional lossless data compression algorithms, but they handle only ECG data over a binary alphabet. This paper proposes on-line ECG lossless compression for data over a finite alphabet. The proposed algorithm not only gives better compression ratios than those algorithms but also uses less computational space. Moreover, the proposed algorithm works in real time. Its effectiveness is demonstrated by simulation results.

“…It is proved that their method, called Data Compression using Antidictionaries (DCA), achieves a compression ratio for a balanced binary source that is equal to its entropy rate. Crochemore et al also proposed an extension of the DCA to any string over any finite alphabet [2]. In 2005, Ohkawa et at.…”

Section: Introduction (mentioning)

confidence: 96%

Length of minimal forbidden words on a stationary ergodic source

Ota

Morita

2009

2009 IEEE International Symposium on Information Theory

An antidictionary is particularly useful for data compression, and it consists of the minimal forbidden words for a given string. We derive the average length M_n of minimal forbidden words in strings of length n under a stationary ergodic source with entropy H which takes values on a finite alphabet. For the string length n, we prove that (log n)/M_n → H, in probability, as n → ∞. We use the Wyner-Ziv result on the connection between entropy and recurrence time for ergodic processes to prove the theorem. Its validity is shown by simulation results on a memoryless binary information source.
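As a small illustration of the objects this abstract studies, the brute-force sketch below lists the minimal forbidden words of a short string. It relies on the standard definition, which the abstract does not spell out: a word v is a minimal forbidden word of w if v does not occur in w while both its longest proper prefix and its longest proper suffix do. The helper names (`factors`, `minimal_forbidden_words`) and the optional `alphabet` parameter are assumptions, and the approach is only practical for very short inputs.

```python
def factors(w):
    """All substrings of w, including the empty string."""
    return {""} | {w[i:j] for i in range(len(w))
                   for j in range(i + 1, len(w) + 1)}


def minimal_forbidden_words(w, alphabet=None):
    """Brute-force antidictionary of w (illustration only).

    alphabet defaults to the letters that appear in w; pass a larger
    alphabet to also report single letters that never occur.
    """
    sigma = sorted(set(w) if alphabet is None else set(alphabet))
    facs = factors(w)
    mfw = set()
    for x in facs:                 # x is the longest proper prefix of v
        for c in sigma:
            v = x + c
            # v must be absent while its longest proper suffix is present.
            if v not in facs and v[1:] in facs:
                mfw.add(v)
    return mfw


if __name__ == "__main__":
    words = minimal_forbidden_words("abab")
    print(sorted(words))
    # Empirical average length for this one string.
    print(sum(map(len, words)) / len(words))
```

Averaging the lengths over many strings drawn from a source would give an empirical counterpart to M_n, although this quadratic-space enumeration is far from the efficient constructions used in practice.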
