Martin Farach scite author profile

Original citation: Agarwala, R., Bafna, V., Farach, M., Paterson, Michael S. and Thorup, M. (1997) On the approximability of numerical taxonomy (fitting distances by tree metrics).


Let Sleeping Files Lie: Pattern Matching in Z-Compressed Files

Amir

Benson

Farach

1996

Journal of Computer and System Sciences

139

142


The current explosion of stored information necessitates a new model of pattern matching, that of compressed matching. In this model one tries to find all occurrences of a pattern in a compressed text in time proportional to the compressed text size, i.e., without decompressing the text. The most effective general purpose compression algorithms are adaptive, in that the text represented by each compression symbol is determined dynamically by the data. As a result, the encoding of a substring depends on its location. Thus the same substring may "look different" every time it appears in the compressed text. In this paper we consider pattern matching without decompression in the UNIX Z-compression. This is a variant of the Lempel-Ziv adaptive compression scheme. If n is the length of the compressed text and m is the length of the pattern, our algorithms find the first pattern occurrence in time O(n + m^2) or O(n log m + m). We also introduce a new criterion to measure compressed matching algorithms, that of extra space. We show how to modify our algorithms to achieve a trade-off between the amount of extra space used and the algorithm's time complexity.
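The observation that the same substring may "look different" at each occurrence can be seen in a minimal sketch of textbook LZW, the dictionary scheme underlying UNIX Z-compression. This is an illustration only, not the paper's matching algorithm and not the exact `compress` file format: the function names are hypothetical, and the initial dictionary is assumed to hold just the distinct characters of the input rather than all byte values.

```python
def lzw_compress(text: str) -> list[int]:
    """Textbook LZW (assumption: initial dictionary = distinct chars of text)."""
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    codes = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            # Keep extending the longest phrase already in the dictionary.
            current = candidate
        else:
            # Emit the code of the longest known phrase, then learn a new one.
            codes.append(dictionary[current])
            dictionary[candidate] = len(dictionary)
            current = ch
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int], alphabet: str) -> str:
    """Inverse of lzw_compress, given the same initial alphabet."""
    entries = sorted(set(alphabet))  # index = code, value = phrase
    if not codes:
        return ""
    previous = entries[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(entries):
            entry = entries[code]
        else:
            # Code refers to the phrase being defined right now.
            entry = previous + previous[0]
        out.append(entry)
        entries.append(previous + entry[0])
        previous = entry
    return "".join(out)


codes = lzw_compress("abababab")
print(codes)  # [0, 1, 2, 4, 1]
```

In this run the first "ab" is emitted as the two codes 0 and 1, the second as the single code 2, and the later ones are absorbed into code 4, so a pattern such as "ab" has no fixed encoding to search for in the compressed stream.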


Alphabet dependence in parameterized matching

Amir

Farach

Muthukrishnan

1994

Information Processing Letters

100


String Matching in Lempel-Ziv Compressed Strings

Farach

Thorup

1998

Algorithmica


String matching and compression are two widely studied areas of computer science. The theory of string matching has a long association with compression algorithms. Data structures from string matching can be used to derive fast implementations of many important compression schemes, most notably the Lempel-Ziv (LZ77) algorithm. Intuitively, once a string has been compressed, and therefore its repetitive nature has been elucidated, one might be tempted to exploit this knowledge to speed up string matching. The Compressed Matching Problem is that of performing string matching in a compressed text, without uncompressing it. More formally, let T be a text, let Z be the compressed string representing T, and let P be a pattern. The Compressed Matching Problem is that of deciding if P occurs in T, given only P and Z. Compressed matching algorithms have been given for several compression schemes such as LZW. In this paper we give the first nontrivial compressed matching algorithm for the classic adaptive compression scheme, the LZ77 algorithm. In practice, the LZ77 algorithm is known to compress more than other dictionary compression schemes, such as LZ78 and LZW, though for strings with constant per bit entropy, all these schemes compress optimally in the limit. However, for strings with o(1) per bit entropy, while it was recently shown that LZ77 gives compression to within a constant factor of optimal, schemes such as LZ78 and LZW may deviate from optimality by an exponential factor. Asymptotically, compressed matching is only relevant if |Z| = o(|T|), i.e., if the compression ratio |T|/|Z| is more than a constant. These results show that LZ77 is the appropriate compression method in such settings. We present an LZ77 compressed matching algorithm which runs in time O(N log^2(U/N) + P), where N = |Z|, U = |T|, and P also denotes the pattern length |P|. Compare with the naïve "decompression" algorithm, which takes time Θ(U + P) to decide if P occurs in T. Writing U + P as N · (U/N) + P, we see that we have improved the complexity, replacing the compression factor U/N by a factor log^2(U/N). Our algorithm is competitive in the sense that O(N log^2(U/N) + P) = O(U + P), and opportunistic in the sense that O(N log^2(U/N) + P) = o(U + P) if N = o(U) and P = o(U).
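For contrast with the bound above, the naïve "decompression" baseline can be sketched as follows. This is only the baseline the abstract compares against, not the paper's algorithm. The triple format `(offset, length, next_char)` is an assumed textbook LZ77 encoding, the function names are hypothetical, and Python's `in` operator stands in for a linear-time matcher such as Knuth-Morris-Pratt.

```python
def lz77_decode(triples: list[tuple[int, int, str]]) -> str:
    """Expand (offset, length, next_char) triples into the full text T.

    offset counts back from the end of the text decoded so far; an empty
    next_char means the triple carries no trailing literal.
    """
    out: list[str] = []
    for offset, length, next_char in triples:
        start = len(out) - offset
        for i in range(length):
            # Copy one character at a time so overlapping copies
            # (length > offset) read characters written in this same loop.
            out.append(out[start + i])
        if next_char:
            out.append(next_char)
    return "".join(out)


def naive_compressed_match(triples: list[tuple[int, int, str]], pattern: str) -> bool:
    """Decide whether pattern occurs in T by first rebuilding all of T."""
    text = lz77_decode(triples)  # touches all U characters of T
    return pattern in text       # stand-in for a linear-time matcher


# N = 3 triples encode a text of U = 9 characters.
z = [(0, 0, "a"), (0, 0, "b"), (2, 6, "c")]
print(lz77_decode(z))                     # ababababc
print(naive_compressed_match(z, "babc"))  # True
```

The decode step always writes every character of T, which is where the U term in the baseline's running time comes from, however small N is.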


On the agreement of many trees

Farach

Przytycka

Thorup

1995

Information Processing Letters

101


A robust model for finding optimal evolutionary trees

1995


An Alphabet Independent Approach to Two-Dimensional Pattern Matching

Amir

Benson

Farach

1994

SIAM J. Comput.



Optimal suffix tree construction with large alphabets
