Two important similarity measures between sequences are the longest common subsequence (LCS) and the dynamic time warping distance (DTWD). The computation of these measures for two given sequences is a central task in a variety of applications. Simple dynamic programming algorithms solve these tasks in O(n^2) time, and despite an extensive amount of research, no algorithms with significantly better worst-case upper bounds are known. In this paper, we show that an O(n^{2−ε})-time algorithm, for some ε > 0, for computing the LCS or the DTWD of two sequences of length n over a constant-size alphabet refutes the popular Strong Exponential Time Hypothesis (SETH). Moreover, we show that computing the LCS of k strings over an alphabet of size O(k) cannot be done in O(n^{k−ε}) time, for any ε > 0, under SETH. Finally, we also address the time complexity of approximating the DTWD of two strings in truly subquadratic time.
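For concreteness, here is a minimal Python sketch of the textbook quadratic dynamic programs the abstract refers to; the function names and the choice of |x − y| as the local DTW cost are our own illustrative assumptions, not anything specific to the paper.

```python
def lcs_length(a, b):
    """Textbook O(n*m) dynamic program for the longest common subsequence.
    dp[i][j] = LCS length of the prefixes a[:i] and b[:j]."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1   # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


def dtw_distance(a, b):
    """Textbook O(n*m) dynamic program for dynamic time warping distance;
    |x - y| is used as the local cost (an illustrative choice)."""
    n, m = len(a), len(b)
    INF = float("inf")
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            dp[i][j] = step + min(dp[i - 1][j],      # stay on b[j-1]
                                  dp[i][j - 1],      # stay on a[i-1]
                                  dp[i - 1][j - 1])  # advance both
    return dp[n][m]


assert lcs_length("abcbdab", "bdcaba") == 4
```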
The edit distance (a.k.a. the Levenshtein distance) between two strings is defined as the minimum number of insertions, deletions, or substitutions of symbols needed to transform one string into another. The problem of computing the edit distance between two strings is a classical computational task, with a well-known algorithm based on dynamic programming. Unfortunately, all known algorithms for this problem run in nearly quadratic time. In this paper we provide evidence that the near-quadratic running time bounds known for the problem of computing edit distance might be tight. Specifically, we show that, if the edit distance can be computed in time O(n^{2−δ}) for some constant δ > 0, then the satisfiability of conjunctive normal form formulas with N variables and M clauses can be solved in time M^{O(1)} · 2^{(1−ε)N} for a constant ε > 0. The latter result would violate the Strong Exponential Time Hypothesis, which postulates that such algorithms do not exist.
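The dynamic program in question is the classic Wagner–Fischer recurrence; below is a minimal Python sketch (the row-by-row space optimization is our own choice, not anything specific to the paper).

```python
def edit_distance(a, b):
    """Wagner-Fischer DP: O(n*m) time, O(m) space.
    prev[j] holds the edit distance between a[:i-1] and b[:j]."""
    n, m = len(a), len(b)
    prev = list(range(m + 1))   # transforming the empty prefix of a into b[:j]
    for i in range(1, n + 1):
        cur = [i] + [0] * m     # transforming a[:i] into the empty string
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j] + 1,                          # delete a[i-1]
                cur[j - 1] + 1,                       # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1])  # substitute (or match)
            )
        prev = cur
    return prev[m]


assert edit_distance("kitten", "sitting") == 3
```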
The CFG recognition problem is: given a context-free grammar G and a string w of length n, decide if w can be derived from G. This is the most basic parsing question and a core computer science problem. Valiant's parser from 1975 solves the problem in O(n^ω) time, where ω < 2.373 is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic O(n^3 / log^3 n) complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time O(|G| · n^{3−ε}) can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be |G| = Ω(n^6). Nothing was known for the more relevant case of constant-size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant-size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the k-Clique problem: given a graph on n nodes, decide if there are k nodes that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic-time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14).
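As a point of reference for the cubic combinatorial baseline that Valiant's O(n^ω) parser improves on, here is a hedged Python sketch of the standard CYK recognizer for grammars in Chomsky normal form; the rule encoding is our own assumption.

```python
def cyk_recognize(word, unary, binary, start):
    """Cubic-time CYK recognizer for a CNF grammar.
    unary:  iterable of (A, a) rules, meaning A -> a
    binary: iterable of (A, B, C) rules, meaning A -> B C"""
    n = len(word)
    # table[i][j] = set of nonterminals deriving word[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, ch in enumerate(word):
        table[i][i + 1] = {A for (A, a) in unary if a == ch}
    for span in range(2, n + 1):            # O(n) span lengths
        for i in range(n - span + 1):       # O(n) start positions
            j = i + span
            for k in range(i + 1, j):       # O(n) split points
                for (A, B, C) in binary:    # O(|G|) rules
                    if B in table[i][k] and C in table[k][j]:
                        table[i][j].add(A)
    return start in table[0][n]


# e.g. S -> A B, A -> a, B -> b recognizes "ab"
assert cyk_recognize("ab", {("A", "a"), ("B", "b")}, {("S", "A", "B")}, "S")
```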
Regular expressions constitute a fundamental notion in formal language theory and are frequently used in computer science to define search patterns. In particular, regular expression matching and membership testing are widely used computational primitives, employed in many programming languages and text processing utilities. A classic algorithm for these problems constructs and simulates a non-deterministic finite automaton corresponding to the expression, resulting in an O(mn) running time (where m is the length of the pattern and n is the length of the text). This running time can be improved slightly (by a polylogarithmic factor), but no significantly faster solutions are known. At the same time, much faster algorithms exist for various special cases of regular expressions, including dictionary matching, wildcard matching, subset matching, the word break problem, etc. In this paper, we show that the complexity of regular expression matching can be characterized based on its depth (when interpreted as a formula). Our results hold for expressions involving concatenation, OR, Kleene star and Kleene plus. For regular expressions of depth two (involving any combination of the above operators), we show the following dichotomy: matching and membership testing can be solved in near-linear time, except for "concatenations of stars", which cannot be solved in strongly sub-quadratic time assuming the Strong Exponential Time Hypothesis (SETH). For regular expressions of depth three the picture is more complex. Nevertheless, we show that all problems can either be solved in strongly sub-quadratic time, or cannot be solved in strongly sub-quadratic time assuming SETH. An intriguing special case of membership testing involves regular expressions of the form "a star of an OR of concatenations", e.g., [a|ab|bc]*. This corresponds to the so-called word break problem, for which a dynamic programming algorithm with a runtime of (roughly) O(n√m) is known. We show that the latter bound is not tight and improve the runtime to O(nm^{0.44...}). (A sketch of the basic word break DP appears after the notes below.)

2. Pattern matching problems with depth-2 expressions contain a "high density" of interesting algorithmic problems, with non-trivial algorithms existing for the types "·+" (this paper), "·|" [CH02], "|·" [AC75] and "+·" (essentially solved in [KMP77], since + can be dropped). In contrast, membership problems with depth-2 expressions have a very restrictive structure that makes them mostly trivially solvable in linear time, with the aforementioned exception of the "·*" type.

3. Pattern matching problems with depth-3 expressions have a more diversified structure. All types starting with · are SETH-hard; all types starting with | are either SETH-hard (if followed by ·) or easily solvable in linear time; all types starting with * are trivially solvable in linear time (since * allows zero repetitions); all types starting with + inherit their complexity from the last two operators in the type description (since + allows exactly one repetition).

4. Finally, membership checki...
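Here is the word break DP referenced above, in its basic form rather than the O(n√m) or O(nm^{0.44...}) algorithms the paper discusses; the interface and variable names are our own illustrative assumptions.

```python
def word_break(s, words):
    """Basic DP for the word break problem: can s be written as a
    concatenation of strings from `words`?  ok[i] means s[:i] is breakable.
    Runs in O(n * total length of words); the faster algorithms in the
    abstract improve on this."""
    n = len(s)
    ok = [False] * (n + 1)
    ok[0] = True                      # the empty prefix is breakable
    for i in range(1, n + 1):
        for w in words:
            if (len(w) <= i and ok[i - len(w)]
                    and s[i - len(w):i] == w):
                ok[i] = True
                break
    return ok[n]


assert word_break("abcbc", ["a", "ab", "bc"])   # matches [a|ab|bc]*
```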
The Subtree Isomorphism problem asks whether a given tree is contained in another given tree. The problem is of fundamental importance and has been studied since the 1960s. For some variants, e.g., ordered trees, near-linear time algorithms are known, but for the general case truly subquadratic algorithms remain elusive. Our first result is a reduction from the Orthogonal Vectors problem to Subtree Isomorphism, showing that a truly subquadratic algorithm for the latter refutes the Strong Exponential Time Hypothesis (SETH). In light of this conditional lower bound, we focus on natural special cases for which no truly subquadratic algorithms are known. We classify these cases against the quadratic barrier, showing in particular that:

• Even for binary, rooted trees, a truly subquadratic algorithm refutes SETH.
• Even for rooted trees of depth O(log log n), where n is the total number of vertices, a truly subquadratic algorithm refutes SETH.
• For every constant d, there is a constant ε_d > 0 and a randomized, truly subquadratic algorithm for degree-d rooted trees of depth at most (1 + ε_d) log_d n. In particular, there is an O(min{2.85^h, n^2})-time algorithm for binary trees of depth h.

Our reductions utilize new "tree gadgets" that are likely useful for future SETH-based lower bounds for problems on trees. Our upper bounds apply a folklore result from randomized decision tree complexity.
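To make the quadratic baseline concrete, here is a hedged Python sketch for one rooted binary variant, in which an embedding maps the pattern root to a text node and children to children (in either of the two orders); memoizing over pattern/text node pairs gives the O(n^2)-style behavior the abstract identifies as the barrier. The data layout, function names, and choice of variant are our own assumptions, not the paper's.

```python
class Node:
    """Rooted binary tree node; absent children are None."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right


def embeds_at(p, t, memo):
    """Can pattern subtree p embed with its root mapped to text node t,
    children mapping to children (in either of the two orders)?"""
    if p is None:
        return True                   # the empty pattern embeds trivially
    if t is None:
        return False                  # nothing left to map onto
    key = (id(p), id(t))
    if key not in memo:
        memo[key] = (
            (embeds_at(p.left, t.left, memo) and embeds_at(p.right, t.right, memo))
            or (embeds_at(p.left, t.right, memo) and embeds_at(p.right, t.left, memo))
        )
    return memo[key]


def contains(pattern, text):
    """Does `pattern` embed at some node of `text`?  O(n^2) pairs overall."""
    memo = {}
    stack = [text]
    while stack:                      # try every text node as the image of the root
        t = stack.pop()
        if t is None:
            continue
        if embeds_at(pattern, t, memo):
            return True
        stack.extend((t.left, t.right))
    return pattern is None            # only the empty pattern fits an empty text


# a path of length 2 embeds in a full binary tree of depth 2
assert contains(Node(Node()), Node(Node(Node(), Node()), Node()))
```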