Classic similarity measures of strings are longest common subsequence and Levenshtein distance (i.e., the classic edit distance). A classic similarity measure of curves is dynamic time warping. These measures can be computed by simple O(n^2) dynamic programming algorithms, and despite much effort no algorithms with significantly better running time are known. We prove that, even restricted to binary strings or one-dimensional curves, respectively, these measures do not have strongly subquadratic time algorithms, i.e., no algorithms with running time O(n^{2-ε}) for any ε > 0, unless the Strong Exponential Time Hypothesis fails. We generalize the result to edit distance for arbitrary fixed costs of the four operations (deletion in one of the two strings, matching, substitution), by identifying trivial cases that can be solved in constant time, and proving quadratic-time hardness on binary strings for all other cost choices. This improves and generalizes the known hardness result for Levenshtein distance [Backurs, Indyk STOC'15] by the restriction to binary strings and the generalization to arbitrary costs, and adds important problems to a recent line of research showing conditional lower bounds for a growing number of quadratic-time problems.

As our main technical contribution, we introduce a framework for proving quadratic-time hardness of similarity measures. To apply the framework, it suffices to construct a single gadget which encapsulates all the expressive power necessary to emulate a reduction from satisfiability.

Finally, we prove quadratic-time hardness for longest palindromic subsequence and longest tandem subsequence via reductions from longest common subsequence, showing that conditional lower bounds based on the Strong Exponential Time Hypothesis also apply to string problems that are not necessarily similarity measures.
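To make the quadratic baseline concrete, here is a minimal sketch (my own illustration, not code from the paper) of the textbook O(n^2) dynamic programs for LCS and unit-cost Levenshtein distance:

```python
def lcs_length(x: str, y: str) -> int:
    """Length of a longest common subsequence of x and y."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

def levenshtein(x: str, y: str) -> int:
    """Edit distance with unit-cost insertions, deletions, substitutions."""
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete from x
                           dp[i][j - 1] + 1,        # delete from y
                           dp[i - 1][j - 1] + sub)  # match / substitute
    return dp[n][m]
```

Both tables have quadratic size and each cell is filled in constant time; the hardness results say that, under SETH, this cannot be improved to O(n^{2-ε}) even on binary inputs.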
Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given a compression of size n of data that originally has size N, and we want to solve a problem with time complexity T(·). The naïve strategy of "decompress-and-solve" gives time T(N), whereas "the gold standard" is time T(n): to analyze the compression as efficiently as if the original data was small.

We restrict our attention to data in the form of a string (text, files, genomes, etc.) and study the most ubiquitous tasks. While the challenge might seem to depend heavily on the specific compression scheme, most methods of practical relevance (the Lempel-Ziv family, dictionary methods, and others) can be unified under the elegant notion of Grammar-Compressions. A vast literature, across many disciplines, has established this as an influential notion for algorithm design.

We introduce a framework for proving (conditional) lower bounds in this field, allowing us to assess whether decompress-and-solve can be improved, and by how much. Our main results are:
• The O(nN log(N/n)) bound for LCS and the O(min{N log N, nM}) bound for Pattern Matching with Wildcards are optimal up to N^{o(1)} factors, under the Strong Exponential Time Hypothesis. (Here, M denotes the uncompressed length of the compressed pattern.)
• Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding, under the k-Clique conjecture.
• We give an algorithm showing that decompress-and-solve is not optimal for Disjointness.
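As a toy illustration of grammar compression (the representation and rule names here are my own, not from the paper), a straight-line program can represent an exponentially long string with few rules, and some quantities, such as the uncompressed length N, can be computed directly from the n rules without decompressing:

```python
def uncompressed_length(rules, start):
    """rules: dict mapping nonterminal -> list of symbols (nonterminals
    or single characters). Assumes the grammar is acyclic (an SLP).
    Runs in time linear in the grammar size."""
    memo = {}
    def length(sym):
        if sym not in rules:          # terminal character
            return 1
        if sym not in memo:
            memo[sym] = sum(length(s) for s in rules[sym])
        return memo[sym]
    return length(start)

def expand(rules, start):
    """Decompress (output may be exponentially larger than the grammar)."""
    if start not in rules:
        return start
    return "".join(expand(rules, s) for s in rules[start])

# "abab...": X0 -> ab and Xi -> X{i-1} X{i-1} give a string of length
# 2^(i+1) using only i+1 rules.
rules = {"X0": ["a", "b"]}
for i in range(1, 5):
    rules[f"X{i}"] = [f"X{i-1}", f"X{i-1}"]
```

Here `uncompressed_length` is an example of "the gold standard" T(n) behavior, while `expand` is the decompress step whose cost the paper's lower bounds show is sometimes unavoidable.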
We empirically analyze two versions of the well-known "randomized rumor spreading" protocol for disseminating a piece of information in networks. In the classical model, in each round, each informed node informs a random neighbor. In the recently proposed quasirandom variant, each node has a (cyclic) list of its neighbors. Once informed, it starts at a random position of the list, but from then on informs its neighbors in the order of the list.

While a better performance of the quasirandom model could be proven for sparse random graphs, all other results show that, independent of the structure of the lists, the same asymptotic performance guarantees hold as for the classical model. In this work, we compare the two models experimentally. Not only does this show that the quasirandom model is generally faster, but it also shows that the runtime is more concentrated around the mean. This is surprising given that much fewer random bits are used in the quasirandom process.

These advantages are also observed in a lossy communication model, where each transmission fails to reach its target with a certain probability, and in an asynchronous model, where nodes send at random times drawn from an exponential distribution. We also show that typically the particular structure of the lists has little influence on the efficiency.
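A minimal simulation sketch of the two protocols (my own illustration; the paper's experimental setup is more elaborate):

```python
import random

def classical_rounds(adj, start, rng):
    """Classical push protocol on a graph given as adjacency lists:
    each informed node calls a uniformly random neighbor each round.
    Returns the number of rounds until all nodes are informed."""
    informed = {start}
    rounds = 0
    while len(informed) < len(adj):
        newly = {rng.choice(adj[v]) for v in informed}
        informed |= newly
        rounds += 1
    return rounds

def quasirandom_rounds(adj, start, rng):
    """Quasirandom variant: each node starts at a random position of
    its (cyclic) neighbor list, then proceeds in list order."""
    informed = {start}
    pos = {start: rng.randrange(len(adj[start]))}
    rounds = 0
    while len(informed) < len(adj):
        newly = set()
        for v in informed:
            newly.add(adj[v][pos[v] % len(adj[v])])
            pos[v] += 1
        for u in newly - informed:
            pos[u] = rng.randrange(len(adj[u]))
        informed |= newly
        rounds += 1
    return rounds
```

Running both on the same graph over many random seeds and comparing the empirical distributions of the round counts is the kind of comparison the abstract describes.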
We revisit the classic combinatorial pattern matching problem of finding a longest common subsequence (LCS). For strings x and y of length n, a textbook algorithm solves LCS in time O(n^2), but although much effort has been spent, no O(n^{2-ε})-time algorithm is known. Recent work indeed shows that such an algorithm would refute the Strong Exponential Time Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams FOCS'15; Bringmann, Künnemann FOCS'15].

Despite the quadratic-time barrier, for over 40 years an enduring scientific interest continued to produce fast algorithms for LCS and its variations. Particular attention was put into identifying and exploiting input parameters that yield strongly subquadratic time algorithms for special cases of interest, e.g., differential file comparison. This line of research was successfully pursued until 1990, at which time significant improvements came to a halt. In this paper, using the lens of fine-grained complexity, our goal is to (1) justify the lack of further improvements and (2) determine whether some special cases of LCS admit faster algorithms than currently known.

To this end, we provide a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature: the input size n := max{|x|, |y|}, the length of the shorter string m := min{|x|, |y|}, the length L of an LCS of x and y, the numbers of deletions δ := m − L and Δ := n − L, the alphabet size, as well as the numbers of matching pairs M and dominant pairs d. For any class of instances defined by fixing each parameter individually to a polynomial in terms of the input size, we prove a SETH-based lower bound matching one of three known algorithms (up to lower-order factors of the form n^{o(1)}). Specifically, we determine the optimal running time for LCS under SETH as (n + min{d, δΔ, δm})^{1±o(1)}. Polynomial improvements over this running time must necessarily refute SETH or exploit novel input parameters.

We establish the same lower bound for any constant alphabet of size at least 3. For binary alphabet, we show a SETH-based lower bound of (n + min{d, δΔ, δM/n})^{1−o(1)} and, motivated by difficulties in improving this lower bound, we design an O(n + δM/n)-time algorithm, yielding again a matching bound. We feel that our systematic approach yields a comprehensive perspective on the well-studied multivariate complexity of LCS, and we hope to inspire similar studies of multivariate complexity landscapes for further polynomial-time problems.
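For a concrete instance, most of these parameters can be read off with a simple sketch (helper names are my own; d, the number of dominant pairs, is omitted for brevity):

```python
def lcs_parameters(x: str, y: str):
    """Compute n, m, L, delta, Delta, and M for strings x and y,
    using the textbook O(|x||y|) DP to find L."""
    n, m = max(len(x), len(y)), min(len(x), len(y))
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    L = dp[len(x)][len(y)]
    # M counts pairs (i, j) with x[i] == y[j]
    M = sum(1 for a in x for b in y if a == b)
    return {"n": n, "m": m, "L": L,
            "delta": m - L, "Delta": n - L, "M": M}
```

On instances where, say, δ is small (the strings are nearly identical), the parameterized algorithms referenced in the abstract run much faster than the generic quadratic DP.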
We present a tight analysis of the basic randomized rumor spreading process in complete graphs introduced by Frieze and Grimmett (1985), where in each round of the process each node knowing the rumor gossips the rumor to a node chosen uniformly at random. The process starts with a single node knowing the rumor.

We show that the number S_n of rounds required to spread a rumor in a complete graph with n nodes is very closely described by log_2 n plus (1/n) times the completion time of the coupon collector process. This in particular gives very precise bounds for the expected runtime of the process, namely

log_2 n + ln n − 1.116 ≤ E[S_n] ≤ log_2 n + ln n + 2.765 + o(1).
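The prediction is easy to check empirically; the following sketch (my own, not from the paper) simulates the push process on the complete graph:

```python
import random

def push_rounds(n, rng):
    """Rounds until the push protocol informs all n nodes of the
    complete graph K_n, starting from a single informed node."""
    informed = {0}
    rounds = 0
    while len(informed) < n:
        targets = set()
        for v in informed:
            u = rng.randrange(n - 1)
            if u >= v:
                u += 1  # uniform over the n-1 nodes other than v
            targets.add(u)
        informed |= targets
        rounds += 1
    return rounds

# For n = 1024 the abstract predicts E[S_n] near log2(1024) + ln(1024),
# i.e., roughly 10 + 6.93 ≈ 16.9 (plus a bounded additive term).
```

Averaging `push_rounds(1024, rng)` over a few dozen runs should land within a few rounds of that prediction, reflecting the strong concentration of S_n.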
The discrete Fréchet distance is a popular measure for comparing sequences of points or polygonal curves. An important variant is the discrete Fréchet distance under translation, which is invariant under translations and thus enables detection of similar movement patterns in different spatial domains. For sequences of n points in the plane, the fastest known algorithm for the discrete Fréchet distance under translation runs in time Õ(n^5) [Ben Avraham, Kaplan, Sharir ArXiv'15]. This is achieved by constructing a certain arrangement of disks of size O(n^4), and then traversing the faces of this arrangement while updating reachability in a directed grid graph of size N = O(n^2), which can be done in time Õ(√N) per update. The contribution of this paper is two-fold:

• Although it is a well-known open problem to solve dynamic reachability in directed grid graphs faster than in time Õ(√N), we improve this part of the algorithm: We observe that an offline variant of dynamic s-t-reachability in directed grid graphs suffices, and we solve this offline variant in amortized time Õ(N^{1/3}) per update. This results in an improved running time of Õ(n^{14/3}) = Õ(n^{4.66...}) for the discrete Fréchet distance under translation.

• We provide evidence that constructing the arrangement of size O(n^4) is necessary in the worst case, by proving a conditional lower bound of n^{4−o(1)} on the running time for the discrete Fréchet distance under translation, assuming the Strong Exponential Time Hypothesis. This is surprising, since, to the best of our knowledge, exhaustively enumerating such a large arrangement is not known to be necessary for any other geometric problem.

[1] In this context one could even ask for a version of the Fréchet distance that is translation- and rotation-invariant, but we focus on the former in this paper.
[2] By Õ(·) we hide polylogarithmic factors in n.
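For background, here is a sketch of the standard O(nm) dynamic program for the discrete Fréchet distance with a fixed translation (this is textbook material, not the paper's algorithm; the translation-invariant version additionally minimizes over candidate translations from the disk arrangement):

```python
from math import dist  # Euclidean distance, Python 3.8+

def discrete_frechet(P, Q):
    """Discrete Fréchet distance of two non-empty point sequences:
    the minimum over joint traversals of the maximum pair distance."""
    n, m = len(P), len(Q)
    dp = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(P[i], Q[j])
            if i == 0 and j == 0:
                dp[i][j] = d
            elif i == 0:
                dp[i][j] = max(dp[i][j - 1], d)
            elif j == 0:
                dp[i][j] = max(dp[i - 1][j], d)
            else:
                dp[i][j] = max(min(dp[i - 1][j], dp[i][j - 1],
                                   dp[i - 1][j - 1]), d)
    return dp[n - 1][m - 1]
```

The translation-invariant variant asks for the minimum of this quantity over all translations of Q, which is exactly where the O(n^4)-size arrangement of disks enters.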
Zwick's (1+ε)-approximation algorithm for the All Pairs Shortest Path (APSP) problem runs in time O((n^ω/ε) log W), where ω ≤ 2.373 is the exponent of matrix multiplication and W denotes the largest weight. This can be used to approximate several graph characteristics including the diameter, radius, median, minimum-weight triangle, and minimum-weight cycle in the same time bound.

Since Zwick's algorithm uses the scaling technique, it has a factor log W in the running time. In this paper, we study whether APSP and related problems admit approximation schemes avoiding the scaling technique. That is, the number of arithmetic operations should be independent of W; such algorithms are called strongly polynomial. Our main results are as follows.

Claim 5.6. We have d(T_{r,b}[2k], T_{r,b}[2k+1]) > 1/ε for any level r, index k, and b ∈ {1, 2}.

Proof. Because of how T_{r,1} and T_{r,2} are constructed, the chunks T_{r,b}[2k] and T_{r,b}[2k+1] correspond to chunks T_r[2k′] and T_r[2k′+3] for some k′. The statement now follows from Claim 5.4. The following analogue of Claim 5.5 is immediate.

Claim 5.7. For any x, y ∈ Z, if d(x, y) > 1/ε and x < y, then there exist a level r, index k, and b ∈ {1, 2} such that x ∈ T_{r,b}[2k−1] and y ∈ T_{r,b}[2k].

Proof. Consecutive chunks T_r[2k−1] and T_r[2k] are either both added to T_{r,1} or both added to T_{r,2}. The statement thus follows from Claim 5.5.