Classic similarity measures of strings are longest common subsequence and Levenshtein distance (i.e., the classic edit distance). A classic similarity measure of curves is dynamic time warping. These measures can be computed by simple O(n^2) dynamic programming algorithms, and despite much effort no algorithms with significantly better running time are known. We prove that, even restricted to binary strings or one-dimensional curves, respectively, these measures do not have strongly subquadratic time algorithms, i.e., no algorithms with running time O(n^{2-ε}) for any ε > 0, unless the Strong Exponential Time Hypothesis fails.

We generalize the result to edit distance for arbitrary fixed costs of the four operations (deletion in one of the two strings, matching, substitution), by identifying trivial cases that can be solved in constant time, and proving quadratic-time hardness on binary strings for all other cost choices. This improves and generalizes the known hardness result for Levenshtein distance [Backurs, Indyk STOC'15] by the restriction to binary strings and the generalization to arbitrary costs, and adds important problems to a recent line of research showing conditional lower bounds for a growing number of quadratic-time problems.

As our main technical contribution, we introduce a framework for proving quadratic-time hardness of similarity measures. To apply the framework it suffices to construct a single gadget, which encapsulates all the expressive power necessary to emulate a reduction from satisfiability. Finally, we prove quadratic-time hardness for longest palindromic subsequence and longest tandem subsequence via reductions from longest common subsequence, showing that conditional lower bounds based on the Strong Exponential Time Hypothesis also apply to string problems that are not necessarily similarity measures.
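To make the quadratic barrier concrete, the following is a minimal Python sketch (not taken from the paper) of the textbook dynamic program for longest common subsequence: it fills an (n+1) x (m+1) table in O(nm) time, and the hardness result above says that, under SETH, this cannot be improved to O(n^{2-ε}) even on binary strings.

def lcs_length(a: str, b: str) -> int:
    # dp[i][j] = length of a longest common subsequence of a[:i] and b[:j]
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]

The same table-filling pattern underlies Levenshtein distance and dynamic time warping, which is why all three measures share the simple O(n^2) upper bound discussed above.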
The Fréchet distance is a well-studied and very popular measure of similarity of two curves. Many variants and extensions have been studied since Alt and Godau introduced this measure to computational geometry in 1991. Their original algorithm to compute the Fréchet distance of two polygonal curves with n vertices has a runtime of O(n^2 log n). More than 20 years later, the state-of-the-art algorithms for most variants still take time more than O(n^2 / log n), but no matching lower bounds are known, not even under reasonable complexity-theoretic assumptions.

To obtain a conditional lower bound, in this paper we assume the Strong Exponential Time Hypothesis or, more precisely, that there is no O*((2 − δ)^N) algorithm for CNF-SAT for any δ > 0. Under this assumption we show that the Fréchet distance cannot be computed in strongly subquadratic time, i.e., in time O(n^{2−δ}) for any δ > 0. This means that finding faster algorithms for the Fréchet distance is as hard as finding faster CNF-SAT algorithms, and the existence of a strongly subquadratic algorithm can be considered unlikely.

Our result holds for both the continuous and the discrete Fréchet distance. We extend the main result in various directions. Based on the same assumption we (1) show non-existence of a strongly subquadratic 1.001-approximation, (2) present tight lower bounds in case the numbers of vertices of the two curves are imbalanced, and (3) examine realistic input assumptions (c-packed curves).

Bringmann is a recipient of the Google Europe Fellowship in Randomized Algorithms, and this research is supported in part by this Google Fellowship.

Intuitively, the (continuous) Fréchet distance of two curves P, Q is the minimal length of a leash required to connect a dog to its owner, as they walk along P or Q, respectively, without backtracking. The Fréchet distance is a very popular measure of similarity of two given curves. In contrast to distance notions such as the Hausdorff distance, it takes into account the order of the points along the curve, and thus better captures the similarity as perceived by human observers [3]. Alt and Godau introduced the Fréchet distance to computational geometry in 1991 [5,24]. For polygonal curves P and Q with n and m vertices, respectively, they presented an O(nm log(nm)) algorithm. Since Alt and Godau's seminal paper, the Fréchet distance has become a rich field of research, with various directions such as generalizations to surfaces (see, e.g., [4]), approximation algorithms for realistic input curves ([6, 7, 21]), the geodesic and homotopic Fréchet distance (see, e.g., [15,17]), and many more variants (see, e.g., [11,20,29,31]). Being a natural measure for curve similarity, the Fréchet distance has found applications in various areas such as signature verification (see, e.g., [32]), map-matching tracking data (see, e.g., [9]), and moving objects analysis (see, e.g., [12]).

A particular variant that we will also discuss in this paper is the discrete Fréchet distance. Here, intuitively the dog and its owner are replaced ...
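For illustration, the discrete Fréchet distance admits the same kind of quadratic dynamic program (in the style of Eiter and Mannila); the sketch below is a minimal Python version, not taken from the paper, and the lower bound above says that, under SETH, no strongly subquadratic algorithm can replace it.

from math import hypot

def discrete_frechet(P, Q):
    # P, Q: lists of (x, y) vertices; returns the discrete Frechet distance.
    # dp[i][j] = smallest leash length needed to jointly traverse P[:i+1], Q[:j+1].
    n, m = len(P), len(Q)
    INF = float("inf")
    dp = [[INF] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = hypot(P[i][0] - Q[j][0], P[i][1] - Q[j][1])
            if i == 0 and j == 0:
                dp[i][j] = d
            else:
                best = min(
                    dp[i - 1][j] if i > 0 else INF,
                    dp[i][j - 1] if j > 0 else INF,
                    dp[i - 1][j - 1] if i > 0 and j > 0 else INF,
                )
                dp[i][j] = max(best, d)
    return dp[n - 1][m - 1]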
Real-world networks, like social networks or the internet infrastructure, have structural properties such as large clustering coefficients that can best be described in terms of an underlying geometry. This is why the focus of the literature on theoretical models for real-world networks shifted from classic models without geometry, such as Chung-Lu random graphs, to modern geometry-based models, such as hyperbolic random graphs.

With this paper we contribute to the theoretical analysis of these modern, more realistic random graph models. Instead of studying hyperbolic random graphs directly, we use a generalization that we call geometric inhomogeneous random graphs (GIRGs). Since we ignore constant factors in the edge probabilities, GIRGs are technically simpler (specifically, we avoid hyperbolic cosines), while preserving the qualitative behaviour of hyperbolic random graphs, and we suggest replacing hyperbolic random graphs by this new model in future theoretical studies.

We prove the following fundamental structural and algorithmic results on GIRGs. (1) As our main contribution we provide a sampling algorithm that generates a random graph from our model in expected linear time, improving the best-known sampling algorithm for hyperbolic random graphs by a substantial factor O(√n). (2) We establish that GIRGs have clustering coefficients in Ω(1). (3) We prove that GIRGs have small separators, i.e., it suffices to delete a sublinear number of edges to break the giant component into two large pieces. (4) We show how to compress GIRGs using an expected linear number of bits.

* We choose a toroidal ground space for the technical simplicity that comes with its symmetry and in order to obtain hyperbolic random graphs as a special case. The results of this paper stay true if T^d is replaced, say, by the d-dimensional unit cube [0,1]^d.
† A major difference between hyperbolic random graphs and our generalisation is that we ignore constant factors in the edge probabilities p_uv. This allows us to greatly simplify the edge probability expressions, thus reducing the technical overhead.
[...] with probability p_uv = Θ(min{1, w_u w_v / W}) [20,21]. Note that the term min{1, ·} is necessary, as the product w_u w_v may be larger than W. Classically, the Θ simply hides a factor 1, but [...]
‡ We say that an event holds with high probability (whp) if it holds with probability 1 − n^{−ω(1)}.
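As a concrete (if inefficient) illustration of the model, the sketch below samples a GIRG-like graph in one dimension by the naive Θ(n^2) method of testing every pair. The power-law exponent, the parameter alpha, and the exact form of the connection probability are illustrative assumptions, since the model above only fixes the edge probabilities up to constant factors; the paper's contribution (1) is precisely to avoid this pairwise loop and sample in expected linear time.

import random

def torus_distance(x, y):
    # distance on the one-dimensional torus [0, 1) with wrap-around
    d = abs(x - y)
    return min(d, 1.0 - d)

def sample_girg_naive(n, alpha=1.5, beta=2.5, seed=0):
    # Naive Theta(n^2) sampler for a 1-dimensional GIRG-like graph.
    # Weights follow a power law with exponent beta, positions are uniform on
    # the torus; both choices and the probability formula below are
    # illustrative assumptions, not the paper's exact definition.
    rng = random.Random(seed)
    w = [rng.paretovariate(beta - 1) for _ in range(n)]  # heavy-tailed weights
    x = [rng.random() for _ in range(n)]                 # positions on the torus
    W = sum(w)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            ratio = (w[u] * w[v] / W) / torus_distance(x[u], x[v])
            p = min(1.0, ratio ** alpha)                 # connection probability
            if rng.random() < p:
                edges.append((u, v))
    return edges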
Given a set Z of n positive integers and a target value t, the SubsetSum problem asks whether any subset of Z sums to t. A textbook pseudopolynomial-time algorithm by Bellman from 1957 solves SubsetSum in time O(n·t). This has been improved to O(n·max Z) by Pisinger [J. Algorithms'99] and recently to Õ(√n · t) by Koiliaris and Xu [SODA'17]. Here we present a simple randomized algorithm running in time Õ(n + t). This improves upon a classic algorithm and is likely to be near-optimal, since it matches conditional lower bounds from SetCover and k-Clique.

We then use our new algorithm and additional tricks to improve the best known polynomial-space solution from time Õ(n^3 · t) and space Õ(n^2) to time Õ(n·t) and space Õ(n log t), assuming the Extended Riemann Hypothesis. Unconditionally, we obtain time Õ(n·t^{1+ε}) and space Õ(n·t^ε) for any constant ε > 0.
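For reference, Bellman's dynamic program mentioned above is the following O(n·t) table-filling procedure (a standard sketch, not the new algorithm); roughly speaking, the paper's Õ(n + t) algorithm replaces this item-by-item update with randomized, FFT-based sumset computations.

def subset_sum(Z, t):
    # Bellman's classic O(n*t) dynamic program.
    # reachable[s] is True iff some subset of the items seen so far sums to s.
    reachable = [False] * (t + 1)
    reachable[0] = True
    for z in Z:
        for s in range(t, z - 1, -1):  # downwards, so each item is used at most once
            if reachable[s - z]:
                reachable[s] = True
    return reachable[t]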
The Strong Exponential Time Hypothesis and the OV-conjecture are two popular hardness assumptions used to prove a plethora of lower bounds, especially in the realm of polynomial-time algorithms. The OV-conjecture in moderate dimension states that there is no ε > 0 for which an O(N^{2−ε})·poly(D) time algorithm can decide whether a given set of N binary vectors of dimension D contains a pair of orthogonal vectors.

We strengthen the evidence for these hardness assumptions. In particular, we show that if the OV-conjecture fails, then two problems for which we are far from obtaining even tiny improvements over exhaustive search would have surprisingly fast algorithms. If the OV conjecture is false, then there is a fixed ε > 0 such that: [...]

The Strong Exponential Time Hypothesis (SETH) asserts that, for every ε > 0, there is a clause width k such that no O((2 − ε)^n)-time algorithm can decide the satisfiability of bounded-width CNF formulas. SETH is used in the study of exact and fixed-parameter tractable algorithms, see e.g. [23,46] or the book by Cygan et al. [24]. In this area, it implies, among other things, tight lower bounds for problems on graphs that have small treewidth or pathwidth [41,26,25].

Closely related to SETH, the orthogonal vectors problem (OV) is, given two sets A and B of N vectors from {0,1}^D, to decide whether there are vectors a ∈ A and b ∈ B such that a and b are orthogonal in Z^D. If D ≤ O(N^{0.3}) holds, the problem can be solved in time Õ(N^2) using an algorithm based on fast rectangular matrix multiplication (see e.g. [31]). SETH implies [54] that this algorithm is essentially as fast as possible; in particular, SETH implies the following hardness conjecture, which was given its name by Gao et al. [32].

Conjecture 1.1 (Moderate-dimension OV Conjecture). There are no reals ε, δ > 0 such that OV for D = N^δ can be solved in time O(N^{2−ε}).

The moderate-dimension OV conjecture is used to study the fine-grained complexity of problems in P, for which it has remarkably strong and diverse implications. If the conjecture is true, then dozens of important problems from all across computer science exhibit running-time lower bounds that match existing upper bounds up to subpolynomial factors. These include pattern matching and other problems in bioinformatics [7, 10, 40, 1], graph algorithms [47,6,32], computational geometry [16], formal languages [11,18], time-series analysis [2,19], and even economics [42] (see [58] for a more comprehensive list).

Gao et al. [32] also named the low-dimension OV conjecture, which asserts that OV does not have subquadratic algorithms whenever D = ω(log N) holds. The low-dimension variant implies the moderate-dimension variant of the OV conjecture, and both are implied by SETH [54]. Recent results on the hardness of approximation problems, such as Maximum Inner Product [5], rely on the stronger conjecture (perhaps also [12,14]). However, for the vast majority of OV-based hardness results, reducing the dimension only affects lower-order terms in the lo...
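For concreteness, exhaustive search for OV is the following O(N^2 · D)-time double loop (a minimal sketch); the moderate-dimension OV conjecture above asserts that, for D = N^δ, no algorithm runs as fast as O(N^{2−ε}).

def has_orthogonal_pair(A, B):
    # Naive O(N^2 * D) orthogonal vectors check over the integers:
    # report True iff some a in A and b in B satisfy <a, b> = 0.
    for a in A:
        for b in B:
            if all(x * y == 0 for x, y in zip(a, b)):
                return True
    return False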
We consider the computation of the volume of the union of high-dimensional geometric objects. While showing that this problem is #P-hard already for very simple bodies, we give a fast FPRAS for all objects where one can (1) test whether a given point lies inside the object, (2) sample a point uniformly, and (3) calculate the volume of the object in polynomial time. It suffices to be able to answer all three questions approximately. We show that this holds for a large class of objects. It implies that Klee's measure problem can be approximated efficiently even though it is #P-hard and hence cannot be solved exactly in time polynomial in the number of dimensions unless P = NP. Our algorithm also allows us to efficiently approximate the volume of the union of convex bodies given by weak membership oracles.

For the analogous problem of the intersection of high-dimensional geometric objects we prove #P-hardness for boxes and show that there is no multiplicative polynomial-time 2^{d^{1−ε}}-approximation for certain boxes unless NP = BPP, but we give a simple additive polynomial-time ε-approximation.
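One standard way to turn the three primitives above into a volume estimator is a Karp-Luby-style Monte Carlo scheme; the sketch below is a minimal Python version under the assumption that each object exposes volume(), sample(), and contains() methods (these names are illustrative, not from the paper).

import random

def union_volume_estimate(objects, samples=100000, seed=0):
    # Monte Carlo estimator for vol(O_1 ∪ ... ∪ O_m) using the primitives
    # (1) membership test, (2) uniform sampling, (3) volume of a single object.
    rng = random.Random(seed)
    vols = [o.volume() for o in objects]
    S = sum(vols)                       # sum of individual volumes (>= union volume)
    acc = 0.0
    for _ in range(samples):
        i = rng.choices(range(len(objects)), weights=vols)[0]  # pick object proportional to volume
        x = objects[i].sample()                                # uniform point inside it
        c = sum(1 for o in objects if o.contains(x))           # how many objects cover x
        acc += 1.0 / c
    return S * acc / samples

The estimator is unbiased because a point of the union covered by c objects is sampled with probability proportional to c, which the factor 1/c cancels out.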
A number of recent works have studied algorithms for entrywise ℓ_p low-rank approximation, namely algorithms which, given an n×d matrix A (with n ≥ d), output a rank-k matrix B minimizing ‖A − B‖_p^p = Σ_{i,j} |A_{i,j} − B_{i,j}|^p when p > 0, and ‖A − B‖_0 = Σ_{i,j} [A_{i,j} ≠ B_{i,j}] for p = 0, where [·] is the Iverson bracket; that is, ‖A − B‖_0 denotes the number of entries (i, j) for which A_{i,j} ≠ B_{i,j}. For p = 1, this is often considered more robust than the SVD, while for p = 0 this corresponds to minimizing the number of disagreements, or robust PCA. This problem is known to be NP-hard for p ∈ {0, 1}, already for k = 1, and while there are polynomial-time approximation algorithms, their approximation factor is at best poly(k). It was left open whether there is a polynomial-time approximation scheme (PTAS) for ℓ_p-approximation for any p ≥ 0. We show the following:

1. On the algorithmic side, for p ∈ (0, 2), we give the first n^{poly(k/ε)}-time (1 + ε)-approximation algorithm.

For p = 0, there are various problem formulations, a common one being the binary setting in which A ∈ {0,1}^{n×d} and B = U · V, where U ∈ {0,1}^{n×k} and V ∈ {0,1}^{k×d}. There are also various notions of multiplication U · V, such as a matrix product over the reals, over a finite field, or over a Boolean semiring. We give the first almost-linear time approximation scheme for what we call the Generalized Binary ℓ_0-Rank-k problem, for which these variants are special cases. Our algorithm computes a (1 + ε)-approximation in time (1/ε)^{2^{O(k)}/ε^2} · nd^{1+o(1)}, where o(1) hides a factor (log log d)^{1.1}/log d. In addition, for the case of finite fields of constant size, we obtain an alternate PTAS running in time n · d^{poly(k/ε)}.

Definition 2 (Generalized Binary ℓ_0-Rank-k). Given a matrix A ∈ {0,1}^{n×d} with n ≥ d, an integer k, and an inner product function ⟨·, ·⟩ : {0,1}^k × {0,1}^k → {0,1}, [...]

Our first result for p = 0 is as follows.

Theorem 2 (PTAS for p = 0). For any ε ∈ (0, 1/2), there is a (1+ε)-approximation algorithm for the Generalized Binary ℓ_0-Rank-k problem running in time (1/ε)^{2^{O(k)}/ε^2} · nd^{1+o(1)} that succeeds with constant probability, where o(1) hides a factor (log log d)^{1.1}/log d.

Hence, we obtain the first almost-linear time approximation scheme for the Generalized Binary ℓ_0-Rank-k problem, for any constant k. In particular, this yields the first polynomial-time (1+ε)-approximation for constant k for ℓ_0-low rank approximation of binary matrices when the underlying field is F_2 or the Boolean semiring. Even for k = 1, no PTAS was known before.

The running time of Theorem 2 is doubly exponential in k, and we show below that this is necessary for any approximation algorithm for Generalized Binary ℓ_0-Rank-k. However, in the special case when the base field is F_2, or more generally F_q and A, U, and V have entries belonging to F_q, it is possible to obtain an algorithm running in time n · d^{poly(k/ε)}, which is an improvement for certain super-constant values of k and ε. We formally define the problem and state our result next.

Definition 3 (Entrywise ℓ_0-Rank-k Approximation over F_q). Given an n × d matrix A with e...
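To fix notation, the following small sketch (illustrative, not from the paper) evaluates the two objectives defined above: the entrywise ℓ_p error for p > 0, and the ℓ_0 error against a Boolean-semiring product U·V, i.e., the number of disagreements.

import numpy as np

def entrywise_lp_error(A, B, p):
    # ||A - B||_p^p = sum over entries of |A_ij - B_ij|^p, for p > 0
    return float(np.sum(np.abs(A - B) ** p))

def l0_error_boolean(A, U, V):
    # Number of entries where A differs from the Boolean product U*V,
    # with (U*V)_ij = OR over l of (U_il AND V_lj).
    B = (U.astype(int) @ V.astype(int)) > 0
    return int(np.sum((A != 0) != B))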
Can we analyze data without decompressing it? As our data keeps growing, understanding the time complexity of problems on compressed inputs, rather than in convenient uncompressed forms, becomes more and more relevant. Suppose we are given a compression of size n of data that originally has size N, and we want to solve a problem with time complexity T(·). The naïve strategy of "decompress-and-solve" gives time T(N), whereas "the gold standard" is time T(n): to analyze the compression as efficiently as if the original data was small.

We restrict our attention to data in the form of a string (text, files, genomes, etc.) and study the most ubiquitous tasks. While the challenge might seem to depend heavily on the specific compression scheme, most methods of practical relevance (Lempel-Ziv-family, dictionary methods, and others) can be unified under the elegant notion of Grammar-Compressions. A vast literature, across many disciplines, established this as an influential notion for algorithm design.

We introduce a framework for proving (conditional) lower bounds in this field, allowing us to assess whether decompress-and-solve can be improved, and by how much. Our main results are:

• The O(nN log(N/n)) bound for LCS and the O(min{N log N, nM}) bound for Pattern Matching with Wildcards are optimal up to N^{o(1)} factors, under the Strong Exponential Time Hypothesis. (Here, M denotes the uncompressed length of the compressed pattern.)

• Decompress-and-solve is essentially optimal for Context-Free Grammar Parsing and RNA Folding, under the k-Clique conjecture.

• We give an algorithm showing that decompress-and-solve is not optimal for Disjointness.
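As a point of reference for the "decompress" half of decompress-and-solve, the sketch below expands a grammar compression (a straight-line program) back into the original string; the dictionary-based grammar encoding used here is an illustrative assumption. Note that the output length N can be exponential in the grammar size n, which is exactly why beating T(N) is desirable.

def decompress(grammar, start):
    # grammar maps each nonterminal to a list of symbols, where a symbol is
    # either a terminal character or another nonterminal; 'start' is the root.
    memo = {}
    def expand(sym):
        if sym not in grammar:                    # terminal character
            return sym
        if sym not in memo:
            memo[sym] = "".join(expand(s) for s in grammar[sym])
        return memo[sym]
    return expand(start)

# Example: two rules encode the string "abab".
print(decompress({"S": ["A", "A"], "A": ["a", "b"]}, "S"))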