Diptarka Chakraborty scite author profile

The edit distance is a way of quantifying how similar two strings are to one another by counting the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. In this paper we study the computational problem of computing the edit distance between a pair of strings where their distance is bounded by a parameter $k \ll n$. We present two streaming algorithms for computing edit distance: one runs in time $O(n + k^2)$ and the other in time $n + O(k^3)$. By writing $n + O(k^3)$ we want to emphasize that the number of operations per input symbol is a small constant. In particular, the running time does not depend on the alphabet size, and the algorithm should be easy to implement. Previously, a streaming algorithm with running time $O(n + k^4)$ was given in the paper by the current authors (STOC'16). The best off-line algorithm runs in time $O(n + k^2)$ (Landau et al., 1998), which is known to be optimal under the Strong Exponential Time Hypothesis.

ogy, pattern recognition, text processing, information retrieval and many more. The edit distance between $x$ and $y$, denoted by $\Delta(x, y)$, is defined as the minimum number of character insertions, deletions, and substitutions needed for converting $x$ into $y$. Due to its immense applicability, the computational problem of computing the edit distance between two given strings $x, y \in \Sigma^n$ is of prime interest to researchers in various domains of computer science. Sometimes one also requires that the algorithm finds an alignment of $x$ and $y$, i.e., a series of edit operations that transform $x$ into $y$.

In this paper we study the problem of computing the edit distance of strings when given an a priori upper bound $k \ll n$ on their distance. This is akin to fixed parameter tractability.
Arguably, the case when the edit distance is small relative to the length of the strings is the most interesting: when comparing two strings with respect to their edit distance, we are implicitly assuming that the strings are similar. If they are not similar, the edit distance is uninformative. There are a few exceptions to this rule, most notably the reduction of instances of formula satisfiability (SAT) to instances of edit distance of exponentially large strings [BI15], where the edit distance of the resulting strings is close to their length. However, such instances of the edit distance problem are rather artificial. For typical applications the edit distance of the two strings is much smaller than the length of the strings. Consider for example copying DNA during cell division: human DNA is essentially a string of about $10^9$ letters from {A, C, G, T}, and due to imperfections in the copying mechanism one can expect about 50 edit operations to occur during the process. So in many applications we may be looking for a handful of edit operations in large strings. Landau et al. [LMS98] provided an algorithm that runs in time $O(n + k^2)$ and uses space $O(n)$ when the size of the alphabet $\Sigma$ is constant. In general the running time of the algorithm given in [LMS98] is O(n · mi...
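
To make the role of the bound k concrete, here is a minimal Python sketch of the textbook banded dynamic program, which only fills table cells within k of the main diagonal and so takes O(nk) time. It is not the streaming algorithm of the paper nor the [LMS98] algorithm; the function name `bounded_edit_distance` and its return convention (`None` when the distance exceeds k) are assumptions made for illustration.

```python
from typing import List, Optional


def bounded_edit_distance(x: str, y: str, k: int) -> Optional[int]:
    """Return the edit distance of x and y if it is at most k, else None.

    Illustrative banded DP (assumes k >= 0). Cell (i, j) is stored at
    offset d = j - i + k, so each row has only 2k + 1 entries.
    """
    n, m = len(x), len(y)
    if abs(n - m) > k:
        return None  # the length difference alone already exceeds k
    inf = k + 1  # every value above k is clipped to k + 1
    width = 2 * k + 1

    # Row 0: distance from the empty prefix of x to y[:j] is j.
    prev: List[int] = [inf] * width
    for d in range(width):
        j = d - k
        if 0 <= j <= m:
            prev[d] = min(j, inf)

    for i in range(1, n + 1):
        curr = [inf] * width
        for d in range(width):
            j = i + d - k
            if j < 0 or j > m:
                continue  # outside the table
            if j == 0:
                best = i  # delete all of x[:i]
            else:
                # Diagonal neighbour (i-1, j-1) has the same offset d.
                best = prev[d] + (x[i - 1] != y[j - 1])
                if d > 0:
                    best = min(best, curr[d - 1] + 1)  # insert y[j-1]
            if d + 1 < width:
                best = min(best, prev[d + 1] + 1)  # delete x[i-1]
            curr[d] = min(best, inf)
        prev = curr

    result = prev[m - n + k]
    return result if result <= k else None
```

For example, `bounded_edit_distance("kitten", "sitting", 3)` returns 3, while the same call with k = 2 returns `None`.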

Approximating Edit Distance within Constant Factor in Truly Sub-Quadratic Time

Chakraborty, Das, Goldenberg, et al. (2018)

Edit distance is a measure of similarity of two strings based on the minimum number of character insertions, deletions, and substitutions required to transform one string into the other. The edit distance can be computed exactly using a dynamic programming algorithm that runs in quadratic time. Andoni, Krauthgamer and Onak (2010) gave a nearly linear time algorithm that approximates edit distance within approximation factor poly(log n). In this paper, we provide an algorithm with running time $O(n^{2-2/7})$ that approximates the edit distance within a constant factor.
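
For reference, the exact quadratic-time dynamic program mentioned in the abstract can be sketched in Python as follows. This is the classic textbook recurrence, not the approximation algorithm of the paper; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Exact edit distance via the classic O(|x| * |y|) dynamic program.

    Only two rows of the table are kept at any time.
    """
    # prev[j] is the distance between the empty prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, cx in enumerate(x, start=1):
        curr = [i]  # distance between x[:i] and the empty prefix of y
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,                # delete cx
                curr[j - 1] + 1,            # insert cy
                prev[j - 1] + (cx != cy),   # substitute, or match for free
            ))
        prev = curr
    return prev[-1]
```

The nested loops visit every pair of positions once, which is where the quadratic running time comes from.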

Tight cell probe bounds for succinct Boolean matrix-vector multiplication

Chakraborty, Kamma, Larsen (2018)

The conjectured hardness of Boolean matrix-vector multiplication has been used with great success to prove conditional lower bounds for numerous important data structure problems, see Henzinger et al. [STOC'15]. In recent work, Larsen and Williams [SODA'17] attacked the problem from the upper bound side and gave a surprising cell probe data structure (that is, we only charge for memory accesses, while computation is free). Their cell probe data structure answers queries in $\tilde{O}(n^{7/4})$ time and is succinct in the sense that it stores the input matrix in read-only memory, plus an additional $\tilde{O}(n^{7/4})$ bits on the side. In this paper, we essentially settle the cell probe complexity of succinct Boolean matrix-vector multiplication. We present a new cell probe data structure with query time $\tilde{O}(n^{3/2})$ storing just $\tilde{O}(n^{3/2})$ bits on the side. We then complement our data structure with a lower bound showing that any data structure storing $r$ bits on the side, with $n < r < n^2$, must have query time $t$ satisfying $tr = \Omega(n^3)$. For $r \le n$, any data structure must have $t = \Omega(n^2)$. Since lower bounds in the cell probe model also apply to classic word-RAM data structures, the lower bounds naturally carry over. We also prove similar lower bounds for matrix-vector multiplication over
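
To pin down the problem itself, the following Python sketch shows a naive baseline: the matrix is preprocessed once, and each query vector is then multiplied over the Boolean semiring (AND as product, OR as sum). It reads every matrix bit per query and is not the cell probe data structure of the paper; the helper names `pack_rows` and `bool_matvec` are hypothetical.

```python
from typing import List


def pack_rows(matrix: List[List[int]]) -> List[int]:
    """Preprocessing: pack each 0/1 row into a Python int used as a bitset."""
    return [sum(bit << j for j, bit in enumerate(row)) for row in matrix]


def bool_matvec(rows: List[int], vector: List[int]) -> List[int]:
    """Query: entry i of the result is OR over j of (M[i][j] AND v[j])."""
    v = sum(bit << j for j, bit in enumerate(vector))
    # A row contributes 1 exactly when it shares a set bit with the vector.
    return [1 if row & v else 0 for row in rows]
```

A usage example: with `rows = pack_rows([[1, 0, 0], [0, 1, 1], [0, 0, 0]])`, the query `bool_matvec(rows, [0, 0, 1])` returns `[0, 1, 0]`.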

Simultaneous Time-Space Upper Bounds for Red-Blue Path Problem in Planar DAGs

Chakraborty, Tewari (2015)

In this paper, we show that given a weighted, directed planar graph $G$ and any $\epsilon > 0$, there exists a polynomial time and $O(n^{1/2+\epsilon})$ space algorithm that computes the shortest path between two fixed vertices in $G$. We also consider the RedBluePath problem: given a graph $G$ whose edges are colored either red or blue and two fixed vertices $s$ and $t$ in $G$, is there a path from $s$ to $t$ in $G$ that alternates between red and blue edges? The RedBluePath problem in planar DAGs is NL-complete. We exhibit a polynomial time and $O(n^{1/2+\epsilon})$ space algorithm (for any $\epsilon > 0$) for the RedBluePath problem in planar DAGs. In the last part of this paper, we consider the problem of deciding the existence of and constructing a perfect matching in a planar bipartite graph, and also the similar problem of finding a Hall-obstacle in a planar bipartite graph. We show that the time-space bounds for these two problems are the same as the bound for the shortest path problem in a directed planar graph.
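
The RedBluePath question can be illustrated with a plain breadth-first search over (vertex, colour of last edge) states. This Python sketch uses linear space, so it is only a baseline and not the space-efficient algorithm of the paper; the adjacency-list format, the function name `red_blue_path`, and the convention that s trivially reaches itself are assumptions made here.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]  # (target vertex, colour), colour is "red" or "blue"
State = Tuple[str, Optional[str]]  # (vertex, colour of the edge used to enter it)


def red_blue_path(graph: Dict[str, List[Edge]], s: str, t: str) -> bool:
    """Return True if some directed s-t path alternates edge colours."""
    start: State = (s, None)  # no edge taken yet, so any colour may follow
    seen: Set[State] = {start}
    queue: Deque[State] = deque([start])
    while queue:
        u, last = queue.popleft()
        if u == t:
            return True
        for v, colour in graph.get(u, []):
            state = (v, colour)
            # The next edge must differ in colour from the previous one.
            if colour != last and state not in seen:
                seen.add(state)
                queue.append(state)
    return False
```

Tracking the last colour in the state is what distinguishes this from ordinary reachability: a vertex may be usable when entered by a red edge but a dead end when entered by a blue one.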

An $O(n^{\epsilon})$ Space and Polynomial Time Algorithm for Reachability in Directed Layered Planar Graphs

Chakraborty, Tewari (2015)

Given a graph $G$ and two vertices $s$ and $t$ in it, graph reachability is the problem of checking whether there exists a path from $s$ to $t$ in $G$. We show that reachability in directed layered planar graphs can be decided in polynomial time and $O(n^{\epsilon})$ space, for any $\epsilon > 0$. The previous best known space bound for this problem with polynomial time was approximately $O(\sqrt{n})$ space. Deciding graph reachability in SC is an important open question in complexity theory and in this paper we make progress towards resolving this question.
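
As a rough illustration of why layering helps, the Python sketch below decides reachability by sweeping the layers in order and remembering only the reachable vertices of the current layer. It assumes that "layered" means every edge goes from one layer to the next, and it ignores planarity entirely, so its space use is bounded by the widest layer rather than by $O(n^{\epsilon})$; it is not the algorithm of the paper, and the input format and the name `layered_reachable` are hypothetical.

```python
from typing import Dict, List, Set


def layered_reachable(
    layers: List[List[str]],
    edges: Dict[str, List[str]],
    s: str,
    t: str,
) -> bool:
    """Decide s-t reachability when every edge goes to the next layer.

    `layers` lists the vertices of each layer in order; `edges` maps a
    vertex to its out-neighbours (assumed to lie in the following layer).
    """
    frontier: Set[str] = set()  # reachable vertices in the previous layer
    for layer in layers:
        members = set(layer)
        # Step the frontier forward across one layer of edges.
        frontier = {
            v for u in frontier for v in edges.get(u, []) if v in members
        }
        if s in members:
            frontier.add(s)  # the search starts in the layer containing s
        if t in frontier:
            return True
    return False
```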
