Graph spanners are well-studied and widely used both in theory and practice. In a recent breakthrough, Chechik and Wulff-Nilsen [11] improved the state-of-the-art for light spanners by constructing a $(2k-1)(1+\varepsilon)$-spanner with $O(n^{1+1/k})$ edges and $O_\varepsilon(n^{1/k})$ lightness. Soon after, Filtser and Solomon [19] showed that the classic greedy spanner construction achieves the same bounds. The major drawback of the greedy spanner is its running time of $O(mn^{1+1/k})$ (which is faster than [11]). This makes the construction impractical even for graphs of moderate size. Much faster spanner constructions do exist, but they only achieve lightness $\Omega_\varepsilon(kn^{1/k})$, even when randomization is used. The contribution of this paper is deterministic spanner constructions that are fast and achieve bounds similar to those of the state-of-the-art slower constructions. Our first result is an $O_\varepsilon(n^{2+1/k+\varepsilon})$ time spanner construction which achieves the state-of-the-art bounds. Our second result is an $O_\varepsilon(m + n\log n)$ time construction of a spanner with $(2k-1)(1+\varepsilon)$ stretch, $O(\log k \cdot n^{1+1/k})$ edges and $O_\varepsilon(\log k \cdot n^{1/k})$ lightness. This is an exponential improvement in the dependence on $k$ compared to the previous result with such running time. Finally, for the important special case where $k = \log n$, for every constant $\varepsilon > 0$, we provide an $O(m + n^{1+\varepsilon})$ time construction that produces an $O(\log n)$-spanner with $O(n)$ edges and $O(1)$ lightness, which is asymptotically optimal. This is the first known sub-quadratic construction of such a spanner for any $k = \omega(1)$. To achieve our constructions, we show a novel deterministic incremental approximate distance oracle. Our new oracle is crucial in our construction, as known randomized dynamic oracles require the assumption of a non-adaptive adversary. This is a strong assumption, which has seen recent attention in prolific venues.
Our new oracle allows the order of the edge insertions to not be fixed in advance, which is critical as our spanner algorithm chooses which edges to insert based on the answers to distance queries. We believe our new oracle is of independent interest.
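For orientation, the classic greedy spanner referred to above can be sketched as follows. This is a minimal illustration of the textbook greedy algorithm with stretch $2k-1$, not the fast constructions or the distance oracle of this paper; the function names `bounded_distance` and `greedy_spanner` and the `(u, v, weight)` edge format are assumptions made for the example.

```python
import heapq
from collections import defaultdict


def bounded_distance(adj, source, target, limit):
    """Dijkstra from source that ignores paths longer than limit.

    Returns the distance to target if it is at most limit, else infinity.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd <= limit and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")


def greedy_spanner(edges, k):
    """Greedy (2k-1)-spanner: scan edges by increasing weight and keep an
    edge only if the spanner built so far has no path of length at most
    (2k-1) * weight between its endpoints."""
    t = 2 * k - 1
    adj = defaultdict(list)
    spanner = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        if bounded_distance(adj, u, v, t * w) > t * w:
            adj[u].append((v, w))
            adj[v].append((u, w))
            spanner.append((u, v, w))
    return spanner
```

Each edge triggers a distance query on the partial spanner, and whether an edge is inserted depends on the answer. That dependence is why an oracle used in this role has to tolerate an insertion order that is not fixed in advance.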
Min-wise hashing is an important method for estimating the size of the intersection of sets, based on a succinct summary (a "min-hash") independently computed for each set. One application is estimation of the number of data points that satisfy the conjunction of $m \geq 2$ simple predicates, where a min-hash is available for the set of points satisfying each predicate. This has applications in query optimization and for approximate computation of COUNT aggregates. In this paper we address the question: How many bits is it necessary to allocate to each summary in order to get an estimate with $1 \pm \varepsilon$ relative error? The state-of-the-art technique for minimizing the encoding size, for any desired estimation error, is b-bit min-wise hashing due to Li and König (Communications of the ACM, 2011). We give new lower and upper bounds:
• Using information complexity arguments, we show that b-bit min-wise hashing is space optimal for $m = 2$ predicates in the sense that the estimator's variance is within a constant factor of the smallest possible among all summaries with the given space usage. But for conjunctions of $m > 2$ predicates we show that the performance of b-bit min-wise hashing (and more generally any method based on "k-permutation" min-hash) deteriorates as $m$ grows.
• We describe a new summary that nearly matches our lower bound for $m \geq 2$. It asymptotically outperforms all k-permutation schemes (by around a factor $\Omega(m/\log m)$), as well as methods based on subsampling (by a factor $\Omega(\log n_{\max})$, where $n_{\max}$ is the maximum set size).
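To make the baseline concrete, here is a minimal sketch of plain k-permutation min-hash for two sets. It is neither b-bit min-wise hashing nor the new summary described above. The names `make_hashes`, `minhash`, `estimate_jaccard` and `estimate_intersection` are hypothetical, elements are assumed to be non-negative integers below `PRIME`, and random linear hash functions stand in for true random permutations (they are only approximately min-wise independent).

```python
import random

PRIME = (1 << 61) - 1  # modulus for the linear hash functions (an assumption)


def make_hashes(k, seed=0):
    """Draw k hash functions x -> (a*x + b) mod PRIME with a != 0."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]


def minhash(items, hashes):
    """Summary of a non-empty set: the minimum hash value under each function."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hashes]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of coordinates where the two summaries agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


def estimate_intersection(sig_a, sig_b, size_a, size_b):
    """Turn the Jaccard estimate J into an intersection-size estimate
    using |A ∩ B| = J * (|A| + |B|) / (1 + J)."""
    j = estimate_jaccard(sig_a, sig_b)
    return j * (size_a + size_b) / (1 + j)
```

Each summary is computed from its own set alone; only the hash functions are shared. The estimate then uses the two summaries plus the set sizes, which this sketch assumes are known.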
Finding cycles in graphs is a fundamental problem in algorithmic graph theory. In this paper, we consider the problem of finding and reporting a cycle of length $2k$ in an undirected graph $G$ with $n$ nodes and $m$ edges for constant $k \geq 2$. A classic result by Bondy and Simonovits [J. Combinatorial Theory, 1974] [...] We present an algorithm that uses $O(m^{2k/(k+1)})$ time and finds a $2k$-cycle if one exists. This bound is $O(n^2)$ exactly when $m = \Theta(n^{1+1/k})$. When finding 4-cycles our new bound coincides with Alon et al., while for every $k > 2$ our new bound yields a polynomial improvement in $m$. Yuster and Zwick noted that it is "plausible to conjecture that $O(n^2)$ is the best possible bound in terms of $n$". We show "conditional optimality": if this hypothesis holds then our $O(m^{2k/(k+1)})$ algorithm is tight as well. Furthermore, a folklore reduction implies that no combinatorial algorithm can determine if a graph contains a 6-cycle in time $O(m^{3/2-\varepsilon})$ for any $\varepsilon > 0$ unless boolean matrix multiplication can be solved combinatorially in time $O(n^{3-\varepsilon'})$ for some $\varepsilon' > 0$, which is widely believed to be false. Coupled with our main result, this gives tight bounds for finding 6-cycles combinatorially and also separates the complexity of finding 4- and 6-cycles, giving evidence that the exponent of $m$ in the running time should indeed increase with $k$. The key ingredient in our algorithm is a new notion of capped k-walks, which are walks of length $k$ that visit only nodes according to a fixed ordering. Our main technical contribution is an involved analysis proving several properties of such walks which may be of independent interest.
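The relation between the bound in $m$ and the $O(n^2)$ bound can be checked by substituting the edge density into the exponent. This is a short worked calculation added for illustration, using only the quantities stated above:

```latex
\begin{align*}
m = \Theta\!\left(n^{1+1/k}\right)
  \;&\Longrightarrow\;
  m^{\frac{2k}{k+1}}
  = \Theta\!\left(n^{\frac{k+1}{k}\cdot\frac{2k}{k+1}}\right)
  = \Theta\!\left(n^{2}\right),\\
k = 2:\quad \tfrac{2k}{k+1} &= \tfrac{4}{3},
\qquad
k = 3:\quad \tfrac{2k}{k+1} = \tfrac{3}{2}.
\end{align*}
```

The second line gives the exponents for 4-cycles and 6-cycles. The value $3/2$ for $k = 3$ is the exponent that the conditional lower bound for 6-cycles rules out improving.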
We consider the problem of multiplying sparse matrices (over a semiring) where the number of non-zero entries is larger than main memory. In the classical paper of Hong and Kung (STOC '81) it was shown that to compute a product of dense $U \times U$ matrices, $\Theta\left(U^3/(B\sqrt{M})\right)$ I/Os are necessary and sufficient in the I/O model with internal memory size $M$ and memory block size $B$. In this paper we generalize the upper and lower bounds of Hong and Kung to the sparse case. Our bounds depend on the number $N = \mathrm{nnz}(A) + \mathrm{nnz}(C)$ of nonzero entries in $A$ and $C$, as well as the number $Z = \mathrm{nnz}(AC)$ of nonzero entries in $AC$. We show that $AC$ can be computed using $\tilde{O}\left(\frac{N}{B}\min\left(\sqrt{\frac{Z}{M}}, \frac{N}{M}\right)\right)$ I/Os, with high probability. This is tight (up to polylogarithmic factors) when only semiring operations are allowed, even for dense rectangular matrices: We show a lower bound of $\Omega\left(\frac{N}{B}\min\left(\sqrt{\frac{Z}{M}}, \frac{N}{M}\right)\right)$ I/Os. While our lower bound uses fairly standard techniques, the upper bound makes use of "compressed matrix multiplication" sketches, which is new in the context of I/O-efficient algorithms, and a new matrix product size estimation technique that avoids the "no cancellation" assumption.
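As a sanity check, the sparse bound can be specialized to the dense case of Hong and Kung. This is a sketch under two assumptions that are not spelled out above: the first term inside the minimum is read as $\sqrt{Z/M}$, and the inputs are fully dense $U \times U$ matrices with a fully dense product and $M \leq 4U^2$.

```latex
\begin{align*}
N &= \mathrm{nnz}(A) + \mathrm{nnz}(C) = 2U^2, \qquad Z = \mathrm{nnz}(AC) = U^2,\\
\sqrt{\tfrac{Z}{M}} &= \tfrac{U}{\sqrt{M}} \;\le\; \tfrac{2U^2}{M} = \tfrac{N}{M}
  \quad\text{whenever } M \le 4U^2,\\
\tfrac{N}{B}\,\min\!\left(\sqrt{\tfrac{Z}{M}},\ \tfrac{N}{M}\right)
  &= \tfrac{2U^2}{B}\cdot\tfrac{U}{\sqrt{M}}
   = \Theta\!\left(\tfrac{U^3}{B\sqrt{M}}\right).
\end{align*}
```

Under these assumptions the expression reduces to the dense bound $\Theta\left(U^3/(B\sqrt{M})\right)$, up to the polylogarithmic factors hidden by $\tilde{O}$.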
Abstract. We present an I/O-efficient algorithm for computing similarity joins based on locality-sensitive hashing (LSH). In contrast to the filtering methods commonly suggested, our method has provable subquadratic dependency on the data size. Further, in contrast to straightforward implementations of known LSH-based algorithms on external memory, our approach is able to take significant advantage of the available internal memory: Whereas the time complexity of classical algorithms includes a factor of $N^\rho$, where $\rho$ is a parameter of the LSH used, the I/O complexity of our algorithm merely includes a factor $(N/M)^\rho$, where $N$ is the data size and $M$ is the size of internal memory. Our algorithm is randomized and outputs the correct result with high probability. It is a simple, recursive, cache-oblivious procedure, and we believe that it will be useful also in other computational settings such as parallel computation.
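The recursive idea can be illustrated with a small in-memory sketch: split the data with a fresh LSH function, recurse on buckets that are still too large, and compare all pairs once a bucket fits in "memory". This is not the paper's algorithm. It is not cache-oblivious (the `memory` parameter is explicit), it does no I/O accounting, and it assumes Jaccard similarity over non-empty sets of non-negative integers with a min-hash as the LSH. All names and default parameters are hypothetical.

```python
import random
from itertools import combinations

PRIME = (1 << 61) - 1  # modulus for the linear hash functions (an assumption)


def jaccard(a, b):
    return len(a & b) / len(a | b)


def brute_force(records, threshold, out):
    """Compare all pairs in a small bucket and report the similar ones."""
    for (i, a), (j, b) in combinations(records, 2):
        if jaccard(a, b) >= threshold:
            out.add((min(i, j), max(i, j)))


def lsh_join_once(records, threshold, memory, rng, out, depth=0, max_depth=16):
    """One randomized pass: recurse until a bucket has at most `memory` records."""
    if len(records) <= memory or depth == max_depth:
        brute_force(records, threshold, out)
        return
    # Fresh min-hash function for this recursion level.
    a, b = rng.randrange(1, PRIME), rng.randrange(PRIME)
    buckets = {}
    for rid, items in records:
        key = min((a * x + b) % PRIME for x in items)
        buckets.setdefault(key, []).append((rid, items))
    for bucket in buckets.values():
        lsh_join_once(bucket, threshold, memory, rng, out, depth + 1, max_depth)


def lsh_join(records, threshold, memory=4, repetitions=20, seed=0):
    """Repeat the randomized pass to raise the chance of finding each pair."""
    rng = random.Random(seed)
    out = set()
    for _ in range(repetitions):
        lsh_join_once(records, threshold, memory, rng, out)
    return out
```

Every reported pair is verified with an exact similarity computation, so the sketch produces no false positives. A similar but non-identical pair can still be missed, which is why the pass is repeated. Recursion stops at buckets of size `memory` rather than size one, which is loosely how the available memory shortens the recursion.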