Andrew McGregor scite author profile

Optimizing linear counting queries under differential privacy

N.B. This is the full version of the conference paper published as [12]. This version includes an Appendix with proofs and additional results, and corrects a few typographical errors discovered after publication. It also adds an improvement in the error bounds achieved under (ε, δ)-differential privacy, included as Theorem 5.

Abstract. Differential privacy is a robust privacy standard that has been successfully applied to a range of data analysis tasks. But despite much recent work, optimal strategies for answering a collection of related queries are not known. We propose the matrix mechanism, a new algorithm for answering a workload of predicate counting queries. Given a workload, the mechanism requests answers to a different set of queries, called a query strategy, which are answered using the standard Laplace mechanism. Noisy answers to the workload queries are then derived from the noisy answers to the strategy queries. This two-stage process can result in a more complex correlated noise distribution that preserves differential privacy but increases accuracy. We provide a formal analysis of the error of query answers produced by the mechanism and investigate the problem of computing the optimal query strategy in support of a given workload. We show this problem can be formulated as a rank-constrained semidefinite program. Finally, we analyze two seemingly distinct techniques, whose similar behavior is explained by viewing them as instances of the matrix mechanism.
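
The two-stage process described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy histogram, and the hierarchical strategy are all assumptions made for the example, and the strategy matrix is assumed to have full column rank so that the workload answers can be derived by least squares.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z

def matrix_mechanism(W, A, x, epsilon, rng):
    """Answer workload W on data vector x via strategy A (illustrative only)."""
    n = len(x)
    # L1 sensitivity of the strategy: the largest column L1 norm of A.
    sensitivity = max(sum(abs(row[j]) for row in A) for j in range(n))
    # Stage 1: answer the strategy queries with the Laplace mechanism.
    noisy = [a + laplace(sensitivity / epsilon, rng) for a in matvec(A, x)]
    # Stage 2: least-squares estimate of x from the normal equations
    # (A^T A) x_hat = A^T noisy, then derive the workload answers.
    At = [list(col) for col in zip(*A)]
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in At] for ci in At]
    x_hat = solve(AtA, matvec(At, noisy))
    return matvec(W, x_hat)

# Hypothetical toy data: a histogram over four cells.
x = [10.0, 20.0, 30.0, 40.0]
# Workload: the four prefix (cumulative) counts.
W = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 1]]
# Assumed strategy: total, two halves, and the four individual cells.
A = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1],
     [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

answers = matrix_mechanism(W, A, x, epsilon=1.0, rng=random.Random(0))
print(answers)
```

Only the first stage touches the data, so the privacy guarantee comes from the Laplace noise added there; the second stage is post-processing of the noisy strategy answers.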

On Graph Problems in a Semi-streaming Model

Feigenbaum, Kannan, McGregor et al. 2004

150

247

We formalize a potentially rich new streaming model, the semi-streaming model, that we believe is necessary for the fruitful study of efficient algorithms for solving problems on massive graphs whose edge sets cannot be stored in memory. In this model, the input graph, G = (V, E), is presented as a stream of edges (in adversarial order), and the storage space of an algorithm is bounded by O(n · polylog n), where n = |V|. We are particularly interested in algorithms that use only one pass over the input, but, for problems where this is provably insufficient, we also look at algorithms using constant or, in some cases, logarithmically many passes. In the course of this general study, we give semi-streaming constant approximation algorithms for the unweighted and weighted matching problems, along with a further algorithm improvement for the bipartite case. We also exhibit log n / log log n semi-streaming approximations to the diameter and the problem of computing the distance between specified vertices in a weighted graph. These are complemented by Ω(log^(1−ε) n) lower bounds.
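
To make the model concrete, here is a minimal sketch of a one-pass algorithm that fits within the semi-streaming space bound. It is the standard greedy maximal matching, which is well known to be within a factor of 2 of the maximum cardinality matching in the unweighted case; it is shown only as an illustration and is not claimed to be the algorithm from the paper. The function name and the example stream are assumptions.

```python
def greedy_matching(edge_stream):
    """One pass over the edges; keep an edge iff both endpoints are still unmatched."""
    matched = set()   # holds at most n vertices
    matching = []     # holds at most n/2 edges
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching

# Hypothetical stream: the path a-b-c-d with the middle edge arriving first.
stream = iter([("b", "c"), ("a", "b"), ("c", "d")])
print(greedy_matching(stream))  # → [('b', 'c')]
```

On this adversarial ordering the greedy pass keeps one edge while the optimum has two, which shows the factor of 2 is attained. The state is a set of vertices plus the matching itself, so the memory used is O(n) words regardless of the number of edges.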

Analyzing Graph Structure via Linear Measurements

Ahn, Guha, McGregor 2012

131

238

We initiate the study of graph sketching, i.e., algorithms that use a limited number of linear measurements of a graph to determine the properties of the graph. While a graph on n nodes is essentially O(n^2)-dimensional, we show the existence of a distribution over random projections into d-dimensional "sketch" space (d ≪ n^2) such that the relevant properties of the original graph can be inferred from the sketch with high probability. Specifically, we show that:

1. [...] including connectivity, k-connectivity, bipartiteness, and to return any constant approximation of the weight of the minimum spanning tree.
2. d = O(n^(1+γ)) suffices to compute graph sparsifiers, the exact MST, and approximate the maximum weighted matchings if we permit O(1/γ)-round adaptive sketches, i.e., a sequence of projections where each projection may be chosen dependent on the outcome of earlier sketches.

Our results have two main applications, both of which have the potential to give rise to fruitful lines of further research. First, our results can be thought of as giving the first compressed-sensing style algorithms for graph data. Secondly, our work initiates the study of dynamic graph streams. There is already extensive literature on processing massive graphs in the data-stream model. However, the existing work focuses on graphs defined by a sequence of inserted edges and does not consider edge deletions. We think this is a curious omission given the existing work on both dynamic graphs in the non-streaming setting and dynamic geometric streaming. Our results include the first dynamic graph semi-streaming algorithms for connectivity, spanning trees, sparsification, and matching problems.

Graph stream algorithms

McGregor 2014

SIGMOD Rec.

327

226

Over the last decade, there has been considerable interest in designing algorithms for processing massive graphs in the data stream model. The original motivation was two-fold: a) in many applications, the dynamic graphs that arise are too large to be stored in the main memory of a single machine and b) considering graph problems yields new insights into the complexity of stream computation. However, the techniques developed in this area are now finding applications in other areas including data structures for dynamic graphs, approximation algorithms, and distributed and parallel computation. We survey the state-of-the-art results; identify general techniques; and highlight some simple algorithms that illustrate basic ideas.

Finding Graph Matchings in Data Streams

McGregor 2005

151

215

Abstract. We present algorithms for finding large graph matchings in the streaming model. In this model, applicable when dealing with massive graphs, edges are streamed in in some arbitrary order rather than residing in randomly accessible memory. For ε > 0, we achieve a 1/(1+ε) approximation for maximum cardinality matching and a 1/(2+ε) approximation to maximum weighted matching. Both algorithms use a constant number of passes and Õ(|V|) space.

Graph Distances in the Data-Stream Model

Feigenbaum, Kannan, McGregor et al. 2009

SIAM J. Comput.

134

180

Abstract. We explore problems related to computing graph distances in the data-stream model. The goal is to design algorithms that can process the edges of a graph in an arbitrary order given only a limited amount of working memory. We are motivated by both the practical challenge of processing massive graphs such as the web graph and the desire for a better theoretical understanding of the data-stream model. In particular, we are interested in the trade-offs between model parameters such as per-data-item processing time, total space, and the number of passes that may be taken over the stream. These trade-offs are more apparent when considering graph problems than they were in previous streaming work that solved problems of a statistical nature. Our results include the following:

(1) Spanner construction: There exists a single-pass, Õ(t n^(1+1/t))-space, Õ(t^2 n^(1/t))-time-per-edge algorithm that constructs a (2t + 1)-spanner. For t = Ω(log n / log log n), the algorithm satisfies the semi-streaming space restriction of O(n polylog n) and has per-edge processing time O(polylog n). This resolves an open question from [J. Feigenbaum et al., Theoret. Comput. Sci., 348 (2005), pp. 207-216].
(2) Breadth-first-search (BFS) trees: For any even constant k, we show that any algorithm that computes the first k layers of a BFS tree from a prescribed node with probability at least 2/3 requires either greater than k/2 passes or Ω(n^(1+1/k)) space. Since constructing BFS trees is an important subroutine in many traditional graph algorithms, this demonstrates the need for new algorithmic techniques when processing graphs in the data-stream model.
(3) Graph-distance lower bounds: Any t-approximation of the distance between two nodes requires Ω(n^(1+1/t)) space. We also prove lower bounds for determining the length of the shortest cycle and other graph properties.
(4) Techniques for decreasing per-edge processing: We discuss two general techniques for speeding up the per-edge computation time of streaming algorithms while increasing the space by only a small factor.

On graph problems in a semi-streaming model

Feigenbaum, Kannan, McGregor et al. 2005

Theoretical Computer Science

258

150

Kernelization via Sampling with Applications to Finding Matchings and Related Problems in Dynamic Graph Streams

Chitnis, Cormode, Esfandiari et al. 2015

132
