Counting and sampling triangles from a graph stream

Pavan, A.; Tangwongsan, Kanat; Tirthapura, Srikanta

doi:10.14778/2556549.2556569

Cited by 164 publications

(166 citation statements)

References 21 publications

Supporting

Mentioning

165

Contrasting

“…Follow-up work includes speeding up reservoir sampling [13], weighted reservoir sampling [14], sampling over a sliding window and stream evolution [15], [16], [17], [18], [19]. Stream sampling has been used extensively in large scale data mining applications, see for example [20], [21], [22], [23]. Stream sampling under sliding windows has been considered in [17], [24].…”

Section: Related Work

mentioning

confidence: 99%

A Simple Message-Optimal Algorithm for Random Sampling from a Distributed Stream

Chung

Tirthapura

Woodruff

2016

IEEE Trans. Knowl. Data Eng.

Self Cite

We present a simple, message-optimal algorithm for maintaining a random sample from a large data stream whose input elements are distributed across multiple sites that communicate via a central coordinator. At any point in time, the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system. We present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. We also consider the important case when the distribution of elements across different sites is non-uniform, and show that for such inputs, our algorithm significantly outperforms prior solutions.

“…More precisely [BOV13] have shown that one extra pass yields an algorithm that distinguishes between triangle-free graphs from graphs with at least T triangles using O(m/T^{1/3}) words of space. Although their algorithm does not give an estimate of the number of triangles and more important is not clearly superior to the O(m∆/T) one pass algorithm by [PT12,PTTW13] (especially for graphs with small maximum degree ∆), it creates some hope that perhaps with the expense of extra passes one could get improved and cleaner space [footnote 1: In this and prior works, some assumption on the number of triangles is required. This is due in part to the fact that distinguishing triangle-free graphs from those with one or more triangle requires space proportional to the number of edges.]…”

Section: Introduction

mentioning

confidence: 97%

“…To obtain an accurate estimate of the number of triangles in the graph, this procedure is repeated independently O(mn/(ε²T)) times to achieve relative error. Recent work by Pavan et al [PTTW13] extends the sampling approach of Buriol et al: instead of picking a random node to complete the triangle with a sampled edge, their estimator samples a second edge that is incident on the first sampled edge. This estimator is repeated O(m∆/(ε²T)) times, where ∆ represents the maximum degree of any node.…”

mentioning

confidence: 99%
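
The two-edge sampling idea in the statement above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the cited authors' implementation: the names `NeighborhoodSampler` and `estimate_triangles` are hypothetical, the stream is assumed to be a simple graph (no self-loops or repeated edges), and the adjacency counter `c` and the `c * m` scaling are a reconstruction of how such an estimator is usually made unbiased, not details given on this page.

```python
import random


class NeighborhoodSampler:
    """One estimator: a uniformly sampled edge r1, then a uniformly sampled
    later edge r2 that shares a vertex with r1 (assumed scheme)."""

    def __init__(self, rng):
        self.rng = rng
        self.m = 0           # edges seen so far
        self.r1 = None       # first sampled edge
        self.r2 = None       # second sampled edge, adjacent to r1
        self.c = 0           # edges adjacent to r1 seen after r1
        self.closed = False  # True once an edge closes the wedge (r1, r2)

    def update(self, edge):
        e = frozenset(edge)
        self.m += 1
        # Reservoir step: the new edge replaces r1 with probability 1/m.
        if self.rng.random() < 1.0 / self.m:
            self.r1, self.r2, self.c, self.closed = e, None, 0, False
            return
        if not (e & self.r1):
            return  # not adjacent to r1, so irrelevant to this estimator
        self.c += 1
        # Second reservoir step over the edges adjacent to r1.
        if self.rng.random() < 1.0 / self.c:
            self.r2, self.closed = e, False
        elif self.r2 is not None and e == (self.r1 ^ self.r2):
            # e joins the two endpoints that r1 and r2 do not share.
            self.closed = True

    def estimate(self):
        # Scale by the inverse sampling probability of the triangle.
        return self.c * self.m if self.closed else 0


def estimate_triangles(edges, num_estimators, seed=None):
    """Average many independent estimators over one pass of the stream."""
    rng = random.Random(seed)
    samplers = [NeighborhoodSampler(rng) for _ in range(num_estimators)]
    for edge in edges:
        for s in samplers:
            s.update(edge)
    return sum(s.estimate() for s in samplers) / num_estimators


if __name__ == "__main__":
    stream = [(1, 2), (2, 3), (1, 3), (3, 4)]  # contains one triangle
    print(estimate_triangles(stream, num_estimators=5000, seed=0))
```

Each estimator keeps constant state, so memory grows only with the number of estimators. A triangle-free stream always yields an estimate of exactly 0, because no wedge is ever closed; on other streams the average is a random quantity that concentrates around the true count as `num_estimators` grows.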

A second look at counting triangles in graph streams (corrected)

Cormode

Jowhari

2017

Theoretical Computer Science

In this paper we present improved results on the problem of counting triangles in edge streamed graphs. For graphs with m edges and at least T triangles, we show that an extra look over the stream yields a two-pass streaming algorithm that uses O(m/(ε^{4.5}√T)) space and outputs a (1 + ε) approximation of the number of triangles in the graph. This improves upon the two-pass streaming tester of Braverman, Ostrovsky and Vilenchik, ICALP 2013, which distinguishes between triangle-free graphs and graphs with at least T triangles using O(m/T^{1/3}) space. Also, in terms of dependence on T, we show that more passes would not lead to a better space bound. In other words, we prove there is no constant pass streaming algorithm that distinguishes between triangle-free graphs from graphs with at least T triangles using O(m/T^{1/2+ρ}) space for any constant ρ ≥ 0.

“…For incidence list streams this is easy since we can assume that the stream consists of (an implicit representation of) all 2-paths [9]. For the more difficult model of adjacency streams where edges arrive in arbitrary order the approach was adjusted such that we sample a random 2-path [13,23]. The one-pass algorithm with the best known space complexity and constant processing time per edge for adjacency streams is due to Pavan et al [23], and, when several passes are allowed, by Kolountzakis et al [16].…”

Section: Introduction

mentioning

confidence: 99%

“…For the more difficult model of adjacency streams where edges arrive in arbitrary order the approach was adjusted such that we sample a random 2-path [13,23]. The one-pass algorithm with the best known space complexity and constant processing time per edge for adjacency streams is due to Pavan et al [23], and, when several passes are allowed, by Kolountzakis et al [16]. For a more detailed overview of results and developed techniques we refer to [28].…”

Section: Introduction

mentioning

confidence: 99%

Triangle Counting in Dynamic Graph Streams

Kutzkov

Pagh

2014

Algorithm Theory – SWAT 2014

Estimating the number of triangles in graph streams using a limited amount of memory has become a popular topic in the last decade. Different variations of the problem have been studied, depending on whether the graph edges are provided in an arbitrary order or as incidence lists. However, with a few exceptions, the algorithms have considered insert-only streams. We present a new algorithm estimating the number of triangles in dynamic graph streams where edges can be both inserted and deleted. We show that our algorithm achieves better time and space complexity than previous solutions for various graph classes, for example sparse graphs with a relatively small number of triangles. Also, for graphs with constant transitivity coefficient, a common situation in real graphs, this is the first algorithm achieving constant processing time per edge. The result is achieved by a novel approach combining sampling of vertex triples and sparsification of the input graph. In the course of the analysis of the algorithm we present a lower bound on the number of pairwise independent 2-paths in general graphs which might be of independent interest. At the end of the paper we discuss lower bounds on the space complexity of triangle counting algorithms that make no assumptions on the structure of the graph.
