2013
DOI: 10.14778/2556549.2556569

Counting and sampling triangles from a graph stream

Abstract: This paper presents a new space-efficient algorithm for counting and sampling triangles (and, more generally, constant-sized cliques) in a massive graph whose edges arrive as a stream. Compared to prior work, our algorithm yields significant improvements in the space and time complexity for these fundamental problems. Our algorithm is simple to implement and has very good practical performance on large graphs.

Citations: cited by 164 publications (166 citation statements)
References: 21 publications
“…Follow-up work includes speeding up reservoir sampling [13], weighted reservoir sampling [14], sampling over a sliding window and stream evolution [15], [16], [17], [18], [19]. Stream sampling has been used extensively in large scale data mining applications, see for example [20], [21], [22], [23]. Stream sampling under sliding windows has been considered in [17], [24].…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…More precisely [BOV13] have shown that one extra pass yields an algorithm that distinguishes between triangle-free graphs from graphs with at least $T$ triangles using $O(m/T^{1/3})$ words of space. Although their algorithm does not give an estimate of the number of triangles and more important is not clearly superior to the $O(m\Delta/T)$ one pass algorithm by [PT12, PTTW13] (especially for graphs with small maximum degree $\Delta$), it creates some hope that perhaps with the expense of extra passes one could get improved and cleaner space … [Footnote 1: In this and prior works, some assumption on the number of triangles is required. This is due in part to the fact that distinguishing triangle-free graphs from those with one or more triangle requires space proportional to the number of edges.]…”
Section: Introduction
Citation type: mentioning
confidence: 97%
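A quick way to see why the statement above calls the two bounds not clearly comparable: reading them as $m/T^{1/3}$ and $m\Delta/T$ (an assumption about the flattened formulas, and ignoring any accuracy or logarithmic factors the quote leaves out), the crossover point follows from simple algebra. This is a reader's sketch, not a result taken from either cited paper.

```latex
\[
  \frac{m}{T^{1/3}} \;\le\; \frac{m\,\Delta}{T}
  \quad\Longleftrightarrow\quad
  T^{2/3} \;\le\; \Delta .
\]
```

Under that reading, the extra-pass bound is the smaller one only when the maximum degree $\Delta$ is at least $T^{2/3}$, which matches the remark about graphs with small maximum degree.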
“…To obtain an accurate estimate of the number of triangles in the graph, this procedure is repeated independently $O(mn/(\varepsilon^2 T))$ times to achieve relative error. Recent work by Pavan et al [PTTW13] extends the sampling approach of Buriol et al: instead of picking a random node to complete the triangle with a sampled edge, their estimator samples a second edge that is incident on the first sampled edge. This estimator is repeated $O(m\Delta/(\varepsilon^2 T))$ times, where $\Delta$ represents the maximum degree of any node.…”
Citation type: mentioning
confidence: 99%
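The two-edge sampling idea in the statement above can be sketched in a few lines of Python. This is a minimal illustration, not the cited authors' implementation: the function names `one_run` and `estimate_triangles` are hypothetical, the stream is assumed to be a simple graph (no self-loops, no repeated edges), and the independent repetitions are simulated by re-reading the edge list rather than running in parallel over a single pass. Each run keeps a uniformly sampled first edge, a uniformly sampled later edge incident on it, and checks whether the edge closing that wedge arrives afterwards.

```python
import random


def one_run(edges, rng):
    """One pass of a two-edge sampling estimator over an edge stream.

    Assumes a simple graph: no self-loops and no repeated edges.
    Returns c * m if the sampled wedge is closed into a triangle, else 0.
    """
    m = 0            # edges seen so far
    r1 = None        # first sampled edge (uniform over the stream)
    r2 = None        # second sampled edge (uniform over later neighbours of r1)
    c = 0            # number of edges adjacent to r1 that arrived after r1
    closed = False   # True once the edge closing the wedge (r1, r2) is seen
    for u, v in edges:
        m += 1
        e = frozenset((u, v))
        if rng.random() < 1.0 / m:
            # Reservoir-sample the first edge and reset the dependent state.
            r1, r2, c, closed = e, None, 0, False
        elif len(e & r1) == 1:
            # e shares exactly one endpoint with r1, so it is adjacent to r1.
            c += 1
            if rng.random() < 1.0 / c:
                r2, closed = e, False
            elif e == (r1 ^ r2):
                # e joins the two non-shared endpoints of the wedge.
                closed = True
    return float(c * m) if closed else 0.0


def estimate_triangles(edges, runs=1000, seed=0):
    """Average many independent runs to reduce the variance of the estimate."""
    rng = random.Random(seed)
    return sum(one_run(edges, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    # K4 contains exactly 4 triangles.
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(estimate_triangles(k4, runs=20000, seed=1))
```

A triangle whose first edge in stream order has c later neighbours is picked with probability 1/(m·c), so returning c·m on success makes each run an unbiased estimate; averaging more runs tightens it, which is where repetition counts such as those quoted above come from.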
“…For incidence list streams this is easy since we can assume that the stream consists of (an implicit representation of) all 2-paths [9]. For the more difficult model of adjacency streams where edges arrive in arbitrary order the approach was adjusted such that we sample a random 2-path [13, 23]. The one-pass algorithm with the best known space complexity and constant processing time per edge for adjacency streams is due to Pavan et al [23], and when several passes are allowed, by Kolountzakis et al [16].…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…For the more difficult model of adjacency streams where edges arrive in arbitrary order the approach was adjusted such that we sample a random 2-path [13, 23]. The one-pass algorithm with the best known space complexity and constant processing time per edge for adjacency streams is due to Pavan et al [23], and when several passes are allowed, by Kolountzakis et al [16]. For a more detailed overview of results and developed techniques we refer to [28].…”
Section: Introduction
Citation type: mentioning
confidence: 99%