2003
DOI: 10.1109/tkde.2003.1198387

Clustering data streams: theory and practice

Abstract: The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.

Citations: cited by 706 publications (431 citation statements)
References: 72 publications
“…no knowledge of the number of clusters), (2) tracking cluster evolutions, (3) speed and complexity of computation (including high dimensionality) and (4) outliers, to name a few. There are several papers that review these challenges and some of the algorithms designed to tackle them (Yogita and Toshniwal, 2012; Khalilian and Mustapha, 2010; Guha et al., 2003).…”
Section: Clustering Algorithms (mentioning)
confidence: 99%
“…We derive the idea of dividing data into chunks and working on each of them separately by the work of Guha et al [4]. The easiest way to divide the data is based on the timestamp (creating a new chunk for each interval of time).…”
Section: Dividing Data Into Chunks (mentioning)
confidence: 99%
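The timestamp-based chunking mentioned in the statement above can be sketched roughly as follows. This is only an illustration, not code from the citing paper or from Guha et al.: the function name `chunk_by_interval`, the `(timestamp, value)` record layout, and the assumption that timestamps arrive in non-decreasing order are all hypothetical choices made for the example.

```python
from typing import Any, Iterable, Iterator, List, Tuple


def chunk_by_interval(
    stream: Iterable[Tuple[float, Any]], interval: float
) -> Iterator[List[Any]]:
    """Yield one chunk of values per time interval, in a single pass.

    Assumes (hypothetically) that records are (timestamp, value) pairs
    whose timestamps are non-decreasing. Intervals that contain no
    records produce no chunk.
    """
    current_id = None
    chunk: List[Any] = []
    for timestamp, value in stream:
        # Index of the interval this record falls into.
        chunk_id = int(timestamp // interval)
        if current_id is not None and chunk_id != current_id:
            # The interval changed: hand the finished chunk downstream.
            yield chunk
            chunk = []
        current_id = chunk_id
        chunk.append(value)
    if chunk:
        # Flush the last, possibly partial, chunk.
        yield chunk


# Example with made-up records and a 10-unit interval.
records = [(0.5, "a"), (3.2, "b"), (10.1, "c"), (12.7, "d"), (25.0, "e")]
chunks = list(chunk_by_interval(records, 10))
```

Because the function is a generator, each chunk can be processed (for example, clustered) and discarded before the next one is read, which keeps memory use bounded by the largest chunk.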
“…A data stream is represented by S = {o_1 to represent a unit, only one value, a unit identifier, is used in this paper. For instance, an object o_k is transformed to a unit…”
Section: A Cluster Streaming (mentioning)
confidence: 99%
“…Recently, many data mining methods [1] for a data stream have been actively introduced. A data stream is an ordered sequence of objects o_1, …, o_n that must be accessed in order and that can be read only once or a small specified number of times.…”
Section: Introduction (mentioning)
confidence: 99%