2010
DOI: 10.18637/jss.v035.i05

rEMM: Extensible Markov Model for Data Stream Clustering in R

Abstract: Clustering streams of continuously arriving data has become an important application of data mining in recent years, and efficient algorithms have been proposed by several researchers. However, clustering alone neglects the fact that data in a data stream is characterized not only by the proximity of data points, which clustering uses, but also by a temporal component. The extensible Markov model (EMM) adds the temporal component to data stream clustering by superimposing a dynamically adapting Markov chai…

citations
Cited by 13 publications
(8 citation statements)
references
References 35 publications
“…For the second case, a typical example is the Temporal Structure Learning for Clustering Massive Data Stream in Real Time (TRACDS) algorithm [Hahsler and Dunham 2011]. It is essentially a generalization of the Extensible Markov Model (EMM) algorithm [Dunham et al. 2004; Hahsler and Dunham 2010] for data stream scenarios. In TRACDS, each cluster (or micro-cluster) is represented by a state of a Markov Chain (MC) [Markov 1971; Bhat and Miller 2002], and the transitions represent the relationship between clusters.…”
Section: Time-aware Clustering (mentioning)
confidence: 99%
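The idea quoted above, clusters as Markov chain states with transitions recording how the stream moves between them, can be illustrated with a minimal Python sketch. It is not the rEMM or TRACDS implementation: the function names `transition_counts` and `transition_probabilities` and the example label sequence are assumptions made only for illustration, and the cluster assignments are taken as already given.

```python
from collections import defaultdict

def transition_counts(cluster_sequence):
    """Count transitions between consecutive cluster assignments.

    Each distinct cluster label acts as one state of the Markov chain.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for prev, curr in zip(cluster_sequence, cluster_sequence[1:]):
        counts[prev][curr] += 1
    return counts

def transition_probabilities(counts):
    """Normalize outgoing counts per state into estimated probabilities."""
    probs = {}
    for state, outgoing in counts.items():
        total = sum(outgoing.values())
        probs[state] = {nxt: c / total for nxt, c in outgoing.items()}
    return probs

# Hypothetical sequence of cluster labels produced by some stream clusterer.
sequence = ["A", "A", "B", "A", "B", "C"]
counts = transition_counts(sequence)
probs = transition_probabilities(counts)
```

Here `probs["A"]["B"]` estimates how often the stream moves from cluster A to cluster B, which is the temporal information that clustering alone discards.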
“…This count is incremented by one when a new observation is assigned to the cluster. Cluster fading is represented by applying fading weights to the clusters' observation counts (in this case we call them weighted observation counts), as in [17].…”
Section: A Implementation Requirements (mentioning)
confidence: 99%
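A minimal Python sketch of such weighted observation counts follows. The exponential decay factor `2 ** (-fading_rate)`, the function name `update_counts`, and the parameter `fading_rate` are assumptions for illustration; the citing paper's exact weighting scheme is not given in the excerpt.

```python
def update_counts(counts, assigned_cluster, fading_rate=0.01):
    """Fade all cluster counts, then add one for the assigned cluster.

    Assumption: counts decay by a factor of 2 ** (-fading_rate) per
    arriving observation, so older observations weigh less over time.
    """
    decay = 2.0 ** (-fading_rate)
    faded = {cluster: n * decay for cluster, n in counts.items()}
    faded[assigned_cluster] = faded.get(assigned_cluster, 0.0) + 1.0
    return faded

# With fading_rate=1.0 every existing count halves at each step.
counts = update_counts({}, "A", fading_rate=1.0)
counts = update_counts(counts, "B", fading_rate=1.0)
```

With `fading_rate=0` the decay factor is 1 and the counts reduce to plain, unweighted observation counts.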
“…In our study, we use the GNU R implementation of the data stream clustering approach from [17]. It is a simple implementation of the threshold nearest neighbor algorithm.…”
Section: Data Stream Clustering Implementation (mentioning)
confidence: 99%
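For readers unfamiliar with threshold nearest neighbor clustering, the following Python sketch shows the general technique under simplifying assumptions: Euclidean distance, and cluster centers fixed at the first point that created them. The name `threshold_nn` and the sample points are hypothetical, and this is not the code of the R implementation the excerpt refers to.

```python
import math

def threshold_nn(points, threshold):
    """Assign each point to its nearest center if within the threshold.

    Otherwise the point starts a new cluster. Simplification: a center
    is the first point of its cluster and is never updated.
    """
    centers = []
    labels = []
    for p in points:
        best, best_dist = None, math.inf
        for i, c in enumerate(centers):
            d = math.dist(p, c)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > threshold:
            centers.append(p)
            best = len(centers) - 1
        labels.append(best)
    return labels, centers

# Hypothetical two-dimensional stream with two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9), (0.0, 0.2)]
labels, centers = threshold_nn(points, threshold=1.0)
```

Because each point is processed once and only the centers are stored, the approach suits streams where the data cannot be revisited.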
“…This package has methods to quickly load a large set of sequence files, which can be in FASTA format with Greengenes [22] annotations, into a relational database, and can be used to easily filter sequences belonging to any taxonomic rank. This package is built on top of a number of other packages, including Biostrings [29] for handling sequences and the data stream clustering package rEMM [30, 31]. It provides a complete interface for managing sequences, creating word frequency distributions (NSVs), and creating and analyzing GenModels.…”
Section: Quasi-alignment Via Position-sensitive P (mentioning)
confidence: 99%