2016
DOI: 10.1007/s10115-016-0958-4

DASC: data aware algorithm for scalable clustering

Abstract: The emergence of the MapReduce (MR) framework for scaling data mining and machine learning algorithms addresses Volume, while handling Variety and Velocity must be skilfully crafted into the algorithms themselves. So far, scalable clustering algorithms have focused solely on Volume, taking advantage of the MR framework. In this paper we present a MapReduce algorithm, data aware scalable clustering (DASC), which is capable of handling the 3 Vs of big data by virtue of being (i) single scan and distributed to handle Volume, (ii…

Cited by 3 publications (1 citation statement)
References 21 publications
“…A semi-supervised clustering method is considered in cases where the non-dominant attributes are more important for the clustering results than the dominant ones. Bhatnagar et al. (2017) proposed the data-aware scalable clustering (DASC) algorithm, an incremental algorithm inspired by a grid-based stream clustering algorithm called ExCC, proposed by Bhatnagar, Kaur and Chakravarthy (2013). This method is able to handle the three V's of big data, working with: (i) distributed and large files (Volume); (ii) data streams (Velocity); and (iii) mixed-type attributes (Variety).…”
Section: Discussion
confidence: 99%
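The grid-based, single-scan idea the statement attributes to DASC/ExCC can be illustrated with a minimal sketch: each point updates only the count of its grid cell in one pass, and clusters are then read off as connected groups of dense cells. This is an assumption-laden toy (fixed cell width, simple density threshold, in-memory counts), not the published DASC or ExCC implementation.

```python
from collections import defaultdict

def cell_of(point, width):
    """Map a numeric point to its grid cell (a tuple of cell indices)."""
    return tuple(int(x // width) for x in point)

def incremental_grid_clusters(stream, width=1.0, density=2):
    """Single scan: count points per cell, then flood-fill adjacent
    dense cells into clusters. Hypothetical sketch, not DASC itself."""
    counts = defaultdict(int)
    for p in stream:                      # one pass over the data
        counts[cell_of(p, width)] += 1
    dense = {c for c, n in counts.items() if n >= density}
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            c = stack.pop()
            if c in seen:
                continue
            seen.add(c)
            comp.add(c)
            # neighbouring cells differ by at most 1 in every dimension
            for nb in dense:
                if nb not in seen and all(abs(a - b) <= 1 for a, b in zip(c, nb)):
                    stack.append(nb)
        clusters.append(comp)
    return clusters
```

Because only per-cell counters are updated, new points can arrive at any time (Velocity) and the counting pass parallelises naturally across data partitions (Volume), which is the intuition behind making such an algorithm MapReduce-friendly.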