2014
DOI: 10.14257/ijsip.2014.7.2.13

A New Data Mining Algorithm based on MapReduce and Hadoop

Citations: cited by 11 publications (5 citation statements)
References: 11 publications
“…Therefore, people have proposed various approximations to PAM, such as CLARA and CLARANS discussed before. Yang and Lian (2014) parallelize the "k-means like" variant with map-reduce, parallelizing over the cluster in the reduce step. When cluster sizes vary substantially, this needs O(n²) memory in the reducer, and may yield next to no speedup in the worst case.…”
Section: Variants of PAM (mentioning)
confidence: 99%
“…Nevertheless, a few seminal methods such as hierarchical clustering, k-means, PAM Rousseeuw, 1987, 1990c), and DBSCAN (Ester et al., 1996) have received repeated and widespread use. One may be tempted to think that these classic methods have all been well researched and understood, but there are still many scientific publications trying to explain these algorithms better (e.g., Schubert et al., 2017), trying to parallelize and scale them to larger data sets (e.g., Lijffijt et al., 2015; Yang and Lian, 2014), trying to better understand similarities and relationships among the published methods (e.g., , or proposing further improvements, and so does this paper for the widely used PAM algorithm, also often referred to as k-medoids clustering.…”
Section: Introduction (mentioning)
confidence: 99%
“…To analyze a lot of data with enough resources, we need to make clustering methods take less time and use less memory. The authors in [15] adapted MapReduce to medoids. During mapping, it places each object next to its closest medoid, and during reduction, it moves the real medoid to the center of the group.…”
Section: Related Work (mentioning)
confidence: 99%
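
The map and reduce steps described in the statement above can be sketched in plain Python. This is a minimal single-process illustration, not the cited authors' implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `k_medoids`), the Euclidean distance, and the stopping rule are all assumptions made for the example. The reducer needs a whole cluster at once and compares its members pairwise, which is consistent with the concern about uneven cluster sizes raised in the first citation statement.

```python
from collections import defaultdict


def dist(a, b):
    """Euclidean distance between two points given as tuples (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def map_phase(points, medoids):
    """Map step: emit (index of nearest medoid, point) for every point."""
    for p in points:
        nearest = min(range(len(medoids)), key=lambda i: dist(p, medoids[i]))
        yield nearest, p


def shuffle(pairs):
    """Group emitted pairs by key, as a MapReduce framework would do."""
    groups = defaultdict(list)
    for key, p in pairs:
        groups[key].append(p)
    return groups


def reduce_phase(members):
    """Reduce step: pick the member with the smallest total distance
    to all other members of the same cluster (quadratic in cluster size)."""
    return min(members, key=lambda c: sum(dist(c, q) for q in members))


def k_medoids(points, medoids, max_iter=20):
    """Alternate map and reduce until the medoids stop changing."""
    medoids = list(medoids)
    for _ in range(max_iter):
        groups = shuffle(map_phase(points, medoids))
        # Keep the old medoid if a cluster received no points.
        new = [
            reduce_phase(groups[i]) if i in groups else medoids[i]
            for i in range(len(medoids))
        ]
        if new == medoids:
            break
        medoids = new
    return medoids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
result = k_medoids(points, [(0, 1), (10, 11)])
```

In a real Hadoop job the shuffle is done by the framework and each reducer call runs on a different node, so one very large cluster can dominate the running time.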
“…It is already proved that the inverse matrix of {I - (1+d)^{-1} T}^{-1} exists [30]. CLV_i is a (mn+2)×1 column matrix whose j-th element denotes the cumulative profitability generated by customer i while he or she remains at the state.…”
Section: Figure 3 One-step Transition Matrix (mentioning)
confidence: 99%
“…Data mining is to discover hidden useful information in large databases. Mining frequent patterns from transaction databases is an important problem in data mining [30]. Figure 4 shows the data mining procedure to predict various transition probabilities in the case study.…”
Section: Decision Variables of Individual CLV (mentioning)
confidence: 99%