2000
DOI: 10.1007/3-540-45372-5_24

Accurate Recasting of Parameter Estimation Algorithms Using Sufficient Statistics for Efficient Parallel Speed-Up

Abstract: Fueled by advances in computer technology and online business, data collection is accelerating rapidly, and so is the importance of its analysis: data mining. Increasing database sizes strain the scalability of many data mining algorithms. Data clustering is one of the fundamental techniques in data mining solutions. The many clustering algorithms developed so far face new challenges as data sets grow. Algorithms with quadratic or higher computational complexity, such as agglomerative algorithms, drop out quickl…

Cited by 14 publications (17 citation statements). References 11 publications.
“…In our previous paper [ZHF00], we described a parallel decomposition for center-based clustering algorithms that limits inter-processor communication to sufficient statistics only, reducing the network bottleneck. The data set is partitioned randomly across the memory of the processors and does not need to be transferred between iterations.…”
Section: II Background (mentioning, confidence: 99%)
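The decomposition described in this excerpt can be illustrated for K-Means with a minimal sketch. It assumes the "processors" are simulated as in-memory lists (no real network or message passing), and the function names are hypothetical, not taken from [ZHF00]. Each partition returns only per-center counts and coordinate sums, which are O(k·d) in size regardless of how many points the partition holds; summing them and dividing gives exactly the centers a sequential pass would produce.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Stats = Tuple[List[int], List[List[float]]]


def nearest_center(x: Vector, centers: Sequence[Vector]) -> int:
    """Index of the center closest to x in squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])),
    )


def local_sufficient_stats(partition: Sequence[Vector], centers: Sequence[Vector]) -> Stats:
    """Work done on one (simulated) processor: per-center counts and vector sums."""
    k, d = len(centers), len(centers[0])
    counts = [0] * k
    sums = [[0.0] * d for _ in range(k)]
    for x in partition:
        j = nearest_center(x, centers)
        counts[j] += 1
        for i, v in enumerate(x):
            sums[j][i] += v
    return counts, sums


def combine(stats: Sequence[Stats], k: int, d: int) -> Stats:
    """Element-wise sum of all processors' statistics (the only data exchanged)."""
    counts = [0] * k
    sums = [[0.0] * d for _ in range(k)]
    for c, s in stats:
        for j in range(k):
            counts[j] += c[j]
            for i in range(d):
                sums[j][i] += s[j][i]
    return counts, sums


def kmeans_step(partitions: Sequence[Sequence[Vector]], centers: Sequence[Vector]) -> List[Vector]:
    """One K-Means iteration over data that stays in its partitions."""
    k, d = len(centers), len(centers[0])
    counts, sums = combine([local_sufficient_stats(p, centers) for p in partitions], k, d)
    # An empty cluster keeps its previous center.
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else tuple(centers[j])
        for j in range(k)
    ]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
init = [(0.0, 0.0), (10.0, 10.0)]
parallel = kmeans_step([points[:2], points[2:4], points[4:]], init)  # three "processors"
sequential = kmeans_step([points], init)                             # one processor
print(parallel, sequential)
```

With integer-valued coordinates the two results match exactly; with general floating-point data they can differ only by summation-order rounding, since the arithmetic is the same sums regrouped.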
“…In a companion paper [ZHF00], we developed a class of parallel iterative parameter estimation algorithms, covering the center-based clustering algorithms K-Means [M67] [GG92], K-Harmonic Means [ZHD00a] [Z00b], and EM [DLR77] [MK97]. The parallelization is resource efficient and operates without approximation to the original sequential algorithms.…”
Section: Introduction (mentioning, confidence: 99%)
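The claim that EM can be parallelized "without approximation" rests on the fact that its M-step depends on the data only through sums of responsibility-weighted moments. A minimal sketch, assuming a one-dimensional Gaussian mixture (the excerpt does not specify the model) and simulated processors as lists; function names are hypothetical. Each partition returns, per component, the sums of r, r·x and r·x² plus its point count, and the combined sums reproduce the sequential update up to floating-point rounding.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[List[float], List[float], List[float]]  # weights, means, variances


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def local_em_stats(partition: Sequence[float], weights, means, variances):
    """E-step on one (simulated) processor, reduced to three moments per component."""
    k = len(means)
    s0, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in partition:
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)  # assumes no underflow to 0 for this data
        for j in range(k):
            r = joint[j] / total  # responsibility of component j for x
            s0[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return s0, s1, s2, len(partition)


def em_step(partitions: Sequence[Sequence[float]], weights, means, variances) -> Params:
    """One EM iteration: sum the partitions' moments, then run the M-step once."""
    k = len(means)
    s0, s1, s2, n = [0.0] * k, [0.0] * k, [0.0] * k, 0
    for p in partitions:
        p0, p1, p2, m = local_em_stats(p, weights, means, variances)
        n += m
        for j in range(k):
            s0[j] += p0[j]
            s1[j] += p1[j]
            s2[j] += p2[j]
    new_weights = [s0[j] / n for j in range(k)]
    new_means = [s1[j] / s0[j] for j in range(k)]
    # Var = E[x^2] - mean^2; fine for a sketch, less stable than a two-pass formula.
    new_vars = [s2[j] / s0[j] - new_means[j] ** 2 for j in range(k)]
    return new_weights, new_means, new_vars


data = [-2.1, -1.9, -2.0, -1.5, 1.8, 2.2, 2.0, 2.5]
start = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
par = em_step([data[:3], data[3:6], data[6:]], *start)
seq = em_step([data], *start)
print(par)
print(seq)
```

The same pattern covers K-Harmonic Means, whose center update is likewise a ratio of weighted sums over the data points.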
“…In this section, we will present a brief overview of some VQ techniques, both serial (LBG [29], K-means [31] and ELBG [36]) and parallel (PKM [38,39,1,9,43,11,34,33], PARELBG [34,37], P-CLUSTER [23][24][25] and PAUL [4]), selected from the large existing literature.…”
Section: Previous Work (mentioning, confidence: 99%)
“…Various hardware architectures have been employed, such as specialized architectures [33], massively parallel processors [38], transputers [39,1] and networks of workstations [9,43,11,34]. The idea at the basis of such techniques is the subdivision of the most time-consuming part of the algorithm (the calculation of the Voronoi partition) into a certain number of subtasks to be executed in parallel, while the remaining operations (the calculation of the new centroids) are executed serially by a single process.…”
Section: Parallel K-means (PKM): a Family of Pisa Algorithms (mentioning, confidence: 99%)
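The PKM scheme in this excerpt splits work differently from the sufficient-statistics approach: workers compute the Voronoi partition (one label per point), and a single process then recomputes the centroids from all labels, so per-iteration traffic grows with the number of points rather than with k·d. A minimal sketch of that structure, assuming a thread pool stands in for the parallel workers; `pkm_step` and `assign_chunk` are hypothetical names. In CPython, threads do not speed up this CPU-bound loop because of the global interpreter lock; a `ProcessPoolExecutor` would be needed for real speed-up, and threads are used here only so the sketch runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def assign_chunk(chunk: Sequence[Vector], centers: Sequence[Vector]) -> List[int]:
    """Parallel subtask: Voronoi (nearest-center) labels for one chunk of points."""
    return [
        min(range(len(centers)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])))
        for x in chunk
    ]


def pkm_step(points: Sequence[Vector], centers: Sequence[Vector], n_workers: int = 3):
    """One iteration: labels computed in parallel, centroids computed serially."""
    size = max(1, -(-len(points) // n_workers))  # ceiling division, at least 1
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in input order, so labels line up with points.
        parts = pool.map(assign_chunk, chunks, [centers] * len(chunks))
        labels = [lab for part in parts for lab in part]

    # Serial part: one process sees every point and label and rebuilds the centroids.
    d = len(centers[0])
    new_centers = []
    for j in range(len(centers)):
        members = [x for x, lab in zip(points, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(x[i] for x in members) / len(members) for i in range(d)))
        else:
            new_centers.append(tuple(centers[j]))  # empty cluster keeps its center
    return new_centers, labels


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
new_centers, labels = pkm_step(points, [(0.0, 0.0), (10.0, 10.0)])
print(new_centers, labels)
```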