2007
DOI: 10.1002/asi.20596

Exploiting parallelism to support scalable hierarchical clustering

Abstract: A distributed memory parallel version of the group average hierarchical agglomerative clustering algorithm is proposed to enable scaling the document clustering problem to large collections. Using standard message passing operations reduces interprocess communication while maintaining efficient load balancing. In a series of experiments using a subset of a standard Text REtrieval Conference (TREC) test collection, our parallel hierarchical clustering algorithm is shown to be scalable in terms of processors eff…
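The abstract does not spell out the algorithm, so what follows is only a minimal sketch under our own assumptions, not the authors' method. It assumes cosine similarity on dense vectors, a UPGMA-style group-average update, and a single Python process that simulates `num_ranks` message-passing processes: each simulated rank owns a contiguous block of similarity-matrix rows, chosen so every block holds roughly the same number of upper-triangle pairs (a simple form of load balancing), finds its best local pair, and a reduction then picks the global best merge. All function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def balanced_row_blocks(n, p):
    """Split rows 0..n-1 into at most p contiguous blocks holding roughly equal
    numbers of upper-triangle pairs (row i owns pairs (i, k) with k > i)."""
    total = n * (n - 1) // 2
    blocks, start, acc = [], 0, 0
    for i in range(n):
        acc += n - 1 - i
        if len(blocks) < p - 1 and acc >= total * (len(blocks) + 1) / p:
            blocks.append(range(start, i + 1))
            start = i + 1
    blocks.append(range(start, n))
    return [b for b in blocks if len(b) > 0]


def parallel_group_average_hac(vectors, num_ranks):
    """Simulated distributed-memory group-average HAC; returns the merge
    sequence as (kept_cluster, absorbed_cluster, similarity) tuples."""
    n = len(vectors)
    blocks = balanced_row_blocks(n, num_ranks)
    owner = {i: r for r, blk in enumerate(blocks) for i in blk}
    # Each simulated rank stores only the similarity rows it owns.
    local_rows = [{i: [cosine(vectors[i], vectors[k]) for k in range(n)]
                   for i in blk} for blk in blocks]
    size = [1] * n
    active = set(range(n))
    merges = []
    while len(active) > 1:
        # 1. Local step: each rank scans its own rows for the best active pair.
        candidates = []
        for rows in local_rows:
            best = None
            for i, row in rows.items():
                if i not in active:
                    continue
                for k in active:
                    if k > i and (best is None or (row[k], i, k) > best):
                        best = (row[k], i, k)
            if best is not None:
                candidates.append(best)
        # 2. Reduction: pick the global best pair, as an MPI all-reduce
        #    with a max-location operator would.
        s, i, j = max(candidates)
        merges.append((i, j, s))
        # 3. Update: the owners "broadcast" rows i and j, and every rank
        #    rewrites column i of its rows with the group-average rule
        #    sim(k, i+j) = (n_i*sim(k, i) + n_j*sim(k, j)) / (n_i + n_j).
        row_i = local_rows[owner[i]][i]
        row_j = local_rows[owner[j]][j]
        ni, nj = size[i], size[j]
        new_row = [(ni * a + nj * b) / (ni + nj) for a, b in zip(row_i, row_j)]
        for rows in local_rows:
            for k, row in rows.items():
                row[i] = new_row[k]
        local_rows[owner[i]][i] = new_row
        size[i] += nj
        active.discard(j)  # cluster j is now part of cluster i
    return merges
```

Because both the local scan and the simulated reduction compare `(similarity, i, k)` tuples, the merge sequence should not depend on `num_ranks`, which is a convenient check that the partitioned search matches a serial one. The row blocks are balanced only once, at the start, so the balance drifts as clusters are absorbed; a real message-passing implementation might rebalance or assign rows differently.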

Cited by 6 publications (7 citation statements). References: 43 publications.

“…Several efforts were taken to parallelize hierarchical clustering algorithm, relying on the advance of modern computer architectures and large-scale systems. Different platforms, including multi-core [8], GPU [24], MPI [25] as well as recently popularized MapReduce framework [26], [27], have all seen its implementation.…”
Section: Related Work

“…Because of its quadratic computational complexity, hierarchical clustering algorithms are unpractical for large document collections. A number of algorithms that exploit parallelism in hierarchical clustering algorithms have been introduced in literature [3,4,5,7,8,9,10]. A parallel hierarchical clustering algorithm is introduced in [3] which is used as the clustering subroutine for a parallel buckshot algorithm.…”
Section: Related Work

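The quadratic cost mentioned above is easy to quantify: with n documents there are n(n-1)/2 distinct pairs whose similarities a group-average clusterer must compute. The hypothetical helper below assumes one 4-byte (single-precision) similarity per unordered pair, stored once, and an even split of that storage across processes; it is an illustration of the arithmetic, not a model of any particular implementation.

```python
def pairwise_cost(n, num_ranks=1, bytes_per_sim=4):
    """Return (pairs, total_bytes, bytes_per_rank) for n documents, assuming one
    similarity of bytes_per_sim bytes per unordered pair, split evenly."""
    pairs = n * (n - 1) // 2
    total_bytes = pairs * bytes_per_sim
    per_rank = -(-total_bytes // num_ranks)  # integer ceiling division
    return pairs, total_bytes, per_rank


print(pairwise_cost(100_000, num_ranks=16))  # (4999950000, 19999800000, 1249987500)
```

For 100,000 documents this gives 4,999,950,000 pairs, roughly 20 GB of similarities in total or about 1.25 GB per process across 16 processes; doubling n roughly quadruples both figures.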
“…A number of algorithms that exploit parallelism in hierarchical clustering algorithms have been introduced in literature [3,4,5,7,8,9,10]. A parallel hierarchical clustering algorithm is introduced in [3] which is used as the clustering subroutine for a parallel buckshot algorithm. A distributed clustering technique (RACHET) is developed [9] in which hierarchical clustering algorithms are used to generate local dendrograms.…”
Section: Related Work
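The buckshot procedure referred to in these statements can be sketched serially: draw a random sample of about sqrt(k·n) documents, cluster the sample hierarchically into k groups, and assign every document to the most similar group centroid. This is a hedged illustration only. The sample-size rule follows the usual textbook description of buckshot, while cosine similarity, the naive group-average clusterer, and all helper names are our assumptions; nothing here reflects how the parallel buckshot algorithm cited as [3] distributes the work.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def group_average_hac(vectors, k):
    """Naive group-average HAC: repeatedly merge the two clusters with the
    highest mean pairwise similarity until k clusters remain."""
    n = len(vectors)
    sim = [[cosine(vectors[a], vectors[b]) for b in range(n)] for a in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                total = sum(sim[a][b] for a in clusters[x] for b in clusters[y])
                avg = total / (len(clusters[x]) * len(clusters[y]))
                if best is None or avg > best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


def centroid(vectors, members):
    """Component-wise mean of the member vectors."""
    dim = len(vectors[members[0]])
    return [sum(vectors[m][d] for m in members) / len(members) for d in range(dim)]


def buckshot(vectors, k, seed=0):
    """Cluster a random sample of about sqrt(k*n) documents with HAC, then
    assign every document to the most similar sample-cluster centroid."""
    n = len(vectors)
    rng = random.Random(seed)
    sample_size = min(n, max(k, math.isqrt(k * n)))
    sample = [vectors[i] for i in rng.sample(range(n), sample_size)]
    centroids = [centroid(sample, c) for c in group_average_hac(sample, k)]
    labels = [max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
              for v in vectors]
    return labels, centroids
```

The point of the design is that the expensive quadratic step only touches the small sample, while the final assignment pass is linear in n and trivially parallel across documents.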