2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009)
DOI: 10.1109/ipdps.2009.5161049
Multi-dimensional characterization of temporal data mining on graphics processors

Abstract: Through the algorithmic design patterns of data parallelism and task parallelism, the graphics processing unit (GPU) offers the potential to vastly accelerate discovery and innovation across a multitude of disciplines. For example, the exponential growth in data volume now presents an obstacle for high-throughput data mining in fields such as neuroinformatics and bioinformatics. As such, we present a characterization of a MapReduce-based data-mining application on a general-purpose GPU (GPGPU). Using neuroscienc…
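The abstract names MapReduce as the data-parallel design pattern under study. As a minimal sketch of that pattern on a GPU (illustrative only, not the paper's implementation; the kernel names and the squaring map are assumptions), a "map" kernel can run one thread per element, followed by a per-block "reduce" that produces partial sums:

// Illustrative CUDA sketch of a MapReduce-style pipeline: a data-parallel
// map kernel, then a per-block tree reduction into partial sums.
#include <cstdio>

// "Map": apply a function to every element independently.
__global__ void mapKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];            // placeholder map: square
}

// "Reduce": per-block tree reduction in shared memory.
// Assumes blockDim.x is a power of two.
__global__ void reduceKernel(const float* in, float* partial, int n) {
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = s[0];     // one partial sum per block
}

int main() {
    const int n = 1 << 20, block = 256, grid = (n + block - 1) / block;
    float *d_in, *d_mapped, *d_partial;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_mapped, n * sizeof(float));
    cudaMalloc(&d_partial, grid * sizeof(float));

    // Fill the input with ones so the expected sum is simply n.
    float* h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    cudaMemcpy(d_in, h, n * sizeof(float), cudaMemcpyHostToDevice);

    mapKernel<<<grid, block>>>(d_in, d_mapped, n);
    reduceKernel<<<grid, block, block * sizeof(float)>>>(d_mapped, d_partial, n);

    // Final reduction of the per-block partials on the host, for brevity.
    float* hp = new float[grid];
    cudaMemcpy(hp, d_partial, grid * sizeof(float), cudaMemcpyDeviceToHost);
    double sum = 0.0;
    for (int i = 0; i < grid; ++i) sum += hp[i];
    printf("sum = %.0f (expected %d)\n", sum, n);

    cudaFree(d_in); cudaFree(d_mapped); cudaFree(d_partial);
    delete[] h; delete[] hp;
    return 0;
}

Finishing the reduction on the host sidesteps the GPU's lack of cheap global synchronization between blocks, a limitation the citation statements below return to.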

Cited by 9 publications (6 citation statements). References 11 publications.
“…In contrast, the world’s fastest supercomputer, Roadrunner, has a peak of 1,457 teraflops at a cost of $133M for a mere performance-price ratio of 11 megaflops per dollar and performance-space ratio of 243 teraflops per square foot. However, the current programming model for GPUs is only amenable to highly data-parallel applications; efficient GPU mappings for less data-parallel applications are extraordinarily difficult to realize [37]. Unlike supercomputer clusters consisting of general-purpose processors and direct support for interprocessor communication, the GPU has limited interprocessor communication capabilities and limited data cache.…”
Section: Introduction
confidence: 99%
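As a back-of-the-envelope check on the quoted price ratio (a recomputation, not part of the source), the figure follows directly from the stated peak rate and cost:

$$\frac{1{,}457\ \text{Tflop/s}}{\$133\text{M}} = \frac{1.457\times 10^{15}\ \text{flop/s}}{1.33\times 10^{8}\ \$} \approx 1.1\times 10^{7}\ \text{flop/s per dollar} \approx 11\ \text{megaflops per dollar}.$$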
“…This empirical optimization with varying configurations is a typical approach in GPU programming because a general performance prediction model for a GPU architecture is not available due to the complexity of its parallel programming model [2,28,29]. Our experiments showed that the thread block with size 16 × 26 yields the best performance in the block-level facet processing implementation.…”
Section: Thread-block Configuration
confidence: 91%
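The statement describes tuning thread-block shape empirically because no general GPU performance model exists. A minimal CUDA sketch of that approach (illustrative, not from the citing paper; the kernel body and candidate shapes are assumptions, with 16 × 26 included as the shape the statement reports best) times the same kernel under each configuration and reports the results:

// Illustrative CUDA sketch: time one kernel under several thread-block
// shapes and keep the fastest. The kernel body is a stand-in for the
// block-level facet processing step.
#include <cstdio>

__global__ void facetKernel(float* data, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int i = y * width + x;
        data[i] = data[i] * 0.5f + 1.0f;   // placeholder per-element work
    }
}

int main() {
    const int width = 1024, height = 1024;
    float* d_data;
    cudaMalloc(&d_data, width * height * sizeof(float));

    // Warm-up launch so the first timed run does not absorb startup cost.
    facetKernel<<<dim3(64, 64), dim3(16, 16)>>>(d_data, width, height);
    cudaDeviceSynchronize();

    // Candidate block shapes to evaluate empirically.
    dim3 shapes[] = { dim3(16, 16), dim3(32, 8), dim3(8, 32), dim3(16, 26) };
    for (dim3 b : shapes) {
        dim3 grid((width + b.x - 1) / b.x, (height + b.y - 1) / b.y);
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        facetKernel<<<grid, b>>>(d_data, width, height);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("block %ux%u: %.3f ms\n", b.x, b.y, ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }
    cudaFree(d_data);
    return 0;
}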
“…Even within a given GPU architecture and vendor, Archuleta et al [10] show that different GPUs react differently to algorithmic and mapping changes. Each case calls the portability of accelerator performance into question.…”
Section: Early Hardware Asymmetry
confidence: 99%