2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009)
DOI: 10.1109/ipdps.2009.5161049
Multi-dimensional characterization of temporal data mining on graphics processors

Abstract: Through the algorithmic design patterns of data parallelism and task parallelism, the graphics processing unit (GPU) offers the potential to vastly accelerate discovery and innovation across a multitude of disciplines. For example, the exponential growth in data volume now presents an obstacle for high-throughput data mining in fields such as neuroinformatics and bioinformatics. As such, we present a characterization of a MapReduce-based data-mining application on a general-purpose GPU (GPGPU). Using neuroscienc…
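The abstract names MapReduce as the data-parallel design pattern under study. As a minimal sketch of that pattern on a GPU (illustrative only, not the paper's implementation; the kernel names and the squaring map are assumptions), a "map" kernel can run one thread per element, followed by a per-block "reduce" that produces partial sums:

// Illustrative CUDA sketch of a MapReduce-style pipeline: a data-parallel
// map kernel, then a per-block tree reduction into partial sums.
#include <cstdio>

// "Map": apply a function to every element independently.
__global__ void mapKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * in[i];            // placeholder map: square
}

// "Reduce": per-block tree reduction in shared memory.
// Assumes blockDim.x is a power of two.
__global__ void reduceKernel(const float* in, float* partial, int n) {
    extern __shared__ float s[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = s[0];     // one partial sum per block
}

int main() {
    const int n = 1 << 20, block = 256, grid = (n + block - 1) / block;
    float *d_in, *d_mapped, *d_partial;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_mapped, n * sizeof(float));
    cudaMalloc(&d_partial, grid * sizeof(float));

    // Fill the input with ones so the expected sum is simply n.
    float* h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = 1.0f;
    cudaMemcpy(d_in, h, n * sizeof(float), cudaMemcpyHostToDevice);

    mapKernel<<<grid, block>>>(d_in, d_mapped, n);
    reduceKernel<<<grid, block, block * sizeof(float)>>>(d_mapped, d_partial, n);

    // Final reduction of the per-block partials on the host, for brevity.
    float* hp = new float[grid];
    cudaMemcpy(hp, d_partial, grid * sizeof(float), cudaMemcpyDeviceToHost);
    double sum = 0.0;
    for (int i = 0; i < grid; ++i) sum += hp[i];
    printf("sum = %.0f (expected %d)\n", sum, n);

    cudaFree(d_in); cudaFree(d_mapped); cudaFree(d_partial);
    delete[] h; delete[] hp;
    return 0;
}

Finishing the reduction on the host sidesteps the GPU's lack of cheap global synchronization between blocks, a limitation the citation statements below return to.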

Cited by 9 publications (6 citation statements). References 11 publications.
“…In contrast, the world’s fastest supercomputer, Roadrunner, has a peak of 1,457 teraflops at a cost of $133M for a mere performance-price ratio of 11 megaflops per dollar and performance-space ratio of 243 teraflops per square foot. However, the current programming model for GPUs is only amenable to highly data-parallel applications; efficient GPU mappings for less data-parallel applications are extraordinarily difficult to realize [37]. Unlike supercomputer clusters consisting of general-purpose processors and direct support for interprocessor communication, the GPU has limited interprocessor communication capabilities and limited data cache.…”
Section: Introduction
confidence: 99%
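As a back-of-the-envelope check on the quoted price ratio (a recomputation, not part of the source), the figure follows directly from the stated peak rate and cost:

$$\frac{1{,}457\ \text{Tflop/s}}{\$133\text{M}} = \frac{1.457\times 10^{15}\ \text{flop/s}}{1.33\times 10^{8}\ \$} \approx 1.1\times 10^{7}\ \text{flop/s per dollar} \approx 11\ \text{megaflops per dollar}.$$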
“…This empirical optimization with varying configurations is a typical approach in GPU programming because a general performance prediction model for a GPU architecture is not available due to the complexity of its parallel programming model [2,28,29]. Our experiments showed that the thread block with size 16 × 26 yields the best performance in the block-level facet processing implementation.…”
Section: Thread-block Configuration
confidence: 91%
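The statement describes tuning thread-block shape empirically because no general GPU performance model exists. A minimal CUDA sketch of that approach (illustrative, not from the citing paper; the kernel body and candidate shapes are assumptions, with 16 × 26 included as the shape the statement reports best) times the same kernel under each configuration and reports the results:

// Illustrative CUDA sketch: time one kernel under several thread-block
// shapes and keep the fastest. The kernel body is a stand-in for the
// block-level facet processing step.
#include <cstdio>

__global__ void facetKernel(float* data, int width, int height) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        int i = y * width + x;
        data[i] = data[i] * 0.5f + 1.0f;   // placeholder per-element work
    }
}

int main() {
    const int width = 1024, height = 1024;
    float* d_data;
    cudaMalloc(&d_data, width * height * sizeof(float));

    // Warm-up launch so the first timed run does not absorb startup cost.
    facetKernel<<<dim3(64, 64), dim3(16, 16)>>>(d_data, width, height);
    cudaDeviceSynchronize();

    // Candidate block shapes to evaluate empirically.
    dim3 shapes[] = { dim3(16, 16), dim3(32, 8), dim3(8, 32), dim3(16, 26) };
    for (dim3 b : shapes) {
        dim3 grid((width + b.x - 1) / b.x, (height + b.y - 1) / b.y);
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        facetKernel<<<grid, b>>>(d_data, width, height);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("block %ux%u: %.3f ms\n", b.x, b.y, ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }
    cudaFree(d_data);
    return 0;
}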
“…Even within a given GPU architecture and vendor, Archuleta et al [10] show that different GPUs react differently to algorithmic and mapping changes. Each case calls the portability of accelerator performance into question.…”
Section: Early Hardware Asymmetry
confidence: 99%