In recent years, there has been what can only be described as an explosion in the types of processing devices one can expect to find within a given computer system. These include the multi-core CPU, the General Purpose Graphics Processing Unit (GPGPU) and the Accelerated Processing Unit (APU), to name but a few. The widespread uptake of these systems presents would-be users with at least two problems. Firstly, each device exposes a complex underlying architecture that must be understood in order to attain optimal performance. Secondly, a single system can support an arbitrary number of such devices. Consequently, fully leveraging the performance capabilities of such a system comes at a cost: increasingly prolonged development times. Adhering to a methodology would have the significant industrial impact of reducing these development times. This paper describes the continued formulation of such a novel methodology. Two real-world scientific programs are optimized for execution on the CUDA platform. Optimized speedups of 15x and 17x, including PCI-E transfer times, are achieved while retaining double-precision accuracy.
The heterogeneous computing revolution continues unabated. Yet despite the vast number of naïve users in possession of bespoke software who hope to embrace the opportunities this revolution has brought, few approaches proposed in the current literature can guide such users in these efforts. The most appropriate choice would appear to be a (semi-)automatic parallelizing compiler. However, such compilers typically target a single device type and demand the unguided use of directives. Consequently, they are of little use when naïve users are seeking answers to more fundamental questions, such as: which fragments of a program can or should be parallelized, which device each fragment should target, and what speedup will be attained. To this end, this paper expands on previous work and proposes Paralysis, an extensible guidance environment that is tiered for varying programmer competencies and supports both static and dynamic analysis techniques. At the highest level, guided user experiences are paramount. At the lowest level, the underlying functionality is exposed as a set of plug-ins, ensuring longevity. A partial prototype, built atop the Cetus infrastructure, is described. It is used to analyze two serial programs for CUDA execution: the DFT and the Box Blur Filter. Guided by this analysis, speedups of 15x and 22x are achieved.