Diogo Sampaio scite author profile

Profiling feedback is an important technique used by developers for performance debugging, where it is usually used to pinpoint performance bottlenecks and also to find optimization opportunities. Assessing the validity and potential benefit of a program transformation requires accurate knowledge of the data flow and dependencies, which can be uncovered by profiling a particular execution of the program. In this work we develop poly-prof, an end-to-end infrastructure for dynamic binary analysis, which produces feedback about the potential to apply complex program rescheduling. Our tool can handle both inter- and intraprocedural aspects of the program in a unified way, thus providing interprocedural transformation feedback.


Divergence analysis

Sampaio

Souza

Collange

et al. 2013

ACM Trans. Program. Lang. Syst.


Growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal with memory and control-flow divergences. These phenomena stem from a condition that we call data divergence, which occurs whenever two processing elements (PEs) see the same variable name holding different values. This article introduces divergence analysis, a static analysis that discovers data divergences. This analysis, currently deployed in an industrial quality compiler, is useful in several ways: it improves the translation of SIMD code to non-SIMD CPUs, it helps developers to manually improve their SIMD applications, and it also guides the automatic optimization of SIMD programs. We demonstrate this last point by introducing the notion of a divergence-aware register spiller. This spiller uses information from our analysis to either rematerialize or share common data between PEs. As a testimony of its effectiveness, we have tested it on a suite of 395 CUDA kernels from well-known benchmarks. The divergence-aware spiller produces GPU code that is 26.21% faster than the code produced by the register allocator used in the baseline compiler.
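As a rough illustration of what discovering data divergences can mean, the sketch below marks a variable as divergent when it depends on a value that may differ between PEs. It is a deliberately simplified toy, not the analysis from the article: it follows data dependences only and ignores control flow, and the program encoding, the function name `divergent_variables`, and the variable names are all assumptions made for this example.

```python
# Toy divergence propagation over data dependences only (an assumption of
# this sketch; control dependences are not modeled here).
def divergent_variables(program, seeds):
    """program: list of (dest, [operands]) pairs in an SSA-like form.
    seeds: variables known to differ between PEs, e.g. the thread id."""
    divergent = set(seeds)
    changed = True
    while changed:  # iterate until no new variable is marked
        changed = False
        for dest, operands in program:
            if dest not in divergent and any(op in divergent for op in operands):
                divergent.add(dest)
                changed = True
    return divergent


# Hypothetical four-instruction program.
program = [
    ("base", ["arg0"]),            # same value on every PE
    ("offset", ["tid", "four"]),   # depends on the thread id
    ("addr", ["base", "offset"]),  # inherits divergence from offset
    ("limit", ["arg1", "one"]),    # never touches the thread id
]
result = divergent_variables(program, seeds={"tid"})
print(sorted(result))
```

Variables left out of the result, such as `base` and `limit` here, hold the same value on every PE under this model, which is the kind of information a divergence-aware spiller could use to share data between PEs.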


Horse and dog blood flows in PDMS rectangular microchannels: Experimental characterization of the plasma layer under different flow conditions

Sampaio

Lopes

Semião

2015

Experimental Thermal and Fluid Science


Profiling divergences in GPU applications

Coutinho

Sampaio

Pereira

et al. 2012

Concurrency and Computation


The increasing programmability and the high computational power of graphics processing units make them attractive to general purpose programming. However, taking full benefit of this execution environment is a challenging task. One of these challenges stems from divergences, a phenomenon that occurs when threads that execute in lock‐step are forced to take different program paths because of branches in the code. In the face of divergences, some threads will have to wait, idly, while their diverging siblings execute. Optimizing the code to avoid divergences is difficult because this task demands a deep understanding of programs that might be large and convoluted. To facilitate the detection of divergences, this paper introduces the divergence map, a data structure that indicates the location and the volume of divergences in a program. We build this map via dynamic profiling techniques, which we have implemented on top of an open source Parallel Thread Execution compiler. To illustrate the importance of the divergence map, we have used it to pinpoint the core regions that must be optimized in well‐known public applications. By hand optimizing some applications, we have added 9–11% speedups onto kernels that have already gone through the sieve of many programmers. Copyright © 2012 John Wiley & Sons, Ltd.
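One possible shape for a divergence map is sketched below: a table keyed by branch location that counts how often a warp reached the branch and how often its threads disagreed on the outcome. This is only a simulation under stated assumptions, not the paper's profiler: the 8-thread warp, the site labels such as `kernel.cu:12`, and the helper `record_branch` are hypothetical.

```python
from collections import defaultdict


def record_branch(div_map, site, predicates):
    """Update the map for one warp-level execution of the branch at `site`.
    predicates: one boolean per active thread in the warp."""
    entry = div_map[site]
    entry["visits"] += 1
    if len(set(predicates)) > 1:  # threads disagree, so the warp diverges
        entry["divergences"] += 1


# Location -> counters; the location strings are hypothetical labels.
div_map = defaultdict(lambda: {"visits": 0, "divergences": 0})
warp = range(8)  # hypothetical 8-thread warp

for i in range(4):
    # Even and odd threads take different paths on every visit.
    record_branch(div_map, "kernel.cu:12", [tid % 2 == 0 for tid in warp])
    # Every thread takes the same path on every visit.
    record_branch(div_map, "kernel.cu:20", [tid + i < 100 for tid in warp])

for site, entry in sorted(div_map.items()):
    print(site, entry)
```

Sorting the sites by their divergence count would point at the branches most worth restructuring, which is the use the abstract describes.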


Divergence Analysis with Affine Constraints

Sampaio

Martins

Collange

et al. 2012


The rising popularity of graphics processing units is bringing renewed interest in code optimization techniques for SIMD processors. Many of these optimizations rely on divergence analyses, which classify variables as uniform, if they have the same value on every thread, or divergent, if they might not. This paper introduces a new kind of divergence analysis that is able to represent variables as affine functions of thread identifiers. We have implemented this analysis in Ocelot, an open source compiler, and use it to analyze a suite of 177 CUDA kernels from well-known benchmarks. We can mark about one fourth of all program variables as affine functions of thread identifiers. In addition to the novel divergence analysis, we also introduce the notion of a divergence-aware register allocator. This allocator uses information from our analysis to either rematerialize affine variables, or to move uniform variables to shared memory. As a testimony of its effectiveness, our divergence-aware allocator produces GPU code that is 29.70% faster than the code produced by Ocelot's register allocator. Divergence analysis with affine constraints has been publicly available in the Ocelot compiler since June 2012.
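The idea of representing a variable as an affine function of the thread identifier can be sketched with a small abstract domain in which each value is `a * tid + b`. The code below is a minimal illustration under simplifying assumptions, not Ocelot's implementation: coefficients are known integer constants, only addition and multiplication are modeled, and the names `Affine`, `DIVERGENT`, `add`, `mul`, and `classify` are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    a: int  # coefficient of the thread identifier
    b: int  # constant term


DIVERGENT = None  # value that this sketch cannot express as a*tid + b


def add(x, y):
    if x is DIVERGENT or y is DIVERGENT:
        return DIVERGENT
    return Affine(x.a + y.a, x.b + y.b)


def mul(x, y):
    if x is DIVERGENT or y is DIVERGENT:
        return DIVERGENT
    if x.a == 0:  # x is uniform: scale y by the constant x.b
        return Affine(x.b * y.a, x.b * y.b)
    if y.a == 0:  # y is uniform: scale x by the constant y.b
        return Affine(x.a * y.b, x.b * y.b)
    return DIVERGENT  # tid * tid is not affine


def classify(x):
    if x is DIVERGENT:
        return "divergent"
    return "uniform" if x.a == 0 else "affine"


tid = Affine(1, 0)
four = Affine(0, 4)
base = Affine(0, 1000)
addr = add(base, mul(tid, four))  # 4 * tid + 1000
print(addr, classify(addr))
print(classify(base), classify(mul(tid, tid)))
```

An allocator with this information could recompute `addr` from `tid` instead of spilling it, and keep a single copy of `base` for all threads, which matches the rematerialize-or-share strategy the abstract mentions.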


Performance Debugging of GPGPU Applications with the Divergence Map

Coutinho

Sampaio

Pereira

et al. 2010


The increasing programmability and the high computational power of Graphical Processing Units (GPUs) make them attractive to general purpose programming. However, taking full benefit of this execution environment is a challenging task. One of these challenges stems from divergences, a phenomenon that occurs when threads that execute in lock-step are forced to take different program paths due to branches in the code. In the face of divergences, some threads will have to wait, idly, while their diverging siblings execute. Optimizing the code to avoid divergences is difficult, because this task demands a deep understanding of programs that might be large and convoluted. In order to facilitate the detection of divergences, this paper introduces the divergence map, a data structure that indicates the location and the volume of divergences in a program. We build this map via dynamic profiling techniques, which we have implemented on top of an open source CUDA compiler. To illustrate the importance of the divergence map, we have used it to pinpoint the core regions that must be optimized in well-known public applications. By hand optimizing some applications, we have added 9-11% speedups onto kernels that have already gone through the sieve of many programmers.


Spill Code Placement for SIMD Machines

Sampaio

Gedeon

Pereira

et al. 2012



Diogo Sampaio

Divergence Analysis and Optimizations

Data-flow/dependence profiling for structured transformations
