2013
DOI: 10.1007/978-3-642-36949-0_47

On the Instrumentation of OpenMP and OmpSs Tasking Constructs

Abstract: Parallelism has become more and more commonplace with the advent of multicore processors. Although different parallel programming models have arisen to exploit the computing capabilities of such processors, developing applications that benefit from these processors may not be easy. Worse, the performance achieved by the parallel version of the application may not be what the developer expected, as a result of a dubious utilization of the resources offered by the processor. We presen…

Cited by 9 publications (4 citation statements)
References 12 publications
“…The relatively low performance of the Xeon Phi warranted further investigation. Using OpenCL's built-in profiling capability via OpenCL events, and making use of Extrae and Paraver [30] to gather and visualize the events, we discovered that two specific kernels performed much slower than expected compared to the other platforms. These kernels were for performing the left and right halo updates, which require strided memory accesses.…”
Section: Cloverleaf (mentioning)
confidence: 99%
“…Available plugins include an Ayudame plugin for the Temanejo graphical debugger [16], a module to compute and output the DAG of tasks, another (experimental) one to provide an execution trace for a task system simulator, and a module to provide a trace for Paraver [17] that can potentially embed Performance Application Programming Interface (PAPI) counter information. In [18], the authors describe how the parallel trace can be visualized as Gantt charts using Paraver, from a variety of perspectives (e.g., from a task view or a thread perspective, showing the instructions per cycle achieved by different threads, or the TLB miss ratio).…”
Section: Related Work (mentioning)
confidence: 99%
“…The task creation event in their design is described using separate start and stop events. The same two events are used by Servat et al. [22] for instrumenting the Nanos++ runtime system. The proposed chunk callback enables tools to understand and support for-loops better.…”
Section: Related Work (mentioning)
confidence: 99%