Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units 2011
DOI: 10.1145/1964179.1964193

Analyzing program flow within a many-kernel OpenCL application

citations
Cited by 26 publications
(18 citation statements)
references
References 14 publications
“…The benchmarks we used to evaluate our proposed VirtCL framework were collected from the AMD Accelerated Parallel Processing (APP) SDK [2] and the Rodinia 2.1 benchmark suite [8]. A real-world application (clsurf [27]) was also used to evaluate the effectiveness and scalability of the VirtCL framework. Figure 5 shows the normalized execution times and their contributing components (which were measured by instrumenting gettimeofday(), the RDTSC instruction [18], and clGetEventProfilingInfo()) for the Rodinia benchmark suite when using the native OpenCL library and the proposed VirtCL library on the aforementioned platform with only one GPU device.…”
Section: Evaluations and Discussion (mentioning)
confidence: 99%
“…Mistry et al [11] developed a profiling technique for analyzing data flow in multi-kernel OpenCL applications. This approach does not apply optimizations but allows programmers to identify bottlenecks by manually inspecting a profiling trace.…”
Section: Related Work (mentioning)
confidence: 99%
“…An ipoint contains four different information pieces [16]. The first part contains the location of the point in the image.…”
Section: Related Work (mentioning)
confidence: 99%
“…Surf Implementations 1) CLSurf: [17] is an OpenCL implementation of the SURF algorithm, developed by the NUCAR group (Northeastern University Computer Architecture Research Group) and AMD. They explain the implementation and they make a performance analysis in [16]. We choose this code to perform the tests based on the high quality of the implementation.…”
Section: Benchmark Setup (mentioning)
confidence: 99%