2010 39th International Conference on Parallel Processing Workshops 2010
DOI: 10.1109/icppw.2010.41

Mixed-Tool Performance Analysis on Hybrid Multicore Architectures

Abstract: This paper proposes a triangular solve algorithm with variable block size for graphics processing units (GPUs). By inverting the diagonal blocks recursively, the algorithm works with a tunable block size to achieve the best performance. Various methods are shown for using existing profiling tools to measure and analyze the performance of this algorithm. We use some of the most popular CPU and GPU profiling tools for their advantages and overcome their disadvantages with several …
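
The idea in the abstract (invert each diagonal block recursively, then apply it with matrix products) can be illustrated with a small CPU sketch. This is not the authors' GPU implementation: it is plain Python on lists of lists, restricted to the lower-triangular case L X = B, and the function names (`matmul`, `invert_lower`, `blocked_lower_solve`) and the parameter `nb` for the block size are hypothetical names chosen for this illustration.

```python
def matmul(a, b):
    """Dense product of two list-of-lists matrices (stands in for GEMM)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sub(a, b):
    """Elementwise difference a - b of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def block(m, r0, r1, c0, c1):
    """Copy of the submatrix m[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in m[r0:r1]]


def invert_lower(l):
    """Invert a lower-triangular matrix by recursive 2x2 block splitting.

    For L = [[L11, 0], [L21, L22]] the inverse is
    [[inv(L11), 0], [-inv(L22) * L21 * inv(L11), inv(L22)]].
    """
    n = len(l)
    if n == 1:
        return [[1.0 / l[0][0]]]
    h = n // 2
    inv11 = invert_lower(block(l, 0, h, 0, h))
    inv22 = invert_lower(block(l, h, n, h, n))
    l21 = block(l, h, n, 0, h)
    inv21 = [[-v for v in row] for row in matmul(matmul(inv22, l21), inv11)]
    top = [row + [0.0] * (n - h) for row in inv11]
    bottom = [r21 + r22 for r21, r22 in zip(inv21, inv22)]
    return top + bottom


def blocked_lower_solve(l, b, nb):
    """Solve L X = B for lower-triangular L using diagonal blocks of size nb."""
    n = len(l)
    x = []
    for r0 in range(0, n, nb):
        r1 = min(r0 + nb, n)
        rhs = block(b, r0, r1, 0, len(b[0]))
        if r0 > 0:
            # Subtract the contribution of the rows of X solved so far.
            rhs = sub(rhs, matmul(block(l, r0, r1, 0, r0), x))
        # Apply the inverted diagonal block instead of substituting row by row.
        x.extend(matmul(invert_lower(block(l, r0, r1, r0, r1)), rhs))
    return x


if __name__ == "__main__":
    L = [[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [4.0, 2.0, 4.0]]
    B = [[2.0], [3.0], [16.0]]
    for nb in (1, 2, 3):
        print(nb, blocked_lower_solve(L, B, nb))
```

In this sketch every step other than the base-case reciprocal is a matrix product, so the block size `nb` only changes how the work is split between inverting diagonal blocks and updating the right-hand side. That split is the quantity a tunable block size would adjust.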

Citations: cited by 1 publication (1 citation statement)
References: 15 publications
“…The performance of TRSM is dominated by GEMM [26]. Since GEMM is one of the most important kernels in linear algebra, we will focus on implementing and analyzing a fast OpenCL GEMM in the coming sections.…”
Section: Profiling (citation type: mentioning)
confidence: 99%