Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-71351-7_24

Parallel Processing of Matrix Multiplication in a CPU and GPU Heterogeneous Environment

Abstract: GPUs are becoming an attractive alternative for numerical computation in research. In this paper, we propose a new parallel processing environment for matrix multiplication that uses both CPUs and GPUs. With our method, the execution time of matrix multiplication can be reduced to 40.1% of that of the faster of the CPU-only and GPU-only cases. Our method performs well when the matrices are large.
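A minimal sketch of the basic idea, assuming the work is divided by rows of A: each row block of C = AB depends only on its own rows of A and on all of B, so one share can be computed on a GPU and the rest on the CPU at the same time, and the two blocks stacked afterwards. Here both shares run on CPU threads; the "GPU" worker is a hypothetical stand-in for a real GPU kernel, and `split_matmul` and `gpu_rows` are illustrative names, not the paper's interface. Because of Python's global interpreter lock, this shows the partitioning, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

Matrix = List[List[float]]


def matmul_rows(a: Matrix, b: Matrix, start: int, stop: int) -> Matrix:
    """Compute rows start..stop-1 of A @ B with a plain triple loop."""
    inner = len(b)
    cols = len(b[0]) if b else 0
    out = []
    for i in range(start, stop):
        row = a[i]
        out.append([sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)])
    return out


def split_matmul(a: Matrix, b: Matrix, gpu_rows: int) -> Matrix:
    """Give the first gpu_rows rows of A to the 'GPU' worker and the rest
    to the 'CPU' worker, run both concurrently, then stack the results."""
    n = len(a)
    if not 0 <= gpu_rows <= n:
        raise ValueError("gpu_rows must be between 0 and len(a)")
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Hypothetical stand-in for a GPU kernel: here it is just a CPU thread.
        gpu_part = pool.submit(matmul_rows, a, b, 0, gpu_rows)
        # The CPU's share of the rows.
        cpu_part = pool.submit(matmul_rows, a, b, gpu_rows, n)
        return gpu_part.result() + cpu_part.result()


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(split_matmul(a, b, gpu_rows=1))
```

Any split point gives the same product; only how the work is shared between the two devices changes, which is what a CPU/GPU load-balancing scheme tunes.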

Cited by 37 publications (20 citation statements)
References: 9 publications
“…Therefore, it is important to find an effective method to make full use of all the available computational resources of both the CPU and GPU. Recently, some approaches [3,4,5,6,7] have been developed to perform a specific task using both multi-core CPU and GPU simultaneously, instead of the CPU or GPU alone. In this paper, we present a way to distribute the workload into both the CPU and GPU, with a performance prediction model (i.e., a static strategy) including characteristics of feature extraction from the video stream data.…”
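The statement above describes a static strategy: the CPU/GPU split is fixed in advance from a performance prediction model. A minimal sketch of such a split for a row-partitioned matrix multiplication, assuming a simple linear cost model (not the model used by any of the cited papers): each row costs `work_per_row` units, the CPU and GPU process `cpu_rate` and `gpu_rate` units per second, and each GPU row also pays a fixed `gpu_transfer_per_row` time for moving data. The function names are hypothetical. The split is chosen so that the predicted CPU and GPU times are equal, since the slower device determines the total time.

```python
def balanced_gpu_rows(n_rows: int, cpu_rate: float, gpu_rate: float,
                      work_per_row: float = 1.0,
                      gpu_transfer_per_row: float = 0.0) -> int:
    """Return how many of n_rows to give the GPU so that the predicted
    CPU and GPU times are (approximately) equal.

    Assumed linear model (hypothetical, for illustration):
      t_cpu(g) = (n_rows - g) * work_per_row / cpu_rate
      t_gpu(g) = g * (work_per_row / gpu_rate + gpu_transfer_per_row)
    """
    if n_rows < 0 or cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("n_rows must be >= 0 and rates must be positive")
    a = work_per_row / cpu_rate                         # CPU time per row
    b = work_per_row / gpu_rate + gpu_transfer_per_row  # GPU time per row, incl. transfer
    # Solve (n_rows - g) * a == g * b for g, then round to a whole row.
    g = round(n_rows * a / (a + b))
    return max(0, min(n_rows, g))


def predicted_times(n_rows: int, gpu_rows: int, cpu_rate: float, gpu_rate: float,
                    work_per_row: float = 1.0,
                    gpu_transfer_per_row: float = 0.0) -> tuple:
    """Predicted (t_cpu, t_gpu) under the same linear model."""
    t_cpu = (n_rows - gpu_rows) * work_per_row / cpu_rate
    t_gpu = gpu_rows * (work_per_row / gpu_rate + gpu_transfer_per_row)
    return t_cpu, t_gpu
```

For example, `balanced_gpu_rows(100, cpu_rate=1.0, gpu_rate=3.0)` gives 75: a GPU three times as fast as the CPU receives three quarters of the rows. Adding a transfer cost of 1/3 time unit per GPU row lowers the GPU's share to 60 rows. A dynamic strategy would instead adjust the split at run time from measured progress.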
“…An optimum split of the matrix would keep the time consumed by the GPU and CPU balanced [23,33]. The multi-device (GPU and CPU) computations are overlapped and the data transfers between GPU and CPU are performed asynchronously in order to achieve the maximum performance.…”
Section: Auto-tuning a Multi-device Matrix Multiplication
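The statement above also notes that CPU/GPU data transfers are performed asynchronously so that they overlap with computation. A rough timing model of why this helps, assuming the GPU's share is processed as `blocks` equal blocks, each needing a fixed `transfer` time and `compute` time, with one copy engine and a hypothetical double-buffering scheme (two device buffers, so the copy of block i+2 must wait until block i has been computed). Without overlap, the time is `blocks * (transfer + compute)`; with overlap, it follows the usual two-stage pipeline formula, which the small event simulation below reproduces. All function names are illustrative.

```python
def serial_time(blocks: int, transfer: float, compute: float) -> float:
    """Copy a block, compute on it, then copy the next one: no overlap."""
    return blocks * (transfer + compute)


def overlapped_time(blocks: int, transfer: float, compute: float) -> float:
    """Closed form for a two-stage pipeline: fill, steady state, drain."""
    if blocks == 0:
        return 0.0
    return transfer + (blocks - 1) * max(transfer, compute) + compute


def simulate_double_buffered(blocks: int, transfer: float, compute: float,
                             buffers: int = 2) -> float:
    """Event simulation: one copy engine, one compute unit, `buffers` slots."""
    copy_free = 0.0      # time at which the copy engine becomes idle
    compute_done = []    # finish time of each block's computation
    for i in range(blocks):
        start = copy_free
        if i >= buffers:
            # Wait until the block that last used this buffer is computed.
            start = max(start, compute_done[i - buffers])
        copied = start + transfer
        copy_free = copied
        prev = compute_done[-1] if compute_done else 0.0
        compute_done.append(max(copied, prev) + compute)
    return compute_done[-1] if compute_done else 0.0
```

With 4 blocks, a transfer time of 1 and a compute time of 3, the serial schedule takes 16 time units and the overlapped one 13: after the first copy, every later copy is hidden behind computation.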
“…Ohshima et al examined CPU and GPU parallel matrix-matrix multiplications on a single node [2], a procedure that improves the local dgemm performance. There are several frameworks and libraries to exploit the power of CPU and GPU, such as StarPU [3] and MAGMA [4].…”
Section: Related Studies