2008
DOI: 10.1007/978-3-540-89740-8_2

MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs

Cited by 159 publications (107 citation statements)
References 11 publications
“…Another project, MCUDA [16], applied code transformations to CUDA kernels, enabling them to run efficiently on multicore CPUs. Unfortunately for legacy code maintainers, the reverse operation, porting multicore code to GPUs, proved difficult [14].…”
Section: Related Work (mentioning)
confidence: 99%
“…Ravi et al. [29] rely on the molding technique (changing the dimensions of grid and thread blocks while preserving the correctness of the computation), when possible. Pai et al. [27] propose a similar technique and associated code transformation based on iterative wrapping [35] that produces an elastic kernel. These techniques rely on developer or compiler transformation to prepare the programs for concurrent execution.…”
Section: Related Work (mentioning)
confidence: 99%
“…break, continue and return). A loop-fission technique proposed in [10] is used to break the kernel-wide thread-loop into localized thread-loops which do not cross any of the synchronization directives encountered in the code. Fig.…”
Section: FCUDA Front-end Transformation (mentioning)
confidence: 99%
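The loop-fission idea in the statement above can be illustrated with a small sketch in plain C. It is not code from the MCUDA or FCUDA papers: the kernel (a block-wise array reversal), the names `reverse_block_naive` and `reverse_block_fissioned`, and the block size `BLOCK_DIM` are all hypothetical, chosen only to show why a single thread-loop must be split at a barrier such as `__syncthreads()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread-block size for this sketch. */
#define BLOCK_DIM 8

/*
 * Assumed CUDA kernel being serialized (illustrative only):
 *
 *   __shared__ int s[BLOCK_DIM];
 *   int t = threadIdx.x;
 *   s[t] = d[t];
 *   __syncthreads();
 *   d[t] = s[BLOCK_DIM - 1 - t];
 */

/* Naive serialization: one thread-loop wraps the whole body and the
 * barrier is ignored. Thread t reads s[BLOCK_DIM - 1 - t] before the
 * later loop iterations have written it, so the result is wrong. */
void reverse_block_naive(int *d)
{
    int s[BLOCK_DIM] = {0};            /* stands in for shared memory */
    assert(d != NULL);
    for (int t = 0; t < BLOCK_DIM; ++t) {
        s[t] = d[t];
        /* __syncthreads() was here in the kernel */
        d[t] = s[BLOCK_DIM - 1 - t];
    }
}

/* Loop fission: the thread-loop is split at the barrier, so every
 * logical thread finishes the first region before any thread starts
 * the second. The loop boundary plays the role of the barrier. */
void reverse_block_fissioned(int *d)
{
    int s[BLOCK_DIM];                  /* stands in for shared memory */
    assert(d != NULL);
    for (int t = 0; t < BLOCK_DIM; ++t) {
        s[t] = d[t];                   /* region before the barrier */
    }
    for (int t = 0; t < BLOCK_DIM; ++t) {
        d[t] = s[BLOCK_DIM - 1 - t];   /* region after the barrier */
    }
}
```

Calling `reverse_block_fissioned` on the array 1..8 reverses it, whereas `reverse_block_naive` leaves zeros in the first half because those reads happen before the corresponding writes. A real translator also has to handle barriers nested inside control flow and per-thread variables that are live across a barrier, which this sketch does not attempt.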
“…This way serialized execution of threads maintains the thread-block synchronization semantics. FCUDA extends the MCUDA [10] implementation of loop-fission by adding COMPUTE and TRANSFER pragmas to the list of synchronization directives. COMPUTE and TRANSFER pragmas are used by the FPGA programmer to annotate computation and off-chip data communication tasks.…”
Section: FCUDA Front-end Transformation (mentioning)
confidence: 99%