2014 IEEE 13th International Symposium on Parallel and Distributed Computing
DOI: 10.1109/ispdc.2014.11
A Parallel Task-Based Approach to Linear Algebra

Abstract: Processors with large numbers of cores are becoming commonplace. In order to take advantage of the available resources in these systems, the programming paradigm has to move towards increased parallelism. However, increasing the level of concurrency in the program does not necessarily lead to better performance. Parallel programming models have to provide flexible ways of defining parallel tasks and, at the same time, efficiently manage the created tasks. OpenMP is a widely accepted programming model…



Cited by 4 publications (5 citation statements). References 13 publications.
“…Tousimojarad and Vanderbauwhede [33] cleverly reduce access latencies to uniformly distributed data by using copies whose home cache is local to the accessing thread on the TILEPro64 processor. Zhou and Demsky [2] build a NUMA-aware adaptive garbage collector that migrates objects to improve locality on manycore processors.…”
Section: Related Work
confidence: 99%
“…Deriving the parallel kernel from the generated single-threaded code is mostly a matter of replacing the loops by the OpenCL indexing calls (get_global_id, get_local_id, etc.), and in the case where the original code has multiple loops, as is the case [omitted for blind review]…”
Section: OpenCL Implementation Details
confidence: 99%
“…Since all the tasks defined in the GPC code will be executed in parallel, a seq pragma is required to run the two phases sequentially. Each phase uses a partial continuous for, par_cont_for [3], in order to parallelise the outer loop over rows, and a #pragma simd to help the compiler vectorise the inner loop over columns. par_cont_for is a sequential for loop that works as follows:…”
Section: GPRM Implementation Details
confidence: 99%