2018
DOI: 10.1002/cpe.5003

Extending smart containers for data locality‐aware skeleton programming

Abstract: We present an extension for the SkePU skeleton programming framework to improve the performance of sequences of transformations on smart containers. By using lazy evaluation, SkePU records skeleton invocations and dependencies as directed by smart container operands. When a partial result is required by a different part of the program, the run‐time system will process the entire lineage of skeleton invocations; tiling is applied to keep chunks of container data in the working set for the whole sequence…
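To illustrate the mechanism the abstract describes, here is a minimal C++ sketch, assuming only elementwise Map-style invocations on std::vector operands. The names `Lineage`, `MapStage`, `map`, `run`, `pending` and `demo_lineage` are hypothetical and do not reflect SkePU's actual API: invocations are only recorded, and when a result is needed the whole recorded lineage is executed tile by tile.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch, NOT SkePU's API: each recorded skeleton invocation is
// an elementwise Map that reads one container and writes another.
struct MapStage {
    const std::vector<float>* in;   // operand container
    std::vector<float>* out;        // result container
    std::function<float(float)> f;  // user function applied per element
};

class Lineage {
public:
    // Lazy evaluation: record the invocation instead of executing it.
    void map(const std::vector<float>& in, std::vector<float>& out,
             std::function<float(float)> f) {
        stages_.push_back({&in, &out, std::move(f)});
    }

    // Flush the entire lineage tile by tile: every recorded stage is applied,
    // in recording order, to one chunk of elements before moving on, so that
    // chunk can stay in cache for the whole sequence. Valid here only because
    // every stage is elementwise (element i depends only on element i).
    void run(std::size_t n, std::size_t tile) {
        assert(tile > 0);
        for (std::size_t begin = 0; begin < n; begin += tile) {
            const std::size_t end = std::min(begin + tile, n);
            for (const MapStage& s : stages_)
                for (std::size_t i = begin; i < end; ++i)
                    (*s.out)[i] = s.f((*s.in)[i]);
        }
        stages_.clear();
    }

    std::size_t pending() const { return stages_.size(); }

private:
    std::vector<MapStage> stages_;
};

// Usage: y = 2x and z = y + 1 are only recorded; run() evaluates both
// in tiles of 4 elements once z is actually needed.
std::vector<float> demo_lineage() {
    const std::size_t n = 10;
    std::vector<float> x(n), y(n), z(n);
    for (std::size_t i = 0; i < n; ++i) x[i] = static_cast<float>(i);

    Lineage lin;
    lin.map(x, y, [](float v) { return 2.0f * v; });
    lin.map(y, z, [](float v) { return v + 1.0f; });
    lin.run(n, 4);
    return z;
}
```

Putting the tile loop outside and the stage loop inside is what keeps a chunk in the working set across the sequence. Stages whose elements depend on neighbours (stencils, reductions) would need extra care, and this sketch deliberately excludes them.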

Cited by 11 publications
(15 citation statements)
References 22 publications
“…Operands to skeleton instances are to be passed in data containers, which are STL-like, generic collection abstract data types like Vector and Matrix that encapsulate C++ array-type data. We call them smart containers [9] because they transparently perform data transfer and memory management for their elements in (heterogeneous) systems with distributed memory, as well as global optimizations for data locality [14]. Using C++ iterators, skeleton instance calls may also operate on a proper subset of a container's elements only.…”
Section: SkePU 3 Overview (mentioning)
confidence: 99%
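The transparent data transfer that the quote attributes to smart containers can be sketched as a validity-tracking wrapper. This is a hypothetical illustration, not SkePU's Vector: `SmartVector`, its access methods and `deviceScale` are invented names, and the "device" copy is simulated with a second std::vector so the example runs without a GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of a "smart container": it keeps a host copy and a
// simulated device copy, tracks which one holds the latest data, and copies
// only when the side being accessed is stale. A real implementation would
// allocate device memory (e.g. with CUDA) instead of a second std::vector.
class SmartVector {
public:
    explicit SmartVector(std::size_t n) : host_(n), device_(n) {}

    // Host read: copy back from the device only if the host copy is stale.
    const std::vector<float>& hostRead() {
        if (!hostValid_) { host_ = device_; hostValid_ = true; ++transfers_; }
        return host_;
    }
    // Host write: ensure the host is up to date, then mark the device stale.
    std::vector<float>& hostWrite() {
        hostRead();
        deviceValid_ = false;
        return host_;
    }
    // Device read: copy to the device only if the device copy is stale.
    const std::vector<float>& deviceRead() {
        if (!deviceValid_) { device_ = host_; deviceValid_ = true; ++transfers_; }
        return device_;
    }
    // Device write: ensure the device is up to date, then mark the host stale.
    std::vector<float>& deviceWrite() {
        deviceRead();
        hostValid_ = false;
        return device_;
    }
    std::size_t transfers() const { return transfers_; }

private:
    std::vector<float> host_, device_;
    bool hostValid_ = true, deviceValid_ = true;  // both start zero-filled
    std::size_t transfers_ = 0;
};

// Stand-in for a backend Map skeleton running on the (simulated) device.
void deviceScale(SmartVector& v, float a) {
    for (float& e : v.deviceWrite()) e *= a;
}
```

In this sketch, two consecutive device-side calls on the same container cost one host-to-device copy, and the result is copied back only when the host actually reads it.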
“…We can only speculate that this anomaly might be caused by some stateful optimization within the CUDA memory allocator, and it might also be specific to our GPU, CUDA and driver version. [Footnote 2: Straight-line control holds for use with lazy execution [3] and for branch-free regions in a kernel-level compiler IR.] … i = 0, ..., N − 1 is executed either on the CPU (device d_i = 0) or on the accelerator (d_i = 1).…”
Section: Problem Formulation (mentioning)
confidence: 99%
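The quoted problem formulation assigns each kernel i in a straight-line sequence a device d_i (0 = CPU, 1 = accelerator). As an illustration only, the sketch below solves a deliberately simplified version with a dynamic program over the chain. The cost model (per-kernel execution costs `exec`, a transfer cost `xfer[i]` paid whenever kernel i runs on a different device than kernel i−1) and the names `Placement` and `placeChain` are assumptions; this is a swapped-in technique, not the global optimization of the cited work.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified cost model (not the cited paper's formulation):
// kernel i runs on device d_i in {0 = CPU, 1 = accelerator} at cost
// exec[i][d_i]; kernel i consumes kernel i-1's output, so switching devices
// between i-1 and i costs xfer[i] for moving that data.
struct Placement {
    double cost;
    std::vector<int> device;  // d_0 ... d_{N-1}
};

Placement placeChain(const std::vector<std::array<double, 2>>& exec,
                     const std::vector<double>& xfer) {
    const std::size_t n = exec.size();
    assert(n > 0 && xfer.size() == n);  // xfer[0] is unused
    // best[i][d]: cheapest total cost of kernels 0..i with kernel i on device d.
    std::vector<std::array<double, 2>> best(n);
    std::vector<std::array<int, 2>> from(n);  // device of kernel i-1 on that path
    best[0] = exec[0];
    for (std::size_t i = 1; i < n; ++i)
        for (int d = 0; d < 2; ++d) {
            const double stay = best[i - 1][d];
            const double move = best[i - 1][1 - d] + xfer[i];
            from[i][d] = stay <= move ? d : 1 - d;
            best[i][d] = exec[i][d] + std::min(stay, move);
        }
    // Pick the cheaper device for the last kernel, then backtrack.
    Placement p{0.0, std::vector<int>(n)};
    int d = best[n - 1][0] <= best[n - 1][1] ? 0 : 1;
    p.cost = best[n - 1][d];
    for (std::size_t i = n; i-- > 0;) {
        p.device[i] = d;
        if (i > 0) d = from[i][d];
    }
    return p;
}
```

With execution costs {1, 5}, {10, 2}, {1, 5} and a transfer cost of 3, offloading only the middle kernel ({0, 1, 0}, total 10) wins; raising the transfer cost to 100 keeps everything on the CPU ({0, 0, 0}, total 12), which shows how transfer costs can outweigh faster execution.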
“…The global optimization method presented in this paper could, in principle, be likewise applied as a runtime optimization once sufficiently large kernel (sub)graphs such as lineages [3] have been identified at runtime, which in turn is done by lazy execution techniques that are also applied, e.g., in Spark and TensorFlow. However, in our case the runtime overhead for the optimization might only pay off if the computed memory placement can be reused, e.g.…”
Section: Related Work; 6.1 Transfer Fusion (mentioning)
confidence: 99%
“…Ernstsson and Kessler propose a solution based on skeletons to the problem of managing data locality on large clusters. This solution is based on the use of lazy evaluation to record invocations and dependences of sequences of transformations, using tiling to keep chunks of container data in the same working set, thus improving cache usage.…”
Section: In This Issue (mentioning)
confidence: 99%