2014 21st International Conference on High Performance Computing (HiPC)
DOI: 10.1109/hipc.2014.7116881

Analysis and tuning of libtensor framework on multicore architectures

Abstract: Libtensor is a framework designed to implement the tensor contractions arising from the coupled-cluster and equations-of-motion computational quantum chemistry equations. It has been optimized for symmetry and sparsity to be memory efficient, which allows it to run efficiently on ubiquitous and cost-effective SMP architectures. Unfortunately, the move of memory controllers on-chip has endowed these SMP systems with strong NUMA properties. Moreover, the manycore trend in processor architecture deman…

Cited by 10 publications (12 citation statements) | References 12 publications
“…Our work on graph optimization builds on substantial efforts for optimization of computational graphs of tensor operations. Tensor contraction can be optimized via parallelization [22,23,41,49], efficient transposition [51], blocking [10,18,28,43], exploiting symmetry [15,48,49], and sparsity [22,24,32,39,39,47]. For complicated tensor graphs, specialized compilers like XLA [52] and TVM [8] rewrite the computational graph to optimize program execution and memory allocation on dedicated hardware.…”
Section: Previous Work
confidence: 99%
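The blocking strategy named in the statement above can be illustrated with a short sketch. This is not libtensor code; it is a minimal NumPy example (the function name `blocked_contract` and the block size are assumptions) showing how a contraction is decomposed into small dense tile products, each of which a real framework would dispatch to BLAS and could skip for zero or symmetry-redundant tiles.

```python
# Minimal sketch of blocked (tiled) contraction: C = A @ B computed tile by tile
# rather than in one dense call. Block size is illustrative.
import numpy as np

def blocked_contract(A, B, block=64):
    """Contract A (I x K) with B (K x J) one tile at a time."""
    I, K = A.shape
    K2, J = B.shape
    assert K == K2
    C = np.zeros((I, J))
    for i0 in range(0, I, block):
        for j0 in range(0, J, block):
            for k0 in range(0, K, block):
                # Each tile-level product is a small dense GEMM; tiles known to be
                # zero (sparsity) or symmetry-redundant could simply be skipped here.
                C[i0:i0+block, j0:j0+block] += (
                    A[i0:i0+block, k0:k0+block] @ B[k0:k0+block, j0:j0+block]
                )
    return C

A = np.random.rand(256, 192)
B = np.random.rand(192, 128)
assert np.allclose(blocked_contract(A, B), A @ B)
```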
“…The tasking model incurs scheduling overheads at queueing or dequeueing tasks. Distribution of task granularities typically shows a wide variation with dominance of small tasks [20]. Achieving higher concurrency also involves using smaller blocks.…”
Section: Shared Memory Task-based Backend
confidence: 99%
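A hedged sketch of the granularity trade-off discussed above: one task per block triple. The scheduler, function names, and block sizes are illustrative assumptions (Python's ThreadPoolExecutor rather than the actual task-based backend); the point is that shrinking `block` multiplies the number of small tasks, raising both the available concurrency and the queueing/dequeueing overhead.

```python
# Sketch of a shared-memory task-based contraction: each (i, j, k) block triple
# is an independent unit of work; partial products are reduced serially.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def task_based_contract(A, B, block, workers=4):
    I, K = A.shape
    _, J = B.shape
    # Small `block` -> many cheap tasks (high concurrency, high scheduling overhead);
    # large `block` -> few tasks (low overhead, limited parallelism).
    triples = [(i, j, k)
               for i in range(0, I, block)
               for j in range(0, J, block)
               for k in range(0, K, block)]

    def run(t):
        i, j, k = t
        return i, j, A[i:i+block, k:k+block] @ B[k:k+block, j:j+block]

    C = np.zeros((I, J))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i, j, partial in pool.map(run, triples):
            C[i:i+block, j:j+block] += partial
    return C

A, B = np.random.rand(256, 256), np.random.rand(256, 256)
assert np.allclose(task_based_contract(A, B, block=32), A @ B)
```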
“…The same approach is used in distributed memory models using the partitioned global address space (PGAS) abstraction, such as global arrays. Even in shared memory machines with non-uniform memory access (NUMA), such a cyclic distribution is essential for improving performance [20]. A more complex indexing, such as that used in CTF, allows a regular distribution of data using a mapping function between the tensor dimension and the processes based on a virtual layout.…”
Section: Tensor Data Distribution
confidence: 99%
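The cyclic (round-robin) placement described above can be sketched as a simple ownership map. The functions below are hypothetical illustrations, not the libtensor, Global Arrays, or CTF APIs; they only show the idea of cycling block indices over processes (or NUMA domains), and the CTF-style idea of mapping tensor dimensions onto a virtual process grid.

```python
def cyclic_owner(block_index, num_procs):
    """Round-robin ownership of a linearized block index."""
    return block_index % num_procs

def grid_owner(block_coords, proc_grid):
    """Map each tensor dimension onto a virtual process-grid dimension,
    then cycle within it (the idea behind CTF-like regular distributions)."""
    return tuple(c % p for c, p in zip(block_coords, proc_grid))

# Example: a 6 x 6 grid of blocks laid out cyclically over a 2 x 3 process grid.
placement = {(bi, bj): grid_owner((bi, bj), (2, 3))
             for bi in range(6) for bj in range(6)}
# Every process owns 6 blocks spread across the whole tensor, rather than one
# contiguous slab that would concentrate memory traffic on a single NUMA node.
assert all(sum(1 for o in placement.values() if o == (p, q)) == 6
           for p in range(2) for q in range(3))
```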
“…Multi-dimensional tensors with symmetry are stored as a collection of fully dense "bricks" or data-tiles, where only distinct bricks are explicitly stored. The contraction of two tensors is implemented as a collection of contractions involving the set of bricks representing the tensor [7][8][9][10]. The tile sizes are chosen based on the available memory and to ensure efficient communication, maximize computation efficiency throughout the calculation, and enable dynamic load balancing.…”
Section: Introduction
confidence: 99%
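A minimal sketch of the "store only distinct bricks" idea, assuming a two-index tensor with permutational symmetry T[i,j] = T[j,i]. The class and method names are invented for illustration and do not correspond to libtensor's actual data structures; only canonical bricks are kept, and the mirrored ones are reconstructed by transposition on access.

```python
import numpy as np

class SymmetricBrickStore:
    """Keep only bricks (bi, bj) with bi <= bj; mirrored bricks are implicit."""
    def __init__(self):
        self.bricks = {}

    def set_brick(self, bi, bj, data):
        if bi <= bj:
            self.bricks[(bi, bj)] = data
        else:
            self.bricks[(bj, bi)] = data.T   # store the canonical orientation only

    def get_brick(self, bi, bj):
        if bi <= bj:
            return self.bricks[(bi, bj)]
        return self.bricks[(bj, bi)].T       # reconstructed on access, never stored

store = SymmetricBrickStore()
blk = np.random.rand(16, 16)
store.set_brick(2, 1, blk)
# The (1, 2) brick was never stored explicitly, but it is recoverable:
assert np.allclose(store.get_brick(1, 2), blk.T)
assert np.allclose(store.get_brick(2, 1), blk)
```

In the same spirit, a contraction of two such tensors would loop over pairs of stored bricks only, contracting each pair as a small dense product as in the earlier blocking sketch.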