2018
DOI: 10.1007/978-3-319-96983-1_51

Exploiting Data Sparsity for Large-Scale Matrix Computations

Abstract: Exploiting data sparsity in dense matrices is an algorithmic bridge between architectures that are increasingly memory-austere on a per-core basis and extreme-scale applications. The Hierarchical matrix Computations on Manycore Architectures (HiCMA) library tackles this challenging problem by achieving significant reductions in time to solution and memory footprint, while preserving a specified accuracy requirement of the application. HiCMA provides a high-performance implementation on distributed-me…
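
To make the data-sparsity idea concrete: in tile low-rank (TLR) formats such as the one HiCMA implements, a dense off-diagonal tile is replaced by a low-rank factorization whose rank is chosen to meet the application's accuracy threshold. Below is a minimal numpy sketch of that single-tile compression step; the helper name compress_tile and the 1/|x-y| test kernel are illustrative choices, not HiCMA's API or kernels.

```python
import numpy as np

def compress_tile(tile, tol):
    # Hypothetical helper (not the HiCMA API): truncated SVD of one dense tile.
    U, s, Vt = np.linalg.svd(tile, full_matrices=False)
    # A rank-k truncation has spectral-norm error s[k], so keep values above tol.
    k = max(1, int(np.sum(s > tol)))
    return U[:, :k] * s[:k], Vt[:k, :]          # tile ~= Uk @ Vk

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 256)                  # two well-separated 1D point sets,
y = rng.uniform(2.0, 3.0, 256)                  # so the interaction block is smooth
tile = 1.0 / np.abs(x[:, None] - y[None, :])
Uk, Vk = compress_tile(tile, tol=1e-8)
print(Uk.shape[1], np.linalg.norm(tile - Uk @ Vk, 2))
```

For tiles generated by smooth kernels over well-separated point sets, the retained rank is typically far smaller than the tile dimension, which is where the memory and time-to-solution savings come from.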

Cited by 25 publications (29 citation statements)
References 26 publications
“…Moreover, these approaches cannot adapt their executions to the unpredictable noise generated by the OS or the hardware. This is why most task-based applications use RSs that are powered with dynamic scheduling strategies (Akbudak et al, 2018;Sukkari et al, 2018;Moustafa et al, 2018;Carpaye, Roman & Brenner, 2018;Agullo et al, 2016b). In this case, the scheduler focuses only on the ready tasks and decides during the execution on how to distribute them.…”
Section: Task Scheduling and Related Work (mentioning, confidence: 99%)
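
To illustrate the "ready tasks" notion in the quoted passage, here is a toy sketch of dynamic scheduling over a task DAG: a task becomes ready once all of its predecessors have completed, and the scheduler dispatches only from that ready set. This is an illustration only; production runtime systems such as StarPU or PaRSEC add data-locality, priority, and distributed-memory logic omitted here, and the task names are just labels borrowed from a tile Cholesky factorization.

```python
from collections import deque

def run_dag(tasks, deps):
    # tasks: {name: callable}; deps: {name: set of predecessor names}.
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    ready = deque(t for t, d in remaining.items() if not d)
    done = []
    while ready:
        t = ready.popleft()                  # a real runtime also weighs locality/priority
        tasks[t]()
        done.append(t)
        for u, d in remaining.items():       # release successors whose inputs are complete
            if t in d:
                d.remove(t)
                if not d and u not in done and u not in ready:
                    ready.append(u)
    return done

order = run_dag(
    {"potrf": lambda: None, "trsm": lambda: None, "syrk": lambda: None},
    {"trsm": {"potrf"}, "syrk": {"trsm"}},
)
print(order)    # ['potrf', 'trsm', 'syrk']
```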
“…Performance results of TLR-based MLE computations on shared and distributed-memory systems achieve up to 13X and 5X speedups, respectively, compared to full machine precision accuracy using synthetic and real environmental datasets (up to 2M), without compromising the prediction quality. The previous works [5], [6] focus solely on the standalone linear algebra operation, i.e., the Cholesky factorization. They assess its performance using a simplified version of the Matérn kernel on synthetic datasets.…”
Section: A. Contributions (mentioning, confidence: 99%)
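
Since the quoted work evaluates the factorization on a Matérn covariance kernel over synthetic locations, a short sketch of how such a covariance matrix is generated may help; the parametrization below (variance sigma2, range beta, smoothness nu) is a common textbook form and not necessarily the exact "simplified" kernel used in [5], [6].

```python
import numpy as np
from scipy.special import gamma, kv

def matern_cov(D, sigma2=1.0, beta=0.1, nu=0.5):
    # Matérn covariance evaluated entrywise on a distance matrix D
    # (illustrative parametrization, not the cited experiments' exact kernel).
    r = np.maximum(D / beta, 1e-10)          # avoid the r = 0 singularity
    C = sigma2 * (2.0 ** (1.0 - nu) / gamma(nu)) * (r ** nu) * kv(nu, r)
    C[D == 0] = sigma2                       # exact limit on the diagonal
    return C

rng = np.random.default_rng(1)
pts = rng.uniform(size=(500, 2))             # synthetic 2D locations
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
Sigma = matern_cov(D)                        # dense covariance matrix
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(len(pts)))   # dense reference factor
```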
“…In this study, we propose an MLE optimization framework, which operates on Tile Low-Rank (TLR) data compression format, as implemented in the Hierarchical Computations on Manycore Architectures (HiCMA) library. More details about algorithmic complexity and memory footprint can be found in [5], [6]. Figure 1 illustrates the TLR representation of a given covariance matrix Σ(θ).…”
Section: Tile Low-rank Approximation (mentioning, confidence: 99%)
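
Continuing the sketches above, the TLR representation referenced here can be imitated by tiling the covariance matrix, keeping diagonal tiles dense, and compressing each off-diagonal tile independently to the requested accuracy. The dictionary below is a hypothetical stand-in for HiCMA's tile descriptor, shown only to convey the memory-footprint argument.

```python
import numpy as np

def to_tlr(A, nb, tol):
    # Hypothetical TLR container (not HiCMA's data structure): diagonal tiles
    # stay dense, off-diagonal tiles are stored as low-rank U, V factors.
    n = A.shape[0]
    tiles, stored = {}, 0
    for i in range(0, n, nb):
        for j in range(0, n, nb):
            T = A[i:i + nb, j:j + nb]
            if i == j:
                tiles[(i, j)] = ("dense", T)
                stored += T.size
            else:
                U, s, Vt = np.linalg.svd(T, full_matrices=False)
                k = max(1, int(np.sum(s > tol)))
                tiles[(i, j)] = ("lowrank", U[:, :k] * s[:k], Vt[:k, :])
                stored += k * (T.shape[0] + T.shape[1])
    print(f"stored entries: {100.0 * stored / A.size:.1f}% of the dense matrix")
    return tiles

# e.g. tlr = to_tlr(Sigma, nb=100, tol=1e-9), reusing Sigma from the sketch above
```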
“…The algorithmic adaptations and the paradigm shift needed in the bulk synchronous programming model create synergism situations, which may help PARSEC promote locality-aware task execution. Although the QDWH-based PD herein represents the targeted algorithm, some of the optimization techniques are not specific to QDWH-PD and may be used toward improving a broader class of dense linear algebra algorithms and applications on exascale systems [11], [14]-[18].…”
Section: Introduction (mentioning, confidence: 99%)
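
For context on the QDWH-based polar decomposition (PD) mentioned in this excerpt, the following is a minimal dense, single-node sketch of the QR-based QDWH iteration. It follows the standard published parameter formulas but is not the distributed, task-based QDWH-PD implementation the citing paper targets.

```python
import numpy as np

def qdwh_polar(A, tol=1e-12, maxit=30):
    # QR-based QDWH iteration for the polar decomposition A = U H
    # (minimal dense sketch; the extreme singular values are computed
    # exactly here, whereas production codes use cheap estimates).
    m, n = A.shape
    X = A / np.linalg.norm(A, 2)                 # scale so sigma_max(X) = 1
    l = min(np.linalg.svd(X, compute_uv=False))  # lower bound on sigma_min(X)
    for _ in range(maxit):
        l = min(l, 1.0)                          # guard against rounding past 1
        d = (4.0 * (1.0 - l * l) / l ** 4) ** (1.0 / 3.0)
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l * l) / (l * l * np.sqrt(1.0 + d)))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # One tall-skinny QR per iteration: [sqrt(c) X; I] = [Q1; Q2] R.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[:m, :], Q[m:, :]
        Xnew = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.conj().T)
        l = l * (a + b * l * l) / (1.0 + c * l * l)
        converged = np.linalg.norm(Xnew - X, "fro") <= tol * np.linalg.norm(Xnew, "fro")
        X = Xnew
        if converged:
            break
    H = (X.conj().T @ A + A.conj().T @ X) / 2.0  # symmetric positive semidefinite factor
    return X, H

U, H = qdwh_polar(np.random.default_rng(2).standard_normal((200, 200)))
print(np.linalg.norm(U.T @ U - np.eye(200)))     # orthogonality of the polar factor
```

Each iteration costs one tall-skinny QR factorization, and in double precision the iteration typically converges in about six steps regardless of conditioning, which is what makes the approach attractive on communication-bound architectures.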