2018
DOI: 10.1007/978-3-319-96983-1_51

Exploiting Data Sparsity for Large-Scale Matrix Computations

Abstract: Exploiting data sparsity in dense matrices is an algorithmic bridge between architectures that are increasingly memory-austere on a per-core basis and extreme-scale applications. The Hierarchical matrix Computations on Manycore Architectures (HiCMA) library tackles this challenging problem by achieving significant reductions in time to solution and memory footprint, while preserving a specified accuracy requirement of the application. HiCMA provides a high-performance implementation on distributed-me…
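
To make the data-sparsity idea concrete: in tile low-rank (TLR) formats such as the one HiCMA implements, a dense off-diagonal tile is replaced by a low-rank factorization whose rank is chosen to meet the application's accuracy threshold. Below is a minimal numpy sketch of that single-tile compression step; the helper name compress_tile and the 1/|x-y| test kernel are illustrative choices, not HiCMA's API or kernels.

```python
import numpy as np

def compress_tile(tile, tol):
    # Hypothetical helper (not the HiCMA API): truncated SVD of one dense tile.
    U, s, Vt = np.linalg.svd(tile, full_matrices=False)
    # A rank-k truncation has spectral-norm error s[k], so keep values above tol.
    k = max(1, int(np.sum(s > tol)))
    return U[:, :k] * s[:k], Vt[:k, :]          # tile ~= Uk @ Vk

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 256)                  # two well-separated 1D point sets,
y = rng.uniform(2.0, 3.0, 256)                  # so the interaction block is smooth
tile = 1.0 / np.abs(x[:, None] - y[None, :])
Uk, Vk = compress_tile(tile, tol=1e-8)
print(Uk.shape[1], np.linalg.norm(tile - Uk @ Vk, 2))
```

For tiles generated by smooth kernels over well-separated point sets, the retained rank is typically far smaller than the tile dimension, which is where the memory and time-to-solution savings come from.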

Cited by 25 publications (29 citation statements)
References 26 publications
“…Moreover, these approaches cannot adapt their executions to the unpredictable noise generated by the OS or the hardware. This is why most task-based applications use RSs that are powered with dynamic scheduling strategies (Akbudak et al, 2018;Sukkari et al, 2018;Moustafa et al, 2018;Carpaye, Roman & Brenner, 2018;Agullo et al, 2016b). In this case, the scheduler focuses only on the ready tasks and decides during the execution on how to distribute them.…”
Section: Task Scheduling and Related Work (mentioning, confidence: 99%)
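
To illustrate the "ready tasks" notion in the quoted passage, here is a toy sketch of dynamic scheduling over a task DAG: a task becomes ready once all of its predecessors have completed, and the scheduler dispatches only from that ready set. This is an illustration only; production runtime systems such as StarPU or PaRSEC add data-locality, priority, and distributed-memory logic omitted here, and the task names are just labels borrowed from a tile Cholesky factorization.

```python
from collections import deque

def run_dag(tasks, deps):
    # tasks: {name: callable}; deps: {name: set of predecessor names}.
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    ready = deque(t for t, d in remaining.items() if not d)
    done = []
    while ready:
        t = ready.popleft()                  # a real runtime also weighs locality/priority
        tasks[t]()
        done.append(t)
        for u, d in remaining.items():       # release successors whose inputs are complete
            if t in d:
                d.remove(t)
                if not d and u not in done and u not in ready:
                    ready.append(u)
    return done

order = run_dag(
    {"potrf": lambda: None, "trsm": lambda: None, "syrk": lambda: None},
    {"trsm": {"potrf"}, "syrk": {"trsm"}},
)
print(order)    # ['potrf', 'trsm', 'syrk']
```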
“…Performance results of TLR-based MLE computations on shared and distributed-memory systems achieve up to 13X and 5X speedups, respectively, compared to full machine precision accuracy using synthetic and real environmental datasets (up to 2M), without compromising the prediction quality. The previous works [5], [6] focus solely on the standalone linear algebra operation, i.e., the Cholesky factorization. They assess its performance using a simplified version of the Matérn kernel on synthetic datasets.…”
Section: A. Contributions (mentioning, confidence: 99%)
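
Since the quoted work evaluates the factorization on a Matérn covariance kernel over synthetic locations, a short sketch of how such a covariance matrix is generated may help; the parametrization below (variance sigma2, range beta, smoothness nu) is a common textbook form and not necessarily the exact "simplified" kernel used in [5], [6].

```python
import numpy as np
from scipy.special import gamma, kv

def matern_cov(D, sigma2=1.0, beta=0.1, nu=0.5):
    # Matérn covariance evaluated entrywise on a distance matrix D
    # (illustrative parametrization, not the cited experiments' exact kernel).
    r = np.maximum(D / beta, 1e-10)          # avoid the r = 0 singularity
    C = sigma2 * (2.0 ** (1.0 - nu) / gamma(nu)) * (r ** nu) * kv(nu, r)
    C[D == 0] = sigma2                       # exact limit on the diagonal
    return C

rng = np.random.default_rng(1)
pts = rng.uniform(size=(500, 2))             # synthetic 2D locations
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
Sigma = matern_cov(D)                        # dense covariance matrix
L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(len(pts)))   # dense reference factor
```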
“…In this study, we propose an MLE optimization framework, which operates on Tile Low-Rank (TLR) data compression format, as implemented in the Hierarchical Computations on Manycore Architectures (HiCMA) library. More details about algorithmic complexity and memory footprint can be found in [5], [6]. Figure 1 illustrates the TLR representation of a given covariance matrix Σ(θ).…”
Section: Tile Low-rank Approximation (mentioning, confidence: 99%)
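
Continuing the sketches above, the TLR representation referenced here can be imitated by tiling the covariance matrix, keeping diagonal tiles dense, and compressing each off-diagonal tile independently to the requested accuracy. The dictionary below is a hypothetical stand-in for HiCMA's tile descriptor, shown only to convey the memory-footprint argument.

```python
import numpy as np

def to_tlr(A, nb, tol):
    # Hypothetical TLR container (not HiCMA's data structure): diagonal tiles
    # stay dense, off-diagonal tiles are stored as low-rank U, V factors.
    n = A.shape[0]
    tiles, stored = {}, 0
    for i in range(0, n, nb):
        for j in range(0, n, nb):
            T = A[i:i + nb, j:j + nb]
            if i == j:
                tiles[(i, j)] = ("dense", T)
                stored += T.size
            else:
                U, s, Vt = np.linalg.svd(T, full_matrices=False)
                k = max(1, int(np.sum(s > tol)))
                tiles[(i, j)] = ("lowrank", U[:, :k] * s[:k], Vt[:k, :])
                stored += k * (T.shape[0] + T.shape[1])
    print(f"stored entries: {100.0 * stored / A.size:.1f}% of the dense matrix")
    return tiles

# e.g. tlr = to_tlr(Sigma, nb=100, tol=1e-9), reusing Sigma from the sketch above
```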
“…The algorithmic adaptations and the paradigm shift needed in the bulk synchronous programming model create synergism situations, which may help PARSEC promote locality-aware task execution. Although the QDWH-based PD herein represents the targeted algorithm, some of the optimization techniques are not specific to QDWH-PD and may be used toward improving a broader class of dense linear algebra algorithms and applications on exascale systems [11], [14]-[18].…”
Section: Introduction (mentioning, confidence: 99%)
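
For context on the QDWH-based polar decomposition (PD) mentioned in this excerpt, the following is a minimal dense, single-node sketch of the QR-based QDWH iteration. It follows the standard published parameter formulas but is not the distributed, task-based QDWH-PD implementation the citing paper targets.

```python
import numpy as np

def qdwh_polar(A, tol=1e-12, maxit=30):
    # QR-based QDWH iteration for the polar decomposition A = U H
    # (minimal dense sketch; the extreme singular values are computed
    # exactly here, whereas production codes use cheap estimates).
    m, n = A.shape
    X = A / np.linalg.norm(A, 2)                 # scale so sigma_max(X) = 1
    l = min(np.linalg.svd(X, compute_uv=False))  # lower bound on sigma_min(X)
    for _ in range(maxit):
        l = min(l, 1.0)                          # guard against rounding past 1
        d = (4.0 * (1.0 - l * l) / l ** 4) ** (1.0 / 3.0)
        a = np.sqrt(1.0 + d) + 0.5 * np.sqrt(
            8.0 - 4.0 * d + 8.0 * (2.0 - l * l) / (l * l * np.sqrt(1.0 + d)))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # One tall-skinny QR per iteration: [sqrt(c) X; I] = [Q1; Q2] R.
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, np.eye(n)]))
        Q1, Q2 = Q[:m, :], Q[m:, :]
        Xnew = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.conj().T)
        l = l * (a + b * l * l) / (1.0 + c * l * l)
        converged = np.linalg.norm(Xnew - X, "fro") <= tol * np.linalg.norm(Xnew, "fro")
        X = Xnew
        if converged:
            break
    H = (X.conj().T @ A + A.conj().T @ X) / 2.0  # symmetric positive semidefinite factor
    return X, H

U, H = qdwh_polar(np.random.default_rng(2).standard_normal((200, 200)))
print(np.linalg.norm(U.T @ U - np.eye(200)))     # orthogonality of the polar factor
```

Each iteration costs one tall-skinny QR factorization, and in double precision the iteration typically converges in about six steps regardless of conditioning, which is what makes the approach attractive on communication-bound architectures.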