2017
DOI: 10.1109/tpds.2017.2686384

Improving Execution Concurrency of Large-Scale Matrix Multiplication on Distributed Data-Parallel Platforms

Cited by 22 publications
(17 citation statements)
References 20 publications
“…On the other hand, with the wide adoption of MapReduce or BSP-style data analytics in the cloud, a number of systems have implemented linear algebra libraries [10,22,26,29,37]. However, BSP programming models are ill-suited for expressing the fine-grained dependencies in linear algebra algorithms, and imposing global synchronous barriers often greatly slows down a job.…”
Section: Related Work (mentioning)
confidence: 99%
“…However, BSP programming models are ill-suited for expressing the fine-grained dependencies in linear algebra algorithms, and imposing global synchronous barriers often greatly slows down a job. As a result, none of these systems [10,22,26,29] have an implementation of distributed Cholesky decomposition that can compare with NumPyWren or ScaLAPACK.…”
Section: Related Work (mentioning)
confidence: 99%
“…When compared to other multi-node methods, this approach performs equally well if not better. This is because in most MPI-based and distributed-computing approaches, reading data from files is mainly done by a single process [15]. Although the overall operation time could be reduced by overlapping communication and computation using clever heuristics [40], [41], it is still an extra exercise.…”
Section: G Comparison With Other Approaches (mentioning)
confidence: 99%
“…The approach adopted on distributed platforms, is generally based on a master-slave process model where blocks of data are broadcast by a node to other nodes in the cluster [15]. Another popular approach is to use tiling followed by batching, where tiling refers to the partitioning of the matrices into tiny blocks or tiles, while batching refers to the assignment of these tiles to threads or computing elements for computation.…”
Section: Related Work (mentioning)
confidence: 99%