Iterative Collective Loop Fusion

Ashby, Thomas J.; O’Boyle, Michael

doi:10.1007/11688839_17

Cited by 3 publications

(1 citation statement)

References 12 publications


“…There have been various projects looking at how to combine and schedule basic KSM operations, without altering the dependency structure of the algorithms themselves, and/or the resulting performance; some examples include [5], which considers rescheduling for bandwidth reduction, and [14], which uses careful ordering of the operations of variants of the two-sided KSMs to allow scalar products to be executed at the same time as one of the matrix-vector products; this amounts to a partial pipelining approach. Our work differs as we consider the future impact of an algorithm that does more extensive reordering.…”

Section: Rescheduling (mentioning)

confidence: 99%

The Impact of Global Communication Latency at Extreme Scales on Krylov Methods

Ashby, Ghysels, Heirman, et al. 2012

Algorithms and Architectures for Parallel Processing


Abstract. Krylov Subspace Methods (KSMs) are popular numerical tools for solving large linear systems of equations. We consider their role in solving sparse systems on future massively parallel distributed memory machines, by estimating future performance of their constituent operations. To this end we construct a model that is simple, but which takes topology and network acceleration into account as they are important considerations. We show that, as the number of nodes of a parallel machine increases to very large numbers, the increasing latency cost of reductions may well become a problematic bottleneck for traditional formulations of these methods. Finally, we discuss how pipelined KSMs can be used to tackle the potential problem, and appropriate pipeline depths.


Fusing filters with integer linear programming

Robinson, Lippmeier, Keller 2014

Proceedings of the 3rd ACM SIGPLAN Workshop on Functional High-Performance Computing


The key to compiling functional, collection oriented array programs into efficient code is to minimise memory traffic. Simply fusing subsequent array operations into a single computation is not sufficient; we also need to cluster separate traversals of the same array into a single traversal. Previous work demonstrated how Integer Linear Programming (ILP) can be used to cluster the operators in a general data-flow graph into subgraphs, which can be individually fused. However, these approaches can only handle operations which preserve the size of the array, thereby missing out on some optimisation opportunities. This paper addresses this shortcoming by extending the ILP approach with support for size-changing operations, using an external ILP solver to find good clusterings.
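The distinction the abstract draws between fusing subsequent operations and clustering separate traversals can be illustrated with a small hand-written sketch. This is not the paper's ILP-based method; the function names and the example operations (a doubled sum and an incremented maximum) are assumptions chosen only for illustration, and the input is assumed to be non-empty.

```python
from typing import List, Tuple


def separate_traversals(xs: List[float]) -> Tuple[float, float]:
    """Each pipeline is fused on its own, but the array is still read twice."""
    total = sum(x * 2.0 for x in xs)      # traversal 1: map fused into the sum
    biggest = max(x + 1.0 for x in xs)    # traversal 2: map fused into the max
    return total, biggest


def clustered_traversal(xs: List[float]) -> Tuple[float, float]:
    """Both pipelines are clustered into one loop, so the array is read once."""
    total = 0.0
    biggest = float("-inf")
    for x in xs:
        total += x * 2.0
        biggest = max(biggest, x + 1.0)
    return total, biggest


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(separate_traversals(data))
    print(clustered_traversal(data))
```

Both functions compute the same pair of results; the clustered version halves the number of passes over the input, which is the memory-traffic saving the abstract refers to.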


Search-based Model-driven Loop Optimizations for Tensor Contractions

Panyala
