Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation 2009
DOI: 10.1145/1542476.1542496

Towards a holistic approach to auto-parallelization

Abstract: Compiler-based auto-parallelization is a much-studied area, yet it has still not found widespread application. This is largely due to poor exploitation of application parallelism, resulting in performance levels far below those which a skilled expert programmer could achieve. We have identified two weaknesses in traditional parallelizing compilers and propose a novel, integrated approach, resulting in significant performance improvements of the generated parallel code. Using profile-driven paral…

Cited by 132 publications (10 citation statements)
References 42 publications
“…Because of the complexity of control and data flow in such programs, a compiler cannot easily infer the distance between a loop iteration that generates data and the ones that consume it. For conventional synchronization approaches [6,25,26,43,47,48], this assumption of dependences between all subsequent iterations leads to sequential chains that severely limit the performance sought by running loop iterations in parallel. These sequential chains, which include both communication and computation, have two sources of inefficiency.…”
Section: Opportunity
confidence: 99%
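The dependence problem described above can be illustrated with a small, hypothetical example (not taken from the cited paper): when the iteration that consumes a value is selected by runtime data, the dependence distance is unknown at compile time, so a conservative compiler must assume every iteration depends on the one before it.

```python
def irregular_update(a, idx):
    """Hypothetical loop with a data-dependent dependence distance.

    Iteration i reads a[i] and writes a[idx[i]]. Whether a later
    iteration consumes what iteration i produced depends on the runtime
    contents of idx, so a static compiler must conservatively assume a
    cross-iteration dependence and serialize the whole loop.
    """
    for i in range(len(a)):
        a[idx[i]] = a[i] + 1
    return a

# idx = identity: no cross-iteration dependence; the loop could run as a DOALL.
print(irregular_update([0, 0, 0, 0], [0, 1, 2, 3]))  # → [1, 1, 1, 1]

# idx shifted by one: each iteration reads the value the previous one wrote,
# forming exactly the kind of sequential chain the statement describes.
print(irregular_update([0, 0, 0, 0], [1, 2, 3, 0]))  # → [4, 1, 2, 3]
```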
“…Automatic parallelization of non-numerical programs. Several automatic methods to extract TLP have demonstrated respectable speedups on commodity multicore processors for non-numerical programs [6,16,27,29,30,43,49]. All of these methods transform loops into parallel threads.…”
Section: Related Work
confidence: 99%
“…Another line of work [3,28,44,46] extracts parallelism by ignoring data dependences without preserving soundness via misspeculation detection and recovery. These approaches extract parallelism either by sacrificing the program's output quality [3,28,46] or by depending on user approval [44]. Instead, Perspective extracts parallelism without violating the sequential program semantics.…”
Section: Related Work
confidence: 99%
“…A related technique is applied in the context of speculative parallelization of loops, where dynamic dependences across loop iterations are tracked [Rauchwerger and Padua 1995]. A few recent approaches of similar nature include Bridges et al [2007], Tian et al [2008], Zhong et al [2008], Wu et al [2008], Oancea and Mycroft [2008], and Tournavitis et al [2009]. To estimate parallel speedup of DAGs, Sarkar and Hennessy [1986] developed convex partitioning of DAGs.…”
Section: Related Work
confidence: 99%
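The dynamic dependence tracking mentioned for speculative loop parallelization [Rauchwerger and Padua 1995] can be sketched as follows. This is a simplified, illustrative check in the spirit of their LRPD test, not their actual implementation: iterations run speculatively while the elements each iteration reads and writes are recorded, and speculation succeeds only if no element written by one iteration is touched by a different one.

```python
def lrpd_style_check(n, reads, writes):
    """Illustrative post-execution dependence check (hypothetical names).

    reads[i] / writes[i] are the sets of array indices that iteration i
    touched during speculative parallel execution. The loop was safe to
    run as a DOALL only if no element written by one iteration was read
    or written by a different iteration; otherwise the runtime must
    discard the speculative state and re-execute sequentially.
    """
    last_writer = {}
    for i in range(n):
        for e in writes[i]:
            if e in last_writer and last_writer[e] != i:
                return False  # write-write conflict across iterations
            last_writer[e] = i
    for i in range(n):
        for e in reads[i]:
            if e in last_writer and last_writer[e] != i:
                return False  # cross-iteration flow or anti dependence
    return True  # no cross-iteration dependence observed: speculation succeeds

# Disjoint element sets per iteration: speculation succeeds.
print(lrpd_style_check(2, [{0}, {1}], [{0}, {1}]))  # → True

# Iteration 1 reads element 0, which iteration 0 wrote: misspeculation.
print(lrpd_style_check(2, [{0}, {0}], [{0}, {1}]))  # → False
```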