1998
DOI: 10.1109/71.706049

A compiler optimization algorithm for shared-memory multiprocessors

Abstract: This paper presents a new compiler optimization algorithm that parallelizes applications for symmetric, shared-memory multiprocessors. The algorithm considers data locality, parallelism, and the granularity of parallelism. It uses dependence analysis and a simple cache model to drive its optimizations. It also optimizes across procedures by using interprocedural analysis and transformations. We validate the algorithm by hand-applying it to sequential versions of parallel Fortran programs operating over dense …
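
The paper validates its algorithm by hand-application, so no code appears in the record itself. As a reading aid only, here is a minimal sketch of the kind of transformation the abstract describes: permuting a loop nest so the inner loop is unit-stride for cache locality, then parallelizing the outermost loop for coarse granularity. The loop nest, function name, and OpenMP directive are illustrative assumptions, not code from the paper.

/* Illustrative sketch only -- not from the paper.
 * Original (poor locality in C's row-major layout):
 *   for (j = 0; j < n; j++)
 *     for (i = 0; i < n; i++)
 *       a[i][j] = a[i][j] + b[i][j];
 * After permuting the loops for spatial locality, the outer loop
 * carries no dependence, so it can be parallelized at coarse
 * granularity (one row per task).
 */
#include <stddef.h>

void add_matrices(size_t n, double a[n][n], const double b[n][n])
{
    #pragma omp parallel for            /* coarse-grain parallelism */
    for (size_t i = 0; i < n; i++)      /* i outermost: rows traversed whole */
        for (size_t j = 0; j < n; j++)  /* unit-stride inner loop */
            a[i][j] += b[i][j];
}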

Cited by 24 publications (14 citation statements). References 43 publications.

“…Optimizing one loop in isolation was and is not sufficient for the best performance; data reorganization, loop fusion, and loop distribution add substantial benefits [7,10,11,16,21,22,25]. Our model directly computed the best permutations for locality and parallelism and we showed [7,16,20] how to directly derive good loop fusion and distribution choices, which the polyhedral model could not yet perform. Our approach also had the advantage that the resulting code was human readable and suitable for use in an interactive parallelization tool [17].…”
Section: Historical Positioning
confidence: 95%
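
The statement names loop fusion and distribution among the transformations that add substantial benefit. As a hedged illustration (the array names and the computation are made up, not taken from the cited works), fusing two loops over the same iteration space lets a value produced by the first statement be consumed while it is still in cache:

/* Before fusion: two traversals, so c[i] has usually left the
 * cache by the time the second loop reads it.
 *   for (i = 0; i < n; i++) c[i] = a[i] + b[i];
 *   for (i = 0; i < n; i++) d[i] = c[i] * 2.0;
 * After fusion: one traversal; c[i] is reused immediately.
 */
void fused(int n, const double *a, const double *b, double *c, double *d)
{
    for (int i = 0; i < n; i++) {
        c[i] = a[i] + b[i];   /* produce */
        d[i] = c[i] * 2.0;    /* consume while still in cache */
    }
}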
“…Modern optimizing compilers try to achieve an efficient exploitation of the memory hierarchy by reordering the instructions of the source program [2,49]. A lot of the research in this area concentrates on computationally intensive loops [24,25,35], and loop tiling [4,44] is considered to be one of the most successful techniques.…”
Section: Related Work
confidence: 99%
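
Loop tiling, which the statement singles out as one of the most successful techniques, restructures a loop nest to work on cache-sized blocks so each block is reused before being evicted. A minimal sketch, assuming a hypothetical tile size and row-major double arrays (not code from the cited works):

/* Tiled matrix multiply: each B x B block of a and b stays in
 * cache while the inner three loops reuse it. Caller is assumed
 * to zero-initialize c. Illustrative sketch only.
 */
#define B 64  /* tile size; in practice tuned to the cache */

void matmul_tiled(int n, const double *a, const double *b, double *c)
{
    for (int ii = 0; ii < n; ii += B)
        for (int kk = 0; kk < n; kk += B)
            for (int jj = 0; jj < n; jj += B)
                /* operate on one tile at a time */
                for (int i = ii; i < ii + B && i < n; i++)
                    for (int k = kk; k < kk + B && k < n; k++)
                        for (int j = jj; j < jj + B && j < n; j++)
                            c[i*n + j] += a[i*n + k] * b[k*n + j];
}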
“…Ferrante et al. [10] determine the innermost loop with overflowing caches, using the number of distinct cache lines accessed inside a loop to guide transformations like loop interchange. McKinley's cache model [19] is based on equivalence classes of array references showing temporal and spatial locality. Ghosh et al. [11] introduce cache miss equations based on a system of linear Diophantine equations from a reuse vector.…”
Section: Related Work
confidence: 99%
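
The distinct-cache-line count that Ferrante et al. use to guide interchange can be approximated by hand. The sketch below does the back-of-the-envelope arithmetic under assumed parameters (64-byte lines, a 1024 x 1024 double array), purely for illustration; it is not the cited papers' model:

#include <stdio.h>

/* Rough distinct-cache-line counts for an n x n double array,
 * assuming (hypothetically) 64-byte lines, i.e. 8 doubles per line.
 * A unit-stride traversal touches roughly one new line every 8
 * accesses; traversing a row-major array in column order touches a
 * new line on every access once the array exceeds the cache.
 */
int main(void)
{
    const long n = 1024, doubles_per_line = 8;
    long row_order_lines = n * n / doubles_per_line; /* ~131072 lines */
    long col_order_lines = n * n;                    /* ~1048576 lines */
    printf("row-major order: ~%ld lines\n", row_order_lines);
    printf("column order:    ~%ld lines\n", col_order_lines);
    return 0;
}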