2007
DOI: 10.1007/978-3-540-74466-5_29

Optimizing Chip Multiprocessor Work Distribution Using Dynamic Compilation

Abstract: How can sequential applications benefit from the ubiquitous next generation of chip multiprocessors (CMP)? Part of the answer may be a dynamic execution environment that automatically parallelizes programs and adaptively tunes the work distribution. Experiments using the Jamaica CMP show how a runtime environment is capable of parallelizing standard benchmarks and achieving performance improvements over traditional work distributions.

Citations: cited by 5 publications (8 citation statements)
References: 11 publications
“…Zhao et al [40,39] have also implemented loop parallelization in the context of JikesRVM. However, rather than GPUs, their intended target is JAMAICA [2], a multi-processor parallel architecture.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…The Online Tuning Framework (OTF) infrastructure, initially developed for CMP loop optimizations [19], performs automatic parallelization and enables runtime empirical search. It consists of three distinct elements: the Loop Parallelizing Compiler (LPC), the adaptive optimization component (see Section 3.1), and the runtime profiler (see Section 3.2).…”
Section: Online Tuning Framework (citation type: mentioning)
confidence: 99%
“…In the current implementation, 2-dimensional loop traversals of the iteration space are divided into tiles which are then distributed among automatically generated parallel threads. We extend the basic empirical search algorithm [19] to vary the number of loop iterations inside each tile for the clusters and levels of the memory hierarchy. These parameters directly impact the balance between costs associated with thread management, the cache efficiency, and system load.…”
Section: Adaptive Optimization Component (citation type: mentioning)
confidence: 99%
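
The tiling and empirical search described in the citation statement above can be illustrated with a minimal Python sketch. It is not the cited framework's implementation: the function names (`make_tiles`, `run_tile`, `tiled_sum`, `search_tile_size`), the summation loop body, and the single timing run per candidate are all assumptions made for illustration. It splits a 2-dimensional iteration space into tiles, hands the tiles to a pool of worker threads, and picks the tile shape with the lowest measured run time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_tiles(rows, cols, tile_rows, tile_cols):
    """Split a rows x cols iteration space into rectangular tiles.

    Each tile is (row_start, row_end, col_start, col_end), end-exclusive.
    Edge tiles are clipped to the bounds of the iteration space.
    """
    tiles = []
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            tiles.append((r0, min(r0 + tile_rows, rows),
                          c0, min(c0 + tile_cols, cols)))
    return tiles


def run_tile(grid, tile):
    """Hypothetical loop body: a partial sum over one tile."""
    r0, r1, c0, c1 = tile
    total = 0
    for r in range(r0, r1):
        row = grid[r]
        for c in range(c0, c1):
            total += row[c]
    return total


def tiled_sum(grid, tile_rows, tile_cols, workers=4):
    """Distribute the tiles among worker threads and combine the results."""
    tiles = make_tiles(len(grid), len(grid[0]), tile_rows, tile_cols)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda t: run_tile(grid, t), tiles))


def search_tile_size(grid, candidates, workers=4):
    """Time each candidate tile shape once and return the fastest one."""
    best, best_time = None, float("inf")
    for tile_rows, tile_cols in candidates:
        start = time.perf_counter()
        tiled_sum(grid, tile_rows, tile_cols, workers)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = (tile_rows, tile_cols), elapsed
    return best


if __name__ == "__main__":
    grid = [[r * 10 + c for c in range(64)] for r in range(64)]
    print(search_tile_size(grid, [(4, 4), (16, 16), (64, 64)]))
```

Small tiles create many tasks and raise thread-management cost, while large tiles leave fewer units to balance across threads, which is the trade-off the quoted passage attributes to the tile-size parameters. In CPython the global interpreter lock prevents these threads from running the loop body in parallel, so the sketch shows the structure of the search rather than an actual speedup.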