1999
DOI: 10.1109/12.752663

The impact of exploiting instruction-level parallelism on shared-memory multiprocessors

Abstract: Current microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are less effective in removing memory stall time. Consequently, despite the inherent latency tolerance features of IL…

Cited by 17 publications (18 citation statements)
References 15 publications (15 reference statements)

“…This facilitates INSTRUCTION LEVEL PARALLELISM with loop as combined optimization. The impact of ILP processors on the performance of shared memory multiprocessors [17] with and without latency hiding optimizing software prefetching has been represented by Pai, Ranganathan, Shafi and Adve (1999). One of the critical goals in the code optimization for multiprocessor system on single chip architecture [4] is to minimize the number of off chip memory access.…”
Section: Related Work | mentioning
confidence: 99%
“…[20] represents the impact of ILP processors on the performance of shared-memory multiprocessors, both without and with the latency hiding optimization of software pre-fetching. One of the critical goals in code optimization for Multiprocessor-System-on-a-Chip (MPSoC) architectures is to minimize the number of off-chip memory accesses.…”
Section: Epic (Explicitly Parallel | mentioning
confidence: 99%
“…The graph shows multiprocessor and uniprocessor experiments (MP/UP) before and after clustering (Base/Clust), normalized to the given application and system size without clustering. For analysis, execution time is categorized into data memory stall, CPU, synchronization stall, and instruction memory stall times, following the conventions of previous work (e.g., [14]). Since writes can retire before completing and read hits are fast, nearly all data memory stalls stem from reads that miss in the L2 cache.…”
Section: Performance Of Latbench | mentioning
confidence: 99%
“…Our previous work characterized the effectiveness of ILP processors in a shared-memory multiprocessor [14]. Although ILP techniques successfully and consistently reduced the CPU component of execution time, their impact on the memory (read) stall component was lower and more application-dependent, making read stall time a larger bottleneck in ILP-based multiprocessors than in previous-generation systems.…”
Section: Introduction | mentioning
confidence: 99%