2007 IEEE International Parallel and Distributed Processing Symposium 2007
DOI: 10.1109/ipdps.2007.370294

Architectural Considerations for Efficient Software Execution on Parallel Microprocessors

Abstract: Chip Multiprocessors (CMPs) and Simultaneous Multithreading (SMT) processors provide high performance but put more pressure on the memory interface than their single-thread counterparts. The "memory wall" problem is exacerbated by multiple threads sharing a memory interface, and will get worse as more cores are added. Therefore, communications between cores, using shared caches or fast interconnects between private caches, are needed to keep the CPUs busy without burdening the memory interface. Multiple CMP sy…

Cited by 6 publications (3 citation statements)
References 27 publications
“…Similar conclusions for these types of configurations have also been reported for other fine-grained algorithms [22].…”
Section: Further Study Of Memory Inefficiencies (supporting)
confidence: 87%
“…Note that the other cores are not stalled and are doing useful work. The concept of having a thread change roles was described in [22] to improve cache efficiency, but we use it mainly to avoid stalls. This change alone improved the performance of our algorithm by approximately 30% at two cores.…”
Section: Methods (mentioning)
confidence: 99%
“…It is shown in [18] that communicating through the cache coherence mechanism is slower than communicating through memory for some common CMPs. Using Register-Based Synchronization (RBS) will reduce the spin waiting of the threads and cut cache contention and overhead.…”
Section: Introduction (mentioning)
confidence: 99%