Proceedings of the 5th Conference on Computing Frontiers 2008
DOI: 10.1145/1366230.1366256
Optimizing thread throughput for multithreaded workloads on memory constrained CMPs

Cited by 6 publications (8 citation statements)
References 21 publications
“…Many high-performance scientific and commercial workloads running on shared-memory systems use time-sharing of programs, gang-scheduling their respective threads (in an effort to obtain best performance from less thrashing and fewer conflicts for shared resources). This scheduling policy thus provides the baseline for previous studies [3,24,6]. In contrast, we discover that for several multithreaded programs better performance results from space-sharing rather than time-sharing the CMP.…”
Section: Introduction
confidence: 83%
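The space-sharing alternative described in this excerpt can be made concrete with a minimal C sketch (not taken from the paper): each co-scheduled program pins its threads to a disjoint subset of cores through the Linux CPU-affinity API instead of time-sharing all cores. The 8-core partition and the helper name pin_to_core_range are illustrative assumptions.

/* Minimal sketch (illustrative, not from the cited paper): space-share a
 * CMP by pinning this program's threads to a disjoint subset of cores,
 * rather than gang-scheduling two programs across all cores in time slices. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/* Restrict the calling thread (and threads it creates later, which inherit
 * the mask) to cores [first_core, last_core]. Returns 0 on success. */
static int pin_to_core_range(int first_core, int last_core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int c = first_core; c <= last_core; c++)
        CPU_SET(c, &set);
    return sched_setaffinity(0 /* calling thread */, sizeof(set), &set);
}

int main(void)
{
    /* Example partition on an 8-core CMP: this program takes cores 0-3,
     * while a co-scheduled program would be pinned to cores 4-7. */
    if (pin_to_core_range(0, 3) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    /* ... spawn worker threads here; they inherit the affinity mask ... */
    return 0;
}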
“…Isci et al [13] and Herbert et al [12] examine scaling frequency when the processor is constrained by memory bottlenecks. Bhadauria and McKee [3] find that memory constraints often make the optimal thread count smaller than the total number of processors on a CMP. Curtis-Maury et al [8] predict efficient concurrency levels for parallel regions of multithreaded programs.…”
Section: Thread Scaling
confidence: 99%
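The observation that the best thread count may be below the core count on a memory-constrained CMP suggests a simple empirical sweep. The OpenMP sketch below is illustrative only (not the algorithm of any cited paper): it times a bandwidth-heavy kernel at each thread count from 1 to the number of processors and reports the count with the highest throughput.

/* Minimal sketch (illustrative): sweep the thread count and keep the one
 * with the highest measured throughput on a bandwidth-heavy kernel. */
#include <omp.h>
#include <stdio.h>

#define N (1 << 23)               /* array length; sized to exceed caches */
static double a[N], b[N];

static void kernel(void)          /* stand-in for one unit of real work */
{
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = 2.0 * b[i] + a[i];
}

int main(void)
{
    int best_threads = 1;
    double best_rate = 0.0;

    kernel();                     /* warm-up: touch the pages before timing */

    for (int t = 1; t <= omp_get_num_procs(); t++) {
        omp_set_num_threads(t);
        double start = omp_get_wtime();
        kernel();
        double rate = N / (omp_get_wtime() - start);  /* elements per second */
        printf("%2d threads: %.3g elements/s\n", t, rate);
        if (rate > best_rate) { best_rate = rate; best_threads = t; }
    }
    printf("best thread count: %d\n", best_threads);
    return 0;
}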
“…Similarly, Kunal et al [20] proposed an adaptive scheduling algorithm based on feedback about the parallelism in the application. Many other works that dynamically control the number of threads are aimed at studying power-performance trade-offs [11]-[13], [25], [27]. Unlike the above, Barnes et al [33] presented regression techniques to predict parallel program scaling behavior (processor count).…”
Section: Related Work
confidence: 99%
“…However, off-chip memory bandwidth is considered a fixed resource and is not expected to increase with the core count. Because of this, many data-parallel applications become memory-bandwidth limited and show poor performance scaling with increasing thread counts [7], [8]. Once the off-chip bus reaches its bandwidth limit, performance flattens sharply or decreases rapidly as the number of threads increases.…”
Section: Introduction
confidence: 99%
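The flattening described above follows from treating off-chip bandwidth as a fixed ceiling. The short C sketch below uses assumed numbers (3 GB/s of demand per thread, a 12 GB/s off-chip peak), not measurements from the cited work, to show delivered bandwidth scaling linearly with threads until aggregate demand reaches the ceiling, after which it flattens.

/* Minimal sketch with assumed numbers: a roofline-style view of why
 * throughput stops scaling once aggregate memory demand hits the fixed
 * off-chip bandwidth. */
#include <stdio.h>

int main(void)
{
    const double per_thread_gbs   = 3.0;   /* assumed demand per thread (GB/s) */
    const double peak_offchip_gbs = 12.0;  /* assumed off-chip peak (GB/s)     */

    for (int threads = 1; threads <= 8; threads++) {
        double demand    = threads * per_thread_gbs;
        double delivered = demand < peak_offchip_gbs ? demand : peak_offchip_gbs;
        /* Delivered bandwidth acts as a throughput proxy: linear up to
         * 4 threads here, flat at the off-chip limit beyond that. */
        printf("%d threads: demand %.1f GB/s, delivered %.1f GB/s\n",
               threads, demand, delivered);
    }
    return 0;
}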