1994
DOI: 10.1006/jpdc.1994.1056
Optimal Strategies for Spinning and Blocking

Cited by 21 publications (21 citation statements)
References 10 publications (2 reference statements)
“…Every store on a shared or owned cache line incurs a broadcast invalidation to all nodes. This happens because the cache directory is incomplete (it does not keep track of the sharers) and does not in any way detect whether sharing is limited within the node 7 . Therefore, even if all sharers reside on the same node, a store needs to pay the overhead of a broadcast, thus increasing the cost from around 83 to 244 cycles.…”
Section: Remote Accesses
confidence: 99%
“…The mutual exclusion methodology therefore needs to be properly chosen. Though blocking ensures robustness, it carries a cost and produces very bad outliers [24]. On a multi-core platform, where a CPU can be dedicated to a process to avoid context-switching overheads, non-blocking methods such as atomically spinning on a flag are recommended, since the waiter incurs no idle time once the shared resource becomes available [25].…”
Section: Maximizing Parallelization
confidence: 99%
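The trade-off this citing work describes can be illustrated with a minimal busy-wait sketch. The following Python code is not from the cited paper; the `SpinFlag` class and all names are hypothetical, and `threading.Lock.acquire(blocking=False)` merely stands in for a hardware atomic test-and-set. It shows the non-blocking approach: the waiter never yields the CPU, so it claims the resource the instant it is released.

```python
import threading

class SpinFlag:
    """Illustrative spin-on-flag sketch (names are hypothetical, not from the paper)."""
    def __init__(self):
        # A non-blocking Lock.acquire stands in for an atomic test-and-set word.
        self._flag = threading.Lock()

    def acquire(self):
        # Busy-wait: retry the non-blocking acquire instead of sleeping, so on a
        # dedicated core there is no idle time and no context-switch overhead.
        while not self._flag.acquire(blocking=False):
            pass  # spin

    def release(self):
        self._flag.release()

counter = 0
lock = SpinFlag()

def worker():
    global counter
    for _ in range(10_000):
        lock.acquire()
        counter += 1
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000
```

The cost, as the quoted passage notes, is that spinning burns the waiter's CPU; this only pays off when a core can be dedicated to the process.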
“…If too few threads sleep too little, the system will remain overloaded; if delays are too long or too common, processor utilization will plummet. Prior work has suggested heuristics for tuning the amount of spinning in the system as the thread and processor counts vary [3].…”
Section: Hybrid Spinning/blocking Approaches
confidence: 99%
“…In between the two extremes, a truly adaptive spinning approach such as the one presented by Boguslavsky et al. [3] would optimize the trade-off between spinning and blocking to minimize the performance loss, while also applying partial pipelining so context switches occur at least partly off the critical path. In the best case, performance drops off gradually from peak until pipelining is effective; in the worst case, performance drops off rapidly at first (as happens currently with the spinlocks), then recovers somewhat as pipelining becomes effective.…”
Section: Ideal Spin-then-block Mutex
confidence: 99%
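A spin-then-block mutex of the kind these citing works describe can be sketched as follows. This Python code is a hedged illustration only: the class name and the `SPIN_LIMIT` value are hypothetical, not taken from Boguslavsky et al. [3], whose approach adapts the spin amount rather than fixing it. The waiter first spins briefly, betting the holder's critical section is short, and only blocks (paying the context-switch cost) if that bet fails.

```python
import threading

class SpinThenBlockLock:
    """Illustrative spin-then-block mutex; SPIN_LIMIT is an arbitrary tuning knob."""
    SPIN_LIMIT = 1000  # how many non-blocking attempts before giving up and blocking

    def __init__(self):
        self._lock = threading.Lock()

    def acquire(self):
        # Phase 1: spin, hoping the holder releases within a short critical
        # section so a context switch is avoided entirely.
        for _ in range(self.SPIN_LIMIT):
            if self._lock.acquire(blocking=False):
                return
        # Phase 2: spinning failed; fall back to a blocking wait so the CPU
        # can do useful work, moving the context switch off the fast path.
        self._lock.acquire()

    def release(self):
        self._lock.release()

counter = 0
lock = SpinThenBlockLock()

def worker():
    global counter
    for _ in range(5_000):
        lock.acquire()
        counter += 1
        lock.release()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 20000
```

An adaptive scheme would tune `SPIN_LIMIT` at runtime as thread and processor counts vary, which is the optimization the quoted passages attribute to the cited paper.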