2015
DOI: 10.1145/2766450

Locality-Aware Work Stealing Based on Online Profiling and Auto-Tuning for Multisocket Multicore Architectures

Abstract: Modern mainstream powerful computers adopt multisocket multicore CPU architecture and NUMA-based memory architecture. While traditional work-stealing schedulers are designed for single-socket architectures, they incur severe shared cache misses and remote memory accesses in these computers. To solve the problem, we propose a locality-aware work-stealing (LAWS) scheduler, which better utilizes both the shared cache and the memory system. In LAWS, a load-balanced task allocator is used to evenly split and store …

Cited by 7 publications (15 citation statements)
References 28 publications
“…Targeting scheduling systems for task-based programs, a large amount of prior work aims to improve energy-efficiency [38,41], to improve data locality [9,10], or to reduce scheduling overhead [17,29]. However, with the increasing bandwidth requirements of computing tasks, many papers have also conducted related research for efficient bandwidth usage.…”
Section: Related Work (mentioning)
confidence: 99%
“…Many task-stealing schedulers have been proposed to improve data locality by reducing shared cache misses [10,11] and increasing local memory accesses [9,25,40,46]. Based on Charm++ [19], NUMALB [32] is proposed to balance the workload while avoiding unnecessary migrations and reducing cross-core communication.…”
Section: Related Work (mentioning)
confidence: 99%
“…Recently, LAWS [12] proposes a runtime library for Divide and Conquer applications in NUMA systems. It features a work stealing algorithm designed for NUMA systems, very focused in reducing remote memory accesses and last-level cache pollution.…”
Section: Related Work (mentioning)
confidence: 99%