Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems 2013
DOI: 10.1145/2451116.2451157
Traffic Management: A Holistic Approach to Memory Placement on NUMA Systems

Abstract: NUMA systems are characterized by Non-Uniform Memory Access times, where accessing data in a remote node takes longer than a local access. NUMA hardware has been built since the late 1980s, and the operating systems designed for it were optimized for access locality: they co-located memory pages with the threads that accessed them, so as to avoid the cost of remote accesses. Contrary to older systems, modern NUMA hardware has much smaller remote wire delays, and so remote access costs per se are not the main co…

Cited by 199 publications
(16 citation statements)
References 22 publications
“…A memory placement method, called Carrefour, has been proposed in [10]. It improves performance on modern NUMA systems by reconciling the data locality and the memory congestion problems.…”
Section: Related Work
confidence: 99%
“…Hybrid memory placement policies attempt to fully utilize total system bandwidth by distributing pages between system memory and stacked memory based on the bandwidth ratio (Agarwal et al 2015; Chou et al 2015a). NUMA-aware placement, on the other hand, focuses on placing data near computing resources to minimize overall latency (Dashti et al 2013; Verghese et al 1996; Bolosky et al 1989). Our work is orthogonal to these proposals.…”
Section: Related Work
confidence: 99%
“…Recently, migrating memory pages to improve memory locality during the execution of a parallel application has received renewed attention. Several such mechanisms have been proposed, operating at the hardware level [7,8,41], the compiler level [33,38], or the OS level [9,11,17]. These mechanisms do not require changes to the application to improve locality, but can cause a significant runtime overhead that limits their gains compared to the manual changes applied in this paper.…”
Section: Related Work
confidence: 99%
“…As memory is shared between all threads on the same node in an OpenMP environment, care must be taken to place data close to the threads that use it. This can result in significantly faster data accesses in shared memory architectures [3,7,9,11,16,33]. On the other hand, data used by each MPI rank is generally private to the rank [14], such that locality issues have a much lower impact on a single cluster node in general.…”
Section: Introduction
confidence: 99%