Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing 2013
DOI: 10.1145/2493123.2462916

Modeling communication in cache-coherent SMP systems

Cited by 50 publications (22 citation statements)
References 18 publications
“…Performance Analytical Models: The work of [1], [75] optimizes for cache line awareness, where an analytical performance model is built to tune the cache line transfers of different architectures, including KNC and Sandy Bridge. Their model is recently extended to explore KNL [46], which includes constructing several performance models for certain combinations of KNL clustering and memory modes.…”
Section: State-of-the-art Shared-memory Optimizations (mentioning)
confidence: 99%
“…Some of this work is inherited and customized to our application code. For instance, SoA of [68], AoSoA of [22], low-level, MCDRAM-aware allocator of [39], data dependency conflicts migration of [71], Hilbert-based recursive tiling/blocking of [74], cache line aware optimization of [1], [46], [75], and partial coloring of [79]. In our work, we deal with irregular memory access patterns through optimizing for the cache line awareness based upon minimizing memory reference arithmetic and pointer chasing, as well as localizing a large bulk of computations inside a compute core.…”
Section: State-of-the-art Shared-memory Optimizations (mentioning)
confidence: 99%
“…Finally, Ramos and Hoefler [13] propose a model for dissemination barrier synchronization and also compare with the Intel OpenMP barrier. However, the authors only show equivalent performance with the Intel implementation.…”
Section: Related Work (mentioning)
confidence: 99%
“…A few recent studies have proposed performance models for other manycore architectures [21,24]. Our approach is similar to the one used in these papers.…”
Section: Related Work (mentioning)
confidence: 99%
“…They all cover the same communication scenarios as the LogP model [11] (or its extensions) that is commonly used in message-passing systems. The main difference is that the underlying communication system considered in these studies are different from the one of this chapter: [21] models RMA-based communication and targets the Intel SCC processor; [24] models point-to-point communication on top of cache-coherent shared memory and targets the Intel Xeon Phi processor.…”
Section: Related Work (mentioning)
confidence: 99%