2014
DOI: 10.15803/ijnc.4.1_189

NUMA Computing with Hardware and Software Co-Support on Configurable Emulated Shared Memory Architectures

Abstract: The emulated shared memory (ESM) architectures are good candidates for future general purpose parallel computers due to their ability to provide an easy-to-use explicitly parallel synchronous model of computation to programmers as well as avoid most performance bottlenecks present in current multicore architectures. In order to achieve full performance the applications must, however, have enough thread-level parallelism (TLP). To solve this problem, in our earlier work we have introduced a class of configurabl…

Cited by 5 publications (15 citation statements)
References 19 publications
“…Compared to commercial CPUs and GPUs, PRAM mode is very fast for algorithms with irregular memory accesses and control flow [5], but in some cases it is slower when it comes to regular memory accesses and control flow. We have also in earlier work [4] introduced REPLICAs NUMA mode and given some evaluation on microbenchmarks; we also showed that latency and locality optimizations can give a performance gain in NUMA mode. We here consider a computation that has both a PRAM and a NUMA implementation as a component with two variants.…”
Section: Introduction (mentioning)
confidence: 72%
“…The REPLICA architecture is a family of chip multiprocessors (CMP); it is a configurable emulated shared memory (CESM) machine where different configurations have different number of cores, ALUs and memory units (MUs) [4], however in this paper we focus on the configuration given in Table I. The REPLICA architecture is a Very Long Instruction Word (VLIW) architecture, though compared to standard VLIW architectures REPLICA has support for chained functional units (FUs) [7].…”
Section: Replica Architecture (mentioning)
confidence: 99%