2015
DOI: 10.1007/s11390-015-1510-9

Cooperative Computing Techniques for a Deeply Fused and Heterogeneous Many-Core Processor Architecture

Cited by 41 publications (53 citation statements)
References 45 publications
“…The general architecture of the SW26010 processor [10] is shown in Figure 2. The processor includes four core-groups (CGs).…”
mentioning
confidence: 99%
“…This limiting number can be increased by organizing the processors into clusters: the first computer then needs to speak directly only to the head of each cluster. Another way is to distribute the job closer to the processing units, either inside the processor [34] or by having processors delegate the work to the processing units of a GPGPU.…”
Section: A Non-technical Model of Parallelized Sequential Operation
mentioning
confidence: 99%
“…The TaihuLight is an exception on both axes: it has the highest number of cores and the best parallelization efficiency. Its secret is in the processor comprising cooperating cores [34], i.e. it uses a (slightly) different computing paradigm.…”
Section: A Non-technical Model of Parallelized Sequential Operation
mentioning
confidence: 99%
“…
• as a new QT receives a new Processing Unit (PU), there is no need to save/restore registers and a return address (less memory utilization and fewer instruction cycles)
• the OS can receive its own PU, which is initialized in kernel mode and can promptly (i.e., without the need for a context change) service requests from the requestor core
• for resource sharing, a PU can temporarily be delegated to protect a critical section; the next call to run the code fragment with the same offset is delayed until processing by the first PU terminates
• the processor can natively accommodate the variable need for parallelization
• the cores currently out of use wait in a low-energy-consumption mode
• the hierarchic core-to-core communication greatly increases the memory throughput
• the asynchronous-style computing [57] largely reduces the loss due to the gap [58] between the speed of the processor and that of the memory
• the direct core-to-core connection (more dynamic than in [46]) greatly enhances efficacy in large systems [59]
• the thread-like feature to fork() and the hierarchic buses change the dependence on the number of cores from linear to logarithmic [8], enabling truly exa-scale supercomputers to be built
The very first version of EMPA [11] was implemented in the form of a simple (practically untimed) simulator [60]; an advanced (Transaction-Level-Modelled) simulator is now being prepared in SystemC. The initial version adapted Y86 cores [61], the new one RISC-V cores.…”
Section: Some Advantages of EMPA
mentioning
confidence: 99%
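The EMPA passage quoted above argues that mapping each new quasi-thread (QT) onto a spare hardware Processing Unit (PU) avoids the register save/restore and context-switch cost that software threading pays on every spawn. For comparison only, the minimal pthread sketch below shows that conventional software baseline; it is not EMPA or SW26010 code, and all names in it are hypothetical illustrations.

/* Illustrative comparison only: conventional software threading, the baseline
 * the quoted EMPA passage argues against.  In EMPA a new quasi-thread (QT)
 * would instead be placed on a free hardware Processing Unit (PU), so no
 * register/return-address save/restore would be needed.  Nothing here is
 * EMPA or SW26010 code; all names are hypothetical. */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
    long id = (long)arg;
    /* In the software model this thread time-shares a core, so the OS must
     * save and restore its register file on every context switch. */
    printf("worker %ld running\n", id);
    return NULL;
}

int main(void)
{
    enum { N = 4 };
    pthread_t tid[N];

    for (long i = 0; i < N; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i); /* software analogue of fork()-ing a QT */

    for (long i = 0; i < N; i++)
        pthread_join(tid[i], NULL); /* joining implies OS-managed contexts and stack frames */

    return 0;
}

(Compile with, e.g., gcc -pthread; the per-spawn overhead of this model is exactly what the quoted hardware QT/PU scheme is claimed to eliminate.)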