2015
DOI: 10.1145/2742910
Can traditional programming bridge the ninja performance gap for parallel computing applications?

Cited by 11 publications (2 citation statements) · References 17 publications
“…
• as a new QT receives a new PU, there is no need to save/restore registers and the return address (less memory use and fewer instruction cycles)
• the OS can receive its own PU, initialized in kernel mode, and can promptly (i.e., without a context change) service requests from the requestor core
• for resource sharing, a PU can be temporarily delegated to protect a critical section; the next call to run the code fragment with the same offset shall be delayed (by the processor) until processing by the first PU terminates
• the processor can natively accommodate the variable need for parallelization
• out-of-use cores wait in a low-energy-consumption mode
• hierarchic core-to-core communication greatly increases memory throughput
• asynchronous-style computing [26] largely reduces the loss stemming from the gap [27] between processor and memory speeds
• the principle of locality can be applied inside the processor: direct core-to-core connection (more dynamic than in [28]) greatly enhances efficacy in large systems [29]
• the communication/computation ratio, which decisively determines efficiency [16], [30], [31], is reduced considerably
• the QTs' thread-like feature akin to fork() and the hierarchic buses change the dependence of the time of creating many threads on the number of cores from linear to logarithmic (enabling truly exascale supercomputers)
• inter-core communication can be organized somewhat like the Local Area Networks (LANs) of computer networking: for cooperation, cores can prefer cores in their topological proximity
…”
Section: Some Advantages Of EMPA
confidence: 99%
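The claimed shift from linear to logarithmic thread-creation time can be illustrated with a toy counting model (a sketch of the general recursive-doubling idea, not EMPA's actual hardware mechanism): if a single parent must start every worker in turn, N cores need N−1 sequential start-ups, whereas if every already-running core starts one more core per step over hierarchic buses, the running set doubles each step and N cores come up in ⌈log₂ N⌉ steps.

```python
import math

def spawn_steps_linear(n_cores):
    # One parent starts every worker in turn: n - 1 sequential start-ups.
    return max(n_cores - 1, 0)

def spawn_steps_tree(n_cores):
    # Recursive doubling: each step, every running core starts one more,
    # so the running set doubles and n cores come up in ceil(log2 n) steps.
    return math.ceil(math.log2(n_cores)) if n_cores > 1 else 0

for n in (2, 16, 1024, 1 << 20):
    print(n, spawn_steps_linear(n), spawn_steps_tree(n))
```

At a million cores the gap is roughly a million start-up steps versus twenty, which is the scaling argument behind the exascale claim in the quote.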
“…
• as a new QT receives a new Processing Unit (PU), there is no need to save/restore registers and the return address (less memory use and fewer instruction cycles)
• the OS can receive its own PU, which is initialized in kernel mode and can promptly (i.e., without a context change) service requests from the requestor core
• for resource sharing, a PU can be temporarily delegated to protect a critical section; the next call to run the code fragment with the same offset will be delayed until the processing by the first PU terminates
• the processor can natively accommodate the variable need for parallelization
• the currently out-of-use cores wait in a low-energy-consumption mode
• the hierarchic core-to-core communication greatly increases memory throughput
• asynchronous-style computing [57] largely reduces the loss due to the gap [58] between the speed of the processor and that of the memory
• the direct core-to-core connection (more dynamic than in [46]) greatly enhances efficacy in large systems [59]
• the thread-like feature akin to fork() and the hierarchic buses change the dependence of the time of creating many threads on the number of cores from linear to logarithmic [8] (enabling truly exa-scale supercomputers)
The very first version of EMPA [11] was implemented in the form of a simple (practically untimed) simulator [60]; now an advanced (Transaction-Level Modelled) simulator is being prepared in SystemC. The initial version adapted Y86 cores [61], the new one RISC-V cores.…”
Section: Some Advantages Of EMPA
confidence: 99%