2015
DOI: 10.1145/2742910
Can traditional programming bridge the ninja performance gap for parallel computing applications?

Cited by 11 publications (2 citation statements) · References 17 publications
“…
• as a new QT receives a new PU, there is no need to save/restore registers and the return address (less memory use and fewer instruction cycles)
• the OS can receive its own PU, initialized in kernel mode, and can promptly (i.e., without a context change) service requests from the requestor core
• for resource sharing, a PU can be temporarily delegated to protect a critical section; the next call to run the code fragment with the same offset shall be delayed (by the processor) until processing by the first PU terminates
• the processor can natively accommodate the variable need for parallelization
• out-of-use cores wait in a low-energy-consumption mode
• hierarchic core-to-core communication greatly increases memory throughput
• asynchronous-style computing [26] largely reduces the loss stemming from the gap [27] between processor and memory speeds
• the principle of locality can be applied inside the processor: direct core-to-core connection (more dynamic than in [28]) greatly enhances efficacy in large systems [29]
• the communication/computation ratio, which decisively determines efficiency [16], [30], [31], is reduced considerably
• the QTs' thread-like feature akin to fork() and the hierarchic buses change the dependence of the time of creating many threads on the number of cores from linear to logarithmic (enabling truly exascale supercomputers)
• inter-core communication can be organized somewhat like the Local Area Networks (LANs) of computer networking: for cooperation, cores can prefer cores in their topological proximity
…”
Section: Some Advantages Of EMPA
confidence: 99%
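The claimed shift from linear to logarithmic thread-creation time can be illustrated with a toy counting model (a sketch of the general recursive-doubling idea, not EMPA's actual hardware mechanism): if a single parent must start every worker in turn, N cores need N−1 sequential start-ups, whereas if every already-running core starts one more core per step over hierarchic buses, the running set doubles each step and N cores come up in ⌈log₂ N⌉ steps.

```python
import math

def spawn_steps_linear(n_cores):
    # One parent starts every worker in turn: n - 1 sequential start-ups.
    return max(n_cores - 1, 0)

def spawn_steps_tree(n_cores):
    # Recursive doubling: each step, every running core starts one more,
    # so the running set doubles and n cores come up in ceil(log2 n) steps.
    return math.ceil(math.log2(n_cores)) if n_cores > 1 else 0

for n in (2, 16, 1024, 1 << 20):
    print(n, spawn_steps_linear(n), spawn_steps_tree(n))
```

At a million cores the gap is roughly a million start-up steps versus twenty, which is the scaling argument behind the exascale claim in the quote.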
“…
• as a new QT receives a new Processing Unit (PU), there is no need to save/restore registers and the return address (less memory use and fewer instruction cycles)
• the OS can receive its own PU, which is initialized in kernel mode and can promptly (i.e., without a context change) service requests from the requestor core
• for resource sharing, a PU can be temporarily delegated to protect a critical section; the next call to run the code fragment with the same offset will be delayed until the processing by the first PU terminates
• the processor can natively accommodate the variable need for parallelization
• the currently out-of-use cores wait in a low-energy-consumption mode
• the hierarchic core-to-core communication greatly increases memory throughput
• asynchronous-style computing [57] largely reduces the loss due to the gap [58] between the speed of the processor and that of the memory
• the direct core-to-core connection (more dynamic than in [46]) greatly enhances efficacy in large systems [59]
• the thread-like feature akin to fork() and the hierarchic buses change the dependence of the time of creating many threads on the number of cores from linear to logarithmic [8] (enabling truly exa-scale supercomputers)
The very first version of EMPA [11] was implemented in the form of a simple (practically untimed) simulator [60]; now an advanced (Transaction-Level Modelled) simulator is being prepared in SystemC. The initial version adapted Y86 cores [61], the new one RISC-V cores.…”
Section: Some Advantages Of EMPA
confidence: 99%