Monchiero scite author profile

We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length global wires are significant bandwidth impediments. Recent developments in silicon nanophotonic technology have the potential to meet these off-and on-stack bandwidth requirements at acceptable power levels.Corona is a 3D many-core architecture that uses nanophotonic communication for both inter-core communication and off-stack communication to memory or I/O devices. Its peak floating-point performance is 10 teraflops. Dense wavelength division multiplexed optically connected memory modules provide 10 terabyte per second memory bandwidth. A photonic crossbar fully interconnects its 256 low-power multithreaded cores at 20 terabyte per second bandwidth. We have simulated a 1024 thread Corona system running synthetic benchmarks and scaled versions of the SPLASH-2 benchmark suite. We believe that in comparison with an electrically-connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memoryintensive workloads, while simultaneously reducing power.

show abstract

A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies

Thoziyoor

Ahn

Monchiero

et al. 2008

171

View full text Add to dashboard Cite

In this paper we introduce CACTI-D, a significant enhancement of CACTI 5.0. CACTI-D adds support for modeling of commodity DRAM technology and support for main memory DRAM chip organization. CACTI-D enables modeling of the complete memory hierarchy with consistent models all the way from SRAM based L1 caches through main memory DRAMs on DIMMs.We illustrate the potential applicability of CACTI-D in the design and analysis of future memory hierarchies by carrying out a last level cache study for a multicore multithreaded architecture at the 32nm technology node. In this study we use CACTI-D to model all components of the memory hierarchy including L1, L2, last level SRAM, logicprocess based DRAM or commodity DRAM L3 caches, and main memory DRAM chips. We carry out architectural simulation using benchmarks with large data sets and present results of their execution time, breakdown of power in the memory hierarchy, and system energy-delay product for the different system configurations. We find that commodity DRAM technology is most attractive for stacked last level caches, with significantly lower energy-delay products.

show abstract

Prototyping pipelined applications on a heterogeneous FPGA multiprocessor virtual platform

Tumeo

Branca

Camerini

et al. 2009

View full text Add to dashboard Cite

Abstract-Multiprocessors on a chip are the reality of these days. Semiconductor industry has recognized this approach as the most efficient in order to exploit chip resources, but the success of this paradigm heavily relies on the efficiency and widespread diffusion of parallel software. Among the many techniques to express the parallelism of applications, this paper focuses on pipelining, a technique well suited to data-intensive multimedia applications. We introduce a prototyping platform (FPGAbased) and a methodology for these applications. Our platform consists of a mix of standard and custom heterogeneous cores. We discuss several case studies, analyzing the interaction of the architecture and applications and we show that multimedia and telecommunication applications with unbalanced pipeline stages can be easily deployed. Our framework eases the development cycle and enables the developers to focus directly on the problems posed by the programming model in the direction of the implementation of a production system.

show abstract

A Dual-Priority Real-Time Multiprocessor System on FPGA for Automotive Applications

Tumeo

Branca

Camerini

et al. 2008

View full text Add to dashboard Cite

This paper presents the implementation of a dualpriority scheduling algorithm for real-time embedded systems on a shared memory multiprocessor on FPGA. The dual-priority microkernel is supported by a multiprocessor interrupt controller to trigger periodic and aperiodic thread activation and manage context switching. We show how the dual-priority algorithm performs on a real system prototype compared to the theoretical performance simulations with a typical standard workload of automotive applications, underlining where the differences are. 1

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Monchiero

Corona: System Implications of Emerging Nanophotonic Technology

A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies

Prototyping pipelined applications on a heterogeneous FPGA multiprocessor virtual platform

A Dual-Priority Real-Time Multiprocessor System on FPGA for Automotive Applications

Contact Info

Product

Resources

About