Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices

Endpoint devices for the Internet-of-Things not only need to work under an extremely tight power envelope of a few milliwatts, but also need to be flexible in their computing capabilities, from a few kOPS to GOPS. Near-threshold (NT) operation can achieve higher energy efficiency, and performance scalability can be gained through parallelism. In this paper we describe the design of an open-source RISC-V processor core specifically designed for NT operation in tightly coupled multi-core clusters. We introduce instruction extensions and microarchitectural optimizations to increase the computational density and to minimize the pressure towards the shared memory hierarchy. For typical data-intensive sensor processing workloads the proposed core is on average 3.5× faster and 3.2× more energy-efficient, thanks to a smart L0 buffer to reduce cache access contentions and support for compressed instructions. SIMD extensions, such as dot-products, and a built-in L0 storage further reduce the shared memory accesses by 8×, reducing contentions by 3.2×. With four NT-optimized cores, the cluster is operational from 0.6 V to 1.2 V, achieving a peak efficiency of 67 MOPS/mW in a low-cost 65 nm bulk CMOS technology. In a low-power 28 nm FDSOI process a peak efficiency of 193 MOPS/mW (40 MHz, 1 mW) can be achieved.

Index Terms: Internet-of-Things, Ultra-low-power, Multi-core, RISC-V, ISA-extensions.


GAP-8: A RISC-V SoC for AI at the Edge of the IoT

Flamand et al. 2018


Mr.Wolf: An Energy-Precision Scalable Parallel Ultra Low Power SoC for IoT Edge Processing

Pullini, Rossi, Loi, et al. 2019

IEEE J. Solid-State Circuits

105


This paper presents Mr.Wolf, a Parallel Ultra Low Power (PULP) SoC featuring a hierarchical architecture with a small (12 kgates) microcontroller (MCU) class RISC-V core augmented with an autonomous IO subsystem for efficient data transfer from a wide set of peripherals. The small core can offload compute-intensive kernels to an 8-core floating-point capable processing engine available on demand. The proposed SoC, implemented in a 40 nm LP CMOS technology, features a 108 µW fully retentive memory (512 kB). The IO subsystem is capable of transferring up to 1.6 Gbit/s from external devices to the memory while consuming less than 2.5 mW. The 8-core compute cluster achieves a peak performance of 850 million 32-bit integer multiply-and-accumulate operations per second (MMAC/s) and 500 million 32-bit floating-point multiply-and-accumulate operations per second (MFMAC/s), i.e., 1 GFlop/s, with an energy efficiency of up to 15 MMAC/s/mW and 9 MFMAC/s/mW. These building blocks are supported by aggressive on-chip power conversion and management, enabling energy-proportional heterogeneous computing for always-on IoT end-nodes and improving performance by several orders of magnitude with respect to traditional single-core MCUs within a power envelope of 153 mW. We demonstrated the capabilities of the proposed SoC on a wide set of near-sensor processing kernels, showing that Mr.Wolf can deliver performance up to 16.4 GOp/s with energy efficiency up to 274 MOp/s/mW on real-life applications, paving the way for always-on data analytics on high-bandwidth sensors at the edge of the Internet of Things.


A fully-synthesizable single-cycle interconnection network for Shared-L1 processor clusters

et al. 2011


Design Issues and Considerations for Low-Cost 3-D TSV IC Technology

Plas, Limaye, Loi, et al. 2011

IEEE J. Solid-State Circuits

238


An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

Conti, Schilling, Schiavone, et al. 2017

IEEE Trans. Circuits Syst. I

104


Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load, but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65 nm technology, consumes less than 20 mW on average at 0.8 V, achieving an efficiency of up to 70 pJ/B in encryption, 50 pJ/px in convolution, or up to 25 MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16 pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74 pJ/op; and seizure detection with encrypted data collection from EEG within 12.7 pJ/op.


Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes

Azarkhish, Rossi, Loi, et al. 2018

IEEE Trans. Parallel Distrib. Syst.


High-performance computing systems are moving towards 2.5D and 3D memory hierarchies, based on High Bandwidth Memory (HBM) and Hybrid Memory Cube (HMC), to mitigate the main memory bottlenecks. This trend is also creating new opportunities to revisit near-memory computation. In this paper, we propose a flexible processor-in-memory (PIM) solution for scalable and energy-efficient execution of deep convolutional networks (ConvNets), one of the fastest-growing workloads for servers and high-end embedded systems. Our codesign approach consists of a network of Smart Memory Cubes (modular extensions to the standard HMC), each augmented with a many-core PIM platform called NeuroCluster. NeuroClusters have a modular design based on NeuroStream coprocessors (for convolution-intensive computations) and general-purpose RISC-V cores. In addition, a DRAM-friendly tiling mechanism and a scalable computation paradigm are presented to efficiently harness this computational capability with a very low programming effort. NeuroCluster occupies only 8% of the total logic-base (LoB) die area in a standard HMC and achieves an average performance of 240 GFLOPS for complete execution of full-featured state-of-the-art (SoA) ConvNets within a power budget of 2.5 W. Overall, 11 W is consumed in a single SMC device, with 22.5 GFLOPS/W energy efficiency, which is 3.5× better than the best GPU implementations in similar technologies. The minor increase in system-level power and the negligible area increase make our PIM system a cost-effective and energy-efficient solution, easily scalable to 955 GFLOPS with a small network of just four SMCs.


PULP: A parallel ultra low power platform for next generation IoT applications

Rossi, Conti, Marongiu, et al. 2015

101



Igor Loi

Near-Threshold RISC-V Core With DSP Extensions for Scalable IoT Endpoint Devices
