Endpoint devices for Internet-of-Things not only need to work under extremely tight power envelope of a few milliwatts, but also need to be flexible in their computing capabilities, from a few kOPS to GOPS. Near-threshold (NT) operation can achieve higher energy efficiency, and the performance scalability can be gained through parallelism. In this paper we describe the design of an opensource RISC-V processor core specifically designed for NT operation in tightly coupled multi-core clusters. We introduce instructionextensions and microarchitectural optimizations to increase the computational density and to minimize the pressure towards the shared memory hierarchy. For typical data-intensive sensor processing workloads the proposed core is on average 3.5× faster and 3.2× more energy-efficient, thanks to a smart L0 buffer to reduce cache access contentions and support for compressed instructions. SIMD extensions, such as dot-products, and a built-in L0 storage further reduce the shared memory accesses by 8× reducing contentions by 3.2×. With four NT-optimized cores, the cluster is operational from 0.6 V to 1.2 V achieving a peak efficiency of 67 MOPS/mW in a low-cost 65 nm bulk CMOS technology. In a low power 28 nm FDSOI process a peak efficiency of 193 MOPS/mW (40 MHz, 1 mW) can be achieved.Index Terms-Internet-of-Things, Ultra-low-power, Multi-core, RISC-V, ISA-extensions.
Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load -but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65 nm technology, consumes less than 20 mW on average at 0.8 V achieving an efficiency of up to 70 pJ/B in encryption, 50 pJ/px in convolution, or up to 25 MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16 pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74 pJ/op; and seizure detection with encrypted data collection from EEG within 12.7 pJ/op.
Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors reducing offloading overhead by 7× over previously published results; (ii) an optimized IEEE 754 compliant data path for fast high-precision convolutions and gradient propagation; (iii) evaluation of near-memory computing with NTX embedded into residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data center scenario. We demonstrate a 2.7× energy efficiency improvement of NTX over contemporary GPUs at 4.4× less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At the data center scale, a mesh of NTX achieves above 95% parallel and energy efficiency, while providing 2.1× energy savings or 3.1× performance improvement over a GPU-based system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.