Frank K. Gürkaynak scite author profile

Endpoint devices for Internet-of-Things not only need to work under extremely tight power envelope of a few milliwatts, but also need to be flexible in their computing capabilities, from a few kOPS to GOPS. Near-threshold (NT) operation can achieve higher energy efficiency, and the performance scalability can be gained through parallelism. In this paper we describe the design of an opensource RISC-V processor core specifically designed for NT operation in tightly coupled multi-core clusters. We introduce instructionextensions and microarchitectural optimizations to increase the computational density and to minimize the pressure towards the shared memory hierarchy. For typical data-intensive sensor processing workloads the proposed core is on average 3.5× faster and 3.2× more energy-efficient, thanks to a smart L0 buffer to reduce cache access contentions and support for compressed instructions. SIMD extensions, such as dot-products, and a built-in L0 storage further reduce the shared memory accesses by 8× reducing contentions by 3.2×. With four NT-optimized cores, the cluster is operational from 0.6 V to 1.2 V achieving a peak efficiency of 67 MOPS/mW in a low-cost 65 nm bulk CMOS technology. In a low power 28 nm FDSOI process a peak efficiency of 193 MOPS/mW (40 MHz, 1 mW) can be achieved.Index Terms-Internet-of-Things, Ultra-low-power, Multi-core, RISC-V, ISA-extensions.

show abstract

Power-analysis attack on an ASIC AES implementation

Örs

Gürkaynak

Oswald³

et al. 2004

153

View full text Add to dashboard Cite

show abstract

Globally Asynchronous, Locally Synchronous Circuits: Overview and Outlook

Krstić

Grass

Gürkaynak

et al. 2007

IEEE Des. Test. Comput.

160

View full text Add to dashboard Cite

Dynamic memory-based physically unclonable function for the generation of unique identifiers and true random numbers

Keller

Gürkaynak

Kaeslin

et al. 2014

View full text Add to dashboard Cite

An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

Conti

Schilling

Schiavone

et al. 2017

IEEE Trans. Circuits Syst. I

104

View full text Add to dashboard Cite

Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load -but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline. Using encryption to protect sensitive data at the boundary of the on-chip analytics engine is a way to address data security issues. To cope with the combined workload of analytics and encryption in a tight power envelope, we propose Fulmine, a System-on-Chip based on a tightly-coupled multi-core cluster augmented with specialized blocks for compute-intensive data processing and encryption functions, supporting software programmability for regular computing tasks. The Fulmine SoC, fabricated in 65 nm technology, consumes less than 20 mW on average at 0.8 V achieving an efficiency of up to 70 pJ/B in encryption, 50 pJ/px in convolution, or up to 25 MIPS/mW in software. As a strong argument for real-life flexible application of our platform, we show experimental results for three secure analytics use cases: secure autonomous aerial surveillance with a state-of-the-art deep CNN consuming 3.16 pJ per equivalent RISC op; local CNN-based face detection with secured remote recognition in 5.74 pJ/op; and seizure detection with encrypted data collection from EEG within 12.7 pJ/op.

show abstract

A 60 GOPS/W, −1.8 V to 0.9 V body bias ULP cluster in 28 nm UTBB FD-SOI technology

Rossi

Pullini

Loi

et al. 2016

Solid-State Electronics

View full text Add to dashboard Cite

Energy-Efficient Near-Threshold Parallel Computing: The PULPv2 Cluster

et al. 2017

View full text Add to dashboard Cite

A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

Schuiki

Schaffner

Gürkaynak

et al. 2019

IEEE Trans. Comput.

View full text Add to dashboard Cite

Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors reducing offloading overhead by 7× over previously published results; (ii) an optimized IEEE 754 compliant data path for fast high-precision convolutions and gradient propagation; (iii) evaluation of near-memory computing with NTX embedded into residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data center scenario. We demonstrate a 2.7× energy efficiency improvement of NTX over contemporary GPUs at 4.4× less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At the data center scale, a mesh of NTX achieves above 95% parallel and energy efficiency, while providing 2.1× energy savings or 3.1× performance improvement over a GPU-based system.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

334 Leonard St

Brooklyn, NY 11211

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.