In this paper, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's vector extension, implemented in GLOBALFOUNDRIES 22FDX FD-SOI technology. Ara's microarchitecture is scalable, as it is composed of a set of identical lanes, each containing a slice of the processor's vector register file and its own functional units. It achieves up to 97% FPU utilization when running a 256 × 256 double-precision matrix multiplication on sixteen lanes. Ara runs at 1.2 GHz in the typical corner (TT, 0.80 V, 25 °C), achieving a performance of up to 34 DP-GFLOPS. In terms of energy efficiency, Ara achieves up to 67 DP-GFLOPS/W under the same conditions, which is 56% higher than comparable vector processors found in the literature. An analysis of several vectorizable linear-algebra kernels across a range of matrix and vector sizes gives insight into the performance limitations and bottlenecks of vector processors, and outlines directions for maintaining high energy efficiency even for small matrix sizes, where the vector architecture achieves suboptimal utilization of the available FPUs.
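As a rough illustration of how such a lane-parallel design is kept busy, the sketch below shows a strip-mined double-precision matmul kernel written against the ratified RVV 1.0 C intrinsics. Ara implements the earlier v0.5 draft, so the names and details here are assumptions for illustration, not Ara's actual software stack; only the vector-length-agnostic structure is the point.

```c
/* Minimal sketch of a strip-mined double-precision matmul kernel using the
 * ratified RVV 1.0 C intrinsics (Ara targets the earlier v0.5 draft, so
 * actual code differs). The inner vfmacc keeps one FMA per element in
 * flight, which is what drives per-lane FPU utilization toward 100%. */
#include <riscv_vector.h>
#include <stddef.h>

/* C[n x n] += A[n x n] * B[n x n], row-major. */
void matmul_rvv(const double *a, const double *b, double *c, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ) {
            size_t vl = __riscv_vsetvl_e64m8(n - j);   /* hardware picks VL */
            vfloat64m8_t acc = __riscv_vle64_v_f64m8(&c[i * n + j], vl);
            for (size_t k = 0; k < n; ++k) {
                vfloat64m8_t bv = __riscv_vle64_v_f64m8(&b[k * n + j], vl);
                /* broadcast a[i][k] and fuse multiply-accumulate */
                acc = __riscv_vfmacc_vf_f64m8(acc, a[i * n + k], bv, vl);
            }
            __riscv_vse64_v_f64m8(&c[i * n + j], acc, vl);
            j += vl;
        }
    }
}
```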
Most investigations into near-memory hardware accelerators for deep neural networks have focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors, reducing offloading overhead by 7× over previously published results; (ii) an optimized IEEE 754 compliant data path for fast, high-precision convolutions and gradient propagation; (iii) an evaluation of near-memory computing with NTX embedded into the residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data-center scenario. We demonstrate a 2.7× energy-efficiency improvement of NTX over contemporary GPUs at 4.4× less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At data-center scale, a mesh of NTX accelerators achieves above 95% parallel and energy efficiency, while providing 2.1× energy savings or 3.1× performance improvement over a GPU-based system.
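To make the data-path claim concrete, here is a minimal portable-C sketch of the kind of fused multiply-accumulate (FMAC) loop at the heart of convolution and gradient propagation. The real engine drives such loops from hardware address generators; the function and variable names below are illustrative, not the NTX interface.

```c
/* One output pixel of a 2-D convolution, accumulated with the C99 fma()
 * family so each partial product is rounded only once. This mirrors the
 * single-rounding behavior an IEEE 754 compliant FMAC data path provides. */
#include <math.h>
#include <stddef.h>

float conv2d_pixel(const float *img,  /* input window, row stride `stride` */
                   const float *krn,  /* kh x kw kernel, row-major         */
                   size_t stride, size_t kh, size_t kw) {
    float acc = 0.0f;
    for (size_t y = 0; y < kh; ++y)
        for (size_t x = 0; x < kw; ++x)
            acc = fmaf(img[y * stride + x], krn[y * kw + x], acc);
    return acc;
}
```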
Guaranteed numerical precision of each elementary step in a complex computation has been the mainstay of traditional computing systems for many years. This era, fueled by Moore's law and the constant exponential improvement in computing efficiency, is at its twilight: from tiny nodes of the Internet of Things to large HPC computing centers, sub-picojoule-per-operation energy efficiency is essential for practical realizations. To overcome the power wall, a shift away from traditional computing paradigms is now mandatory. In this paper we present the driving motivations, roadmap, and expected impact of the European project OPRECOMP. OPRECOMP aims to (i) develop the first complete transprecision computing framework, (ii) apply it to a wide range of hardware platforms, from the sub-milliwatt up to the megawatt range, and (iii) demonstrate impact in a wide range of computational domains, spanning IoT, big-data analytics, deep learning, and HPC simulations. By seamlessly combining transprecision advances in devices, circuits, software tools, and algorithms into one design, we expect to achieve major energy-efficiency improvements, even when there is no freedom to relax end-to-end application quality of results. Indeed, OPRECOMP aims to demolish the ultraconservative "precise" computing abstraction and replace it with a more flexible and efficient one, namely transprecision computing.
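As a minimal sketch of the transprecision idea, under an assumed cancellation heuristic that is not part of the OPRECOMP framework itself: do the bulk of the work in reduced precision and fall back to higher precision only when an accuracy check fails.

```c
/* Illustrative transprecision dot product: accumulate in single precision,
 * and recompute in double only if heavy cancellation suggests the float
 * result lost most of its significant bits. The heuristic and threshold
 * are assumptions for illustration. */
#include <math.h>
#include <stddef.h>

double dot_transprecision(const double *a, const double *b, size_t n,
                          float cancel_tol /* e.g. 1e-3f */) {
    /* Fast path: single-precision products and accumulation. */
    float acc = 0.0f, mag = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float p = (float)a[i] * (float)b[i];
        acc += p;
        mag += fabsf(p);   /* track magnitude to detect cancellation */
    }
    /* If the sum is not dominated by cancellation, the reduced-precision
     * value keeps enough significant bits: accept it. */
    if (mag == 0.0f || fabsf(acc) >= cancel_tol * mag)
        return (double)acc;

    /* Slow path: recompute in full double precision. */
    double acc_d = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc_d += a[i] * b[i];
    return acc_d;
}
```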
The mitigation of rapid mass movements involves a subtle interplay between field surveys, numerical modelling, and experience. Hazard engineers rely on a combination of best practices and, where available, historical records as a vital prerequisite for establishing reproducible and accurate hazard zoning. Full-scale field tests have been performed to reinforce the physical understanding of debris flows and snow avalanches. Rockfall dynamics, however - especially the quantification of energy dissipation during the complex rock-ground interaction - remain largely unknown. The awareness of rock-shape dependence is growing, but at present there exists little experimental basis for how rockfall hazard scales with rock mass, size, and shape. Here, we present a unique data set of induced single-block rockfall events, comprising data from equant and wheel-shaped blocks with masses up to 2670 kg, quantifying the influence of rock shape and mass on lateral spreading and longitudinal runout, and hence challenging common practices in rockfall hazard assessment.
Rockfalls have over the last decades become a serious and frequent hazard, especially due to larger variations in precipitation and temperature destabilizing rocky slopes in mountainous regions. Hence, civil engineers apply the latest simulation tools to perform risk assessments and plan mitigation strategies. These tools are based on various models with many parameters that must be calibrated and evaluated against real-world in-field measurement data. In this work, we present a rugged low-power multi-sensor node termed StoneNode, designed to acquire and log accurate inertial sensor measurements during induced in-field experiments with falling rocks. The node hosts low-power MEMS sensors with high dynamic ranges, sampled at up to 1 kHz, and provides a battery lifetime of up to 56 h, enabling field studies lasting several working days. Exhaustive in-field experiments have been carried out with several differently shaped rocks on typical terrain in the Swiss Alpine region. The experiments comprise more than 100 induced tests with several heavy impacts of >400 g. This paper gives a detailed summary of these results, including unprecedented in-situ data of rockfall trajectories and a post-experimental validation in which we compare simulated rockfall deposition distributions and motion traces with in-field measurements after calibration of the simulation module. Our results and the experience gained in the field confirm that StoneNode is a reliable, easy-to-use device that greatly facilitates the data acquisition process. Further, the results obtained with the calibrated simulation tool show good quantitative and qualitative congruence with the experiments, further reaffirming our methodological approach.
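For context, a logger of this kind reduces to a fixed-rate acquisition loop. The skeleton below is a hypothetical firmware sketch (all imu_/log_/time_ calls are invented placeholders, not StoneNode's firmware) showing how 1 kHz sampling and low-power sleeping between samples combine into a multi-day battery lifetime.

```c
/* Hypothetical acquisition loop for a StoneNode-like logger: sample a
 * high-range MEMS IMU at a fixed 1 kHz rate and append frames to
 * non-volatile storage. The HAL prototypes are assumptions. */
#include <stdint.h>

#define SAMPLE_RATE_HZ 1000u

typedef struct {
    uint32_t t_ms;     /* timestamp, milliseconds since boot */
    int16_t  acc[3];   /* raw accelerometer sample, 3 axes   */
    int16_t  gyro[3];  /* raw gyroscope sample, 3 axes       */
} imu_frame_t;

/* Hypothetical hardware-abstraction layer: */
void     imu_read(int16_t acc[3], int16_t gyro[3]);
uint32_t time_now_ms(void);
void     log_append(const imu_frame_t *f);
void     sleep_until_ms(uint32_t deadline);  /* low-power wait */

void acquisition_loop(void) {
    uint32_t next = time_now_ms();
    for (;;) {
        imu_frame_t f;
        f.t_ms = time_now_ms();
        imu_read(f.acc, f.gyro);
        log_append(&f);
        /* The MCU spends most of each 1 ms period in a low-power state,
         * which is what stretches battery life to multi-day campaigns. */
        next += 1000u / SAMPLE_RATE_HZ;
        sleep_until_ms(next);
    }
}
```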
Spontaneously occurring rockfalls are a serious danger, especially nowadays as global warming leads to a retrogression of the permafrost that stabilizes terrain in mountainous regions. In order to perform risk assessments and develop mitigation strategies, advanced simulation tools and models have been developed over the last years. These models come with many parameters and need to be calibrated and validated with real-world data to produce reliable estimates. To this end, we developed StoneNode, a rugged, small, low-power sensor device that can be embedded into boulders to measure accelerations and angular velocities. The node employs low-power MEMS sensors with high dynamic range and has a maximum operating time of more than 56 h. First field experiments confirm that StoneNode is a reliable, easy-to-use device that greatly facilitates the data acquisition process.
Spatio-temporal edge-aware (STEA) filtering methods have recently received increased attention due to their ability to efficiently solve or approximate important image-domain problems in a temporally consistent manner - a crucial property for video-processing applications. However, existing STEA methods are currently unsuited for real-time, embedded stream-processing settings due to their high processing latency, large memory and bandwidth requirements, and the need for accurate optical flow to enable filtering along motion paths. To address this, we propose an efficient STEA filtering pipeline based on the recently proposed permeability filter (PF), which offers high quality and halo-reduction capabilities. Using mathematical properties of the PF, we reformulate its temporal extension as a causal, non-linear infinite impulse response (IIR) filter, which can be evaluated efficiently due to its incremental nature. We bootstrap our own accurate flow using the PF and its temporal extension by interpolating a quasi-dense nearest-neighbour field obtained with an improved PatchMatch algorithm, which employs binarized octal orientation maps (BOOM) descriptors to find correspondences between subsequent frames. Our method creates temporally consistent results for a variety of applications such as optical flow estimation, sparse data upsampling, visual saliency computation, and disparity estimation. We benchmark our optical flow estimation on the MPI Sintel dataset, where we currently achieve a Pareto-optimal quality-efficiency tradeoff with an average endpoint error of 7.68 at 0.59 s single-core execution time on a recent desktop machine.
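To illustrate the causal recursion, the sketch below shows one incremental temporal update of a PF-style filter in C, y_t = p * warp(y_{t-1}) + (1 - p) * x_t, with a simplified permeability weight; the flow-based warp and the paper's exact photo-consistency weighting are abstracted away, so treat the details as assumptions.

```c
/* One incremental step of a causal temporal edge-aware IIR filter in the
 * spirit of the permeability filter: blend the current frame with the
 * previous output (already warped along the flow) under a per-pixel
 * permeability weight in [0,1]. */
#include <math.h>
#include <stddef.h>

void pf_temporal_step(const float *x,        /* current frame input      */
                      const float *y_prev_w, /* previous output, warped  */
                      const float *perm,     /* permeability in [0,1]    */
                      float *y,              /* filtered output          */
                      size_t npix) {
    for (size_t i = 0; i < npix; ++i)
        y[i] = perm[i] * y_prev_w[i] + (1.0f - perm[i]) * x[i];
}

/* Illustrative permeability from a photo-consistency cue: weight falls
 * toward 0 where frames disagree, so the filter resets there instead of
 * smearing across motion boundaries. sigma controls the edge threshold. */
float perm_from_diff(float cur, float warped_prev, float sigma) {
    float d = fabsf(cur - warped_prev) / sigma;
    return 1.0f / (1.0f + d * d);
}
```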