Due to continuous improvements in the resources available on FPGAs, it is becoming increasingly possible to accelerate floating point algorithms. The solution of a system of linear equations forms the basis of many problems in engineering and science, but its calculation is highly time consuming. The minimum residual algorithm (MINRES) is one method to solve this problem, and is highly effective provided the matrix exhibits certain characteristics. This paper examines an IEEE 754 single precision floating point implementation of the MINRES algorithm on an FPGA. It demonstrates that through parallelisation and heavy pipelining of all floating point components it is possible to achieve a sustained performance of up to 53 GFLOPS on the Virtex5-330T. This compares favourably to other hardware implementations of floating point matrix inversion algorithms, and corresponds to an improvement of nearly an order of magnitude compared to a software implementation.
The precision used in an algorithm affects the error and performance of individual computations, the memory usage and the potential parallelism for a fixed hardware budget. This paper describes a new method to determine the minimum precision required to meet a given error specification for an algorithm that consists of the basic algebraic operations. Using this approach, it is possible to significantly reduce the computational word-length in comparison to existing methods, and this can lead to superior hardware designs. We demonstrate the proposed procedure on an iteration of the conjugate gradient algorithm, achieving proofs of bounds that can translate to global word-length savings ranging from a few bits to proving the existence of ranges that must otherwise be assumed to be unbounded when using competing approaches. We also achieve comparable bounds to recent literature in a small fraction of the execution time, with greater scalability.
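As a rough illustration of the setting described above, the sketch below runs one conjugate gradient iteration using only additions, subtractions, multiplications and divisions, with every intermediate result rounded to a chosen number of fractional bits. It is not the bound-proving method of the paper: it only measures the error empirically on one input, and the names `quantize`, `dot` and `cg_iteration`, the fixed-point rounding model and the 2x2 test system are all assumptions made for this example.

```python
from functools import partial


def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (assumed fixed-point model)."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def dot(u, v, q):
    """Dot product with every product and partial sum passed through the rounding function q."""
    acc = 0.0
    for a, b in zip(u, v):
        acc = q(acc + q(a * b))
    return acc


def cg_iteration(A, x, r, p, q=lambda v: v):
    """One conjugate gradient iteration; q rounds each intermediate result (identity by default)."""
    Ap = [dot(row, p, q) for row in A]
    rr = dot(r, r, q)
    alpha = q(rr / dot(p, Ap, q))
    x_new = [q(xi + q(alpha * pi)) for xi, pi in zip(x, p)]
    r_new = [q(ri - q(alpha * api)) for ri, api in zip(r, Ap)]
    beta = q(dot(r_new, r_new, q) / rr)
    p_new = [q(ri + q(beta * pi)) for ri, pi in zip(r_new, p)]
    return x_new, r_new, p_new


def solve(A, b, iterations, q=lambda v: v):
    """Run a fixed number of iterations from x = 0 (so r = p = b initially)."""
    x, r, p = [0.0] * len(b), list(b), list(b)
    for _ in range(iterations):
        x, r, p = cg_iteration(A, x, r, p, q)
    return x


A = [[4.0, 1.0], [1.0, 3.0]]   # small symmetric positive definite system (hypothetical)
b = [1.0, 2.0]
x_float = solve(A, b, 2)                               # double precision reference
x_fixed = solve(A, b, 2, partial(quantize, frac_bits=8))  # 8 fractional bits
error = max(abs(f - g) for f, g in zip(x_float, x_fixed))
```

Comparing `x_fixed` against `x_float` for different values of `frac_bits` shows how the word-length choice trades accuracy for hardware cost on this one input; a guaranteed bound over a whole input range needs an analysis such as the one the abstract describes.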
Background: There has been a recent increased interest in monitoring health using wearable sensor technologies; however, few have focused on breathing. The ability to monitor breathing metrics may have indications both for general health as well as respiratory conditions such as asthma, where long-term monitoring of lung function has shown promising utility. Objective: In this paper, we explore a long short-term memory (LSTM) architecture and predict measures of interbreath intervals, respiratory rate, and the inspiration-expiration ratio from a photoplethysmogram signal. This serves as a proof-of-concept study of the applicability of a machine learning architecture to the derivation of respiratory metrics. Methods: A pulse oximeter was mounted to the left index finger of 9 healthy subjects who breathed at controlled respiratory rates. A respiratory band was used to collect a reference signal as a comparison. Results: Over a 40-second window, the LSTM model predicted a respiratory waveform through which breathing metrics could be derived with a bias value and 95% CI. Metrics included inspiration time (–0.16 seconds, –1.64 to 1.31 seconds), expiration time (0.09 seconds, –1.35 to 1.53 seconds), respiratory rate (0.12 breaths per minute, –2.13 to 2.37 breaths per minute), interbreath intervals (–0.07 seconds, –1.75 to 1.61 seconds), and the inspiration-expiration ratio (0.09, –0.66 to 0.84). Conclusions: A trained LSTM model shows acceptable accuracy for deriving breathing metrics and could be useful for long-term breathing monitoring in health. Its utility in respiratory disease (e.g., asthma) warrants further investigation.
Low-precision arithmetic operations to accelerate deep-learning applications on field-programmable gate arrays (FPGAs) have been studied extensively, because they offer the potential to save silicon area or increase throughput. However, these benefits come at the cost of a decrease in accuracy. In this article, we demonstrate that reconfigurable constant coefficient multipliers (RCCMs) offer a better alternative for saving silicon area than utilizing low-precision arithmetic. RCCMs multiply input values by a restricted choice of coefficients using only adders, subtractors, bit shifts, and multiplexers (MUXes), meaning that they can be heavily optimized for FPGAs. We propose a family of RCCMs tailored to FPGA logic elements to ensure their efficient utilization. To minimize information loss from quantization, we then develop novel training techniques that map the possible coefficient representations of the RCCMs to neural network weight parameter distributions. This enables the usage of the RCCMs in hardware, while maintaining high accuracy. We demonstrate the benefits of these techniques using AlexNet, ResNet-18, and ResNet-50 networks. The resulting implementations achieve up to 50% resource savings over traditional 8-bit quantized networks, translating to significant speedups and power savings. Our RCCM with the lowest resource requirements exceeds 6-bit fixed point accuracy, and all other RCCM implementations achieve accuracy at least similar to an 8-bit uniformly quantized design while retaining significant resource savings.
Index Terms: Digital arithmetic, field programmable gate arrays (FPGAs), neural networks, neural network hardware, quantization.
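To make the idea of a reconfigurable constant coefficient multiplier concrete, here is a minimal software model of one possible structure: a multiplexer picks between two shifted copies of the input, and a second select bit chooses whether the unshifted input is added or subtracted. The particular topology, the 2-bit `select` encoding and the resulting coefficient set are hypothetical choices for illustration, not one of the RCCM designs proposed in the article.

```python
def rccm_multiply(x: int, select: int) -> int:
    """Model of a hypothetical RCCM built from shifts, one mux and one adder/subtractor.

    select bit 0 picks the shifted operand (x << 2 or x << 3);
    select bit 1 picks the operation (add x or subtract x).
    """
    s0 = select & 1
    s1 = (select >> 1) & 1
    # Multiplexer between two shifted copies of the input (shifts are just wiring in hardware).
    shifted = (x << 3) if s0 else (x << 2)
    # Single adder/subtractor controlled by the second select bit.
    return shifted - x if s1 else shifted + x


# Coefficients reachable by this structure, indexed by the select value.
COEFFICIENTS = [rccm_multiply(1, s) for s in range(4)]  # [5, 9, 3, 7]
```

With only one adder/subtractor and one multiplexer, this structure reaches four distinct coefficients. A network using it would need its weights restricted to (a scaled version of) that set, which is why the abstract pairs the hardware with training techniques that map weights onto the available coefficients.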
When migrating an algorithm onto hardware, the potential saving that can be obtained by tuning the precision used in the algorithm to meet a range or error specification is often overlooked; the major reason is that it is hard to choose a number system which can guarantee any such specification can be met. Instead, the problem is mitigated by opting to use IEEE standard single or double precision so as to be 'no worse' than a software implementation. However, the flexibility in the number representation is one of the key factors that can only be exploited on FPGAs, unlike GPUs and general purpose processors, and hence ignoring this potential significantly limits the performance achievable on an FPGA. To this end, this paper describes a tool which analyses algorithms with given input ranges under a finite precision to provide information that could be used to tune the hardware to the algorithm specifications. We demonstrate the proposed procedure on an iteration of the conjugate gradient algorithm, achieving a reduction in slices of over 40% when meeting the same error specification found by traditional methods. We also show it achieves comparable bounds to recent literature in a small fraction of the execution time, with greater scalability.
We assessed the impact of large-scale commercial and recreational harvesting of polychaete worms Marphysa spp. on macrobenthic assemblages in a subtropical estuary in Queensland, Australia, by examining: (1) the spatial extent of harvesting activities and the rate of recovery of the seagrass habitat over an 18 to 20 mo period; (2) the recovery of infauna in and around commercial pits of known age; (3) the indirect effects of physical disturbance from trampling and deposition of sediments during harvesting on epibenthos in areas adjacent to commercial and recreational pits; (4) impacts of potential indirect effects through manipulative experimentation. Harvesting caused a loss of seagrass, changes to the topography and compaction of the sediments associated with the creation of walls around commercial pits, and the deposition of rubble dug from within the pit. The walls and rubble were still evident after 18 to 20 mo, but comprised only a small proportion of the total area on the intertidal banks. There was a shift from an intertidal area dominated by Zostera capricorni to one with a mixture of Z. capricorni, Halophila spp. and Halodule uninervis, but there was no overall decline in the biomass of seagrass in these areas. There were distinct impacts from harvesting on the abundance of benthic infauna, especially amphipods, polychaetes and gastropods, and these effects were still detectable after 4 mo of potential recovery. After 12 mo, there were no detectable differences in the abundances of these infauna between dug areas and reference areas, which suggested that infauna had recovered from impacts of harvesting; however, an extensive bloom of toxic fireweed Lyngbya majuscula may have masked any remaining impacts. There were no detectable impacts of harvesting on epifauna living in the seagrass immediately around commercial or recreational pits.