Spiking neural networks (SNN) have gained popularity in embedded applications such as robotics and computer vision. The main advantages of SNN are the temporal plasticity, ease of use in neural interface circuits and reduced computation complexity. SNN have been successfully used for image classification. They provide a model for the mammalian visual cortex, image segmentation and pattern recognition. Different spiking neuron mathematical models exist, but their computational complexity makes them ill-suited for hardware implementation. In this paper, a novel, simplified and computationally efficient model of spike response model (SRM) neuron with spike-time dependent plasticity (STDP) learning is presented. Frequency spike coding based on receptive fields is used for data representation; images are encoded by the network and processed in a similar manner as the primary layers in visual cortex. The network output can be used as a primary feature extractor for further refined recognition or as a simple object classifier. Results show that the model can successfully learn and classify black and white images with added noise or partially obscured samples with up to ×20 computing speed-up at an equivalent classification ratio when compared to classic SRM neuron membrane models. The proposed solution combines spike encoding, network topology, neuron membrane model and STDP learning.
New chips for machine learning applications appear, they are tuned for a specific topology, being efficient by using highly parallel designs at the cost of high power or large complex devices. However, the computational demands of deep neural networks require flexible and efficient hardware architectures able to fit different applications, neural network types, number of inputs, outputs, layers, and units in each layer, making the migration from software to hardware easy. This paper describes novel hardware implementing any feedforward neural network (FFNN): multilayer perceptron, autoencoder, and logistic regression. The architecture admits an arbitrary input and output number, units in layers, and a number of layers. The hardware combines matrix algebra concepts with serial-parallel computation. It is based on a systolic ring of neural processing elements (NPE), only requiring as many NPEs as neuron units in the largest layer, no matter the number of layers. The use of resources grows linearly with the number of NPEs. This versatile architecture serves as an accelerator in real-time applications and its size does not affect the system clock frequency. Unlike most approaches, a single activation function block (AFB) for the whole FFNN is required. Performance, resource usage, and accuracy for several network topologies and activation functions are evaluated. The architecture reaches 550 MHz clock speed in a Virtex7 FPGA. The proposed implementation uses 18-bit fixed point achieving similar classification performance to a floating point approach. A reduced weight bit size does not affect the accuracy, allowing more weights in the same memory. Different FFNN for Iris and MNIST datasets were evaluated and, for a real-time application of abnormal cardiac detection, a ×256 acceleration was achieved. The proposed architecture can perform up to 1980 Giga operations per second (GOPS), implementing the multilayer FFNN of up to 3600 neurons per layer in a single chip. The architecture can be extended to bigger capacity devices or multi-chip by the simple NPE ring extension.INDEX TERMS Feedforward neural networks -FFNN, systolic hardware architecture, FPGA implementation, neural network acceleration, deep neural networks.
Due the fact that the required therapy to treat Ventricular Fibrillation (V F) is aggressive (electric shock), the lack of a proper detection and recovering therapy could cause serious injuries to the patient or trigger a ventricular fibrillation, or even death. This work describes the development of an automatic diagnostic system for the detection of the occurrence of V F in real time by means of the time-frequency representation (T F R) image of the ECG. The main novelties are the use of the T F R image as input for a classification process, as well as the use of combined classifiers. The feature extraction stage is eliminated and, together with the use of specialized binary classifiers, this method improves the results of the classification. To verify the validity of the method, four different classifiers in different combinations are used: Regression Logistic with L2 Regularization (L 2 R L R), adaptive neural network (A N N C), Bagging (B A G G), and K-nearest neighbor (K N N). The Hierarchical Method (HM) and Voting Majority Method (VMM) combinations are used. ECG signals used for evaluation were obtained from the standard MIT-BIH and AHA databases. When the classifiers were combined, it was observed that the combination of B A G G , K N N , and A N N C using the Hierarchical Method (HM) gave the best results, with a sensitivity of 95.58 ± 0.41%, a 99.31 ± 0.08% specificity, a 98.6 ± 0.04% of overall accuracy, and a precision of 98.25 ± 0.29% for V F . Whereas a sensitivity of 94.02 ± 0.58%, a specificity of 99.31 ± 0.08%, an overall accuracy of 99.14 ± 0.43%, and a precision of 98.59 ± 0.09% was obtained for V T with a run time between 0.07 s and 0.12 s. Results show that the use of T F R image data to feed the combined classifiers yields a reduction in execution time with performance values above to those obtained by individual classifiers. This is of special utility for V F detection in real time.
Alfredo.Rosado@uv.es (A.R.-M.); manuel.bataller@uv.es (M.B.-M.); Juan.Barrios@uv.es (J.B.-A.); juan.guerrero@uv.es (J.F.G.-M.)Abstract: Currently, there are some emerging online learning applications handling data streams in real-time. The On-line Sequential Extreme Learning Machine (OS-ELM) has been successfully used in real-time condition prediction applications because of its good generalization performance at an extreme learning speed, but the number of trainings by a second (training frequency) achieved in these continuous learning applications has to be further reduced. This paper proposes a performance-optimized implementation of the OS-ELM training algorithm when it is applied to real-time applications. In this case, the natural way of feeding the training of the neural network is one-by-one, i.e., training the neural network for each new incoming training input vector. Applying this restriction, the computational needs are drastically reduced. An FPGA-based implementation of the tailored OS-ELM algorithm is used to analyze, in a parameterized way, the level of optimization achieved. We observed that the tailored algorithm drastically reduces the number of clock cycles consumed for the training execution up to approximately the 1%. This performance enables high-speed sequential training ratios, such as 14 KHz of sequential training frequency for a 40 hidden neurons SLFN, or 180 Hz of sequential training frequency for a 500 hidden neurons SLFN. In practice, the proposed implementation computes the training almost 100 times faster, or more, than other applications in the bibliography. Besides, clock cycles follows a quadratic complexity O(Ñ 2 ), withÑ the number of hidden neurons, and are poorly influenced by the number of input neurons. However, it shows a pronounced sensitivity to data type precision even facing small-size problems, which force to use double floating-point precision data types to avoid finite precision arithmetic effects. In addition, it has been found that distributed memory is the limiting resource and, thus, it can be stated that current FPGA devices can support OS-ELM-based on-chip learning of up to 500 hidden neurons. Concluding, the proposed hardware implementation of the OS-ELM offers great possibilities for on-chip learning in portable systems and real-time applications where frequent and fast training is required.Electronics 2018, 7, 308 2 of 23 learning of neural networks for the prediction of future opponent robot coordinates; ref.[3] designed an ASIC on-chip learning to learn and extract features existing in input datasets, intended to embedded vision applications; or [4], that implemented a real-time classifier for neurological signals.The Extreme Learning Machine (ELM) algorithm possesses many aspects that makes it suitable for any real-time or custom hardware implementation. It has a reduced and fixed training time along with an extremely fast learning speed that allows determinism in the computation time and, thus, a great advantage compared to previous well-known training...
High dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit designed on the basis of circulant matrices. It optimizes data flow for the computation of any sequence of matrix operations removing the need for data movement for intermediate results, together with the individual matrix operations' performance in direct or transposed form (the transpose matrix operation only requires a data addressing modification). The allowed matrix operations are: matrix-by-matrix addition, subtraction, dot product and multiplication, matrix-by-vector multiplication, and matrix by scalar multiplication. The proposed architecture is fully scalable with the maximum matrix dimension limited by the available resources. In addition, a design environment is also developed, permitting assistance, through a friendly interface, from the customization of the hardware computing unit to the generation of the final synthesizable IP core. For N × N matrices, the architecture requires N ALU-RAM blocks and performs O(N 2 ), requiring N 2 + 7 and N + 7 clock cycles for matrix-matrix and matrix-vector operations, respectively. For the tested Virtex7 FPGA device, the computation for 500 × 500 matrices allows a maximum clock frequency of 346 MHz, achieving an overall performance of 173 GOPS. This architecture shows higher performance than other state-of-the-art matrix computing units.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.