2020
DOI: 10.2478/jaiscr-2021-0003
An Optimized Parallel Implementation of Non-Iteratively Trained Recurrent Neural Networks

Abstract: Recurrent neural networks (RNNs) have been successfully applied to various sequential decision-making tasks, natural language processing applications, and time-series predictions. Such networks are usually trained through back-propagation through time (BPTT), which is prohibitively expensive, especially when the length of the time dependencies and the number of hidden neurons increase. To reduce the training time, extreme learning machines (ELMs) have recently been applied to RNN training, reaching a 99% speedup…
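The abstract gives only the high-level idea of non-iterative, ELM-style RNN training, so the following is a minimal illustrative sketch rather than the authors' implementation: input and recurrent weights stay fixed at random, hidden states are collected over the sequence, and only the output weights are solved in closed form via a QR-based least-squares fit instead of BPTT. All names, shapes, and hyperparameters here are assumptions.

```python
import numpy as np

def elm_rnn_train(X, Y, n_hidden=128, seed=0):
    """Non-iterative (ELM-style) training of a simple RNN readout.

    X: (T, n_in) input sequence, Y: (T, n_out) targets, with T >= n_hidden.
    Input and recurrent weights stay random; only the output weights
    are fitted, via a QR-based least-squares solve (no BPTT).
    """
    rng = np.random.default_rng(seed)
    T, n_in = X.shape
    W_in = rng.uniform(-1, 1, (n_hidden, n_in))       # fixed random input weights
    W_rec = rng.uniform(-1, 1, (n_hidden, n_hidden))  # fixed random recurrent weights

    # Collect hidden states over the whole sequence (no gradients needed).
    H = np.zeros((T, n_hidden))
    h = np.zeros(n_hidden)
    for t in range(T):
        h = np.tanh(W_in @ X[t] + W_rec @ h)
        H[t] = h

    # Closed-form readout: solve H @ W_out ≈ Y through a QR factorization.
    Q, R = np.linalg.qr(H)               # H = Q R (reduced form)
    W_out = np.linalg.solve(R, Q.T @ Y)  # least-squares output weights
    return W_in, W_rec, W_out
```

Prediction then amounts to rerunning the fixed recurrent pass and applying `H @ W_out`; the only "training" cost is one hidden-state sweep plus one factorization, which is what makes the approach attractive compared with iterative BPTT.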

Cited by 19 publications (5 citation statements) · References 36 publications
“…The results showed that COV-ELM outperforms new-generation machine learning algorithms. In El Zini et al [143], an ELM-based recurrent neural network training algorithm was presented that takes advantage of GPU shared memory and parallel QR factorization algorithms to reach optimal solutions efficiently. The proposed algorithm achieves a speedup of up to 461× over its sequential counterpart.…”
Section: Graphics Processing Unit
confidence: 99%
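The statement above highlights GPU shared memory and parallel QR factorization as the source of the speedup. As a rough, hedged illustration of moving the QR-based readout solve onto the GPU, the sketch below uses CuPy; this is an assumed stand-in for convenience, not the custom CUDA kernels (with explicit shared-memory tiling) that the cited work describes.

```python
# Minimal sketch: the QR-based least-squares readout solve on the GPU via CuPy.
# CuPy is an assumption here; the cited work uses its own CUDA kernels and
# shared-memory optimizations, which this high-level call does not expose.
import cupy as cp

def solve_readout_gpu(H, Y):
    """Least-squares output weights on the GPU: H @ W_out ≈ Y."""
    H_d = cp.asarray(H)           # copy hidden-state matrix to device
    Y_d = cp.asarray(Y)           # copy targets to device
    Q, R = cp.linalg.qr(H_d)      # device-side QR factorization
    W_out = cp.linalg.solve(R, Q.T @ Y_d)
    return cp.asnumpy(W_out)      # bring the result back to host memory
```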
“…Each thread can either buffer or reveal its state. While speculative parallelism can be used to improve the performance of automatic parallelization algorithms, 59 it requires additional hardware or software support, resulting in additional overheads. Attempts are made to demonstrate a computationally efficient architectural framework to reduce the noncompulsory overhead associated with misspeculation.…”
Section: Related Work
confidence: 99%
“…However, this work is for the batch ELM algorithm and again it does not involve any regularization to prevent over‐fitting, and memory resources are again limited by the number of training samples, quickly filled by large‐scale data. El Zini et al 19 applied ELM to recurrent neural network (RNN) training, which is normally performed with back‐propagation (BP). GPU acceleration was also integrated to speed up the iterative and time‐consuming BP training.…”
Section: Related Work
confidence: 99%