This paper describes a fast and efficient hardware-accelerated pseudoinverse computation, obtained by restructuring the algorithm and applying FPGA synthesis directives for parallelism prior to high-level synthesis (HLS). The algorithm, composed of modified Gram-Schmidt QR decomposition (MGS-QRD), triangular matrix inversion (TMI), and matrix multiplication (MM), is synthesized and implemented on a field-programmable gate array (FPGA). MGS-QRD is restructured and augmented with parallelism directives before synthesis, which yields a high-throughput MGS-QRD hardware accelerator. Modifications to the existing TMI algorithm are also proposed, in which redundant computational tasks are removed to speed up overall operation. Data dependencies in the MM algorithm were analyzed so that appropriate parallelism directives could be inserted, and the data flow of MM was matched with that of the MGS-QRD and TMI modules to accelerate the pseudoinverse computation. The results show that the proposed pseudoinverse module outperforms a naïve implementation composed of existing MGS-QRD, TMI, and standard MM modules in terms of maximum frequency (1.24× speedup), hardware resources (48% reduction in DSP usage), latency (23% reduction), and throughput (62% increase).
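The three stages named in the abstract can be illustrated with a minimal software sketch. It assumes a real matrix A with full column rank, for which A = QR and the pseudoinverse is R^{-1} Q^T. The function names (`mgs_qr`, `upper_tri_inverse`, `matmul`, `pinv`) are hypothetical, and the code is a plain reference version of the textbook algorithms, not the restructured HLS design the paper proposes.

```python
def mgs_qr(a):
    """Modified Gram-Schmidt QR of an m x n matrix given as a list of rows.

    Assumes full column rank. Returns (qt, r), where qt is Q transposed
    (n x m, each row is one orthonormal column of Q) and r is n x n upper
    triangular.
    """
    m, n = len(a), len(a[0])
    v = [[float(a[i][j]) for i in range(m)] for j in range(n)]  # v[j] = column j
    qt = [[0.0] * m for _ in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sum(x * x for x in v[j]) ** 0.5
        qt[j] = [x / r[j][j] for x in v[j]]
        # MGS: immediately remove the q_j component from every later column.
        for k in range(j + 1, n):
            r[j][k] = sum(qx * vx for qx, vx in zip(qt[j], v[k]))
            v[k] = [vx - r[j][k] * qx for qx, vx in zip(qt[j], v[k])]
    return qt, r


def upper_tri_inverse(r):
    """Invert an upper-triangular matrix by back substitution.

    Only the upper triangle is computed; entries below the diagonal of the
    inverse are known to be zero and are never touched.
    """
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv[j][j] = 1.0 / r[j][j]
        for i in range(j - 1, -1, -1):
            s = sum(r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = -s / r[i][i]
    return inv


def matmul(x, y):
    """Standard dense matrix product of x (p x q) and y (q x s)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def pinv(a):
    """Pseudoinverse of a full-column-rank matrix: R^{-1} Q^T."""
    qt, r = mgs_qr(a)
    return matmul(upper_tri_inverse(r), qt)


# Usage: a 3 x 2 matrix yields a 2 x 3 pseudoinverse.
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
A_pinv = pinv(A)
```

In this sketch the inner loop over `k` in `mgs_qr` updates each remaining column independently, and the columns of the inverse in `upper_tri_inverse` are computed independently of one another. Loops of that kind are natural candidates for parallelism directives, although the abstract does not say which loops the authors actually targeted.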
Extreme Learning Machine (ELM) is currently one of the research trends in machine learning owing to its remarkable performance in terms of complexity and computational speed. However, the big data era and the limitations of general-purpose processors have increased interest in hardware implementations of ELM to reduce computational time. This work therefore presents a hardware-software co-design of ELM to improve overall performance. In the co-design paradigm, one of the important components of ELM, namely Givens Rotation QRD (GR-QRD), is developed as a hardware core. A Field Programmable Gate Array (FPGA) is chosen as the platform for the ELM implementation because of its reconfigurability and high parallelism. Learning accuracy and computational time are used to evaluate the performance of the proposed ELM design. Our experiments show that the GR-QRD accelerator reduces the computational time of ELM training by 41.75% while maintaining the same training accuracy as a pure software implementation of ELM.
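For readers unfamiliar with the decomposition that the hardware core accelerates, the following is a minimal software sketch of QR decomposition by Givens rotations. The function name `givens_qr` and the row-by-row elimination order are assumptions made for illustration. The abstract does not describe the internal structure of the GR-QRD core, so this is the textbook algorithm rather than the authors' design.

```python
import math


def givens_qr(a):
    """QR decomposition of an m x n matrix (list of rows) via Givens rotations.

    Returns (q, r) with q an m x m orthogonal matrix and r an m x n
    upper-triangular matrix such that q * r reproduces a.
    """
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero the entries below the diagonal in column j, bottom to top.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue  # both entries already zero, nothing to rotate
            c, s = x / h, y / h
            # Rotate rows i-1 and i of R so that r[i][j] becomes zero.
            for k in range(n):
                t1, t2 = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * t1 + s * t2
                r[i][k] = -s * t1 + c * t2
            # Accumulate the transposed rotation into columns i-1 and i of Q.
            for k in range(m):
                t1, t2 = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * t1 + s * t2
                q[k][i] = -s * t1 + c * t2
    return q, r


# Usage: decompose a small 3 x 2 matrix.
Q, R = givens_qr([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

Each rotation touches only two rows, and rotations acting on disjoint row pairs do not interfere with one another. This locality is a commonly cited reason Givens rotations suit parallel hardware.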