One of the most difficult speech recognition tasks is accurate recognition of human-to-human communication. Advances in deep learning over the last few years have produced major speech improvements in recognition on the representative Switch-board conversational corpus. Word error rates that just a few years ago were 14% have dropped to 8.0%, then 6.6% and most recently 5.8%, and are now believed to be within striking range of human performance. This raises two issues - what is human performance, and how far down can we still drive speech recognition error rates? The main objective of this article is the development of a comparative study of the performance of Automatic Speech Recognition (ASR) algorithms using a database made up of a set of signals created by female and male speakers of different ages. We will also develop techniques for the Software and Hardware implementation of these algorithms and test them in an embedded electronic card based on a reconfigurable circuit (Field Programmable Gate Array FPGA). We will present an analysis of the results of classifications for the best Support Vector Machine architectures (SVM) and Artificial Neural Networks of Multi-Layer Perceptron (MLP). Following our analysis, we created NIOSII processors and we tested their operations as well as their characteristics. The characteristics of each processor are specified in this article (cost, size, speed, power consumption and complexity). At the end of this work, we physically implemented the architecture of the Mel Frequency Cepstral Coefficients (MFCC) extraction algorithm as well as the classification algorithm that provided the best results.
Speech processing in real time requires the use of fast, reconfigurable electronic circuits capable of handling large amounts of information generated by the audio source. This article presents hardware implementations of a multilayer perceptron (MLP) and the MFCC algorithm for speech recognition. These algorithms have been implemented in hardware and tested in an on-board electronic card based on a reconfigurable circuit (FPGA). We also present a comparative study between several architectures of MLP and with the literature on the level of costs with regard to the surface of silicon, the speed and the computing resources required. Following the FPGA circuit modification, we created NIOSII processors to physically implement the architecture of ANN-type MLPs and MFCC speech recognition algorithms and perform real-time speech recognition functions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with đź’™ for researchers
Part of the Research Solutions Family.