Speech is a complex, naturally acquired human motor ability. In adults, it is characterized by the production of about 14 different sounds per second through the coordinated action of roughly 100 muscles. Speaker recognition is the ability of software or hardware to receive a speech signal, identify the speaker present in it, and recognize that speaker afterwards. Feature extraction transforms the speech waveform into a parametric representation at a relatively low data rate for subsequent processing and analysis. Because classification operates on these features, good classification depends on high-quality features. Mel Frequency Cepstral Coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT) and Perceptual Linear Prediction (PLP) are the speech feature extraction techniques discussed in this chapter. These methods have been tested in a wide variety of applications, which has given them a high level of reliability and acceptability. Researchers have modified these techniques to make them less susceptible to noise, more robust, and less time-consuming. In conclusion, no single method is superior to the others; the area of application determines which method to select.
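To make two of these techniques concrete, here is a minimal pure-Python sketch of LPC and LPCC extraction for a single frame. It assumes the autocorrelation method solved with the Levinson-Durbin recursion and the predictor convention x[n] ≈ Σ a_k x[n−k]; the function names are illustrative, and framing, windowing and the choice of order are left to the caller.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one (already windowed) frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations for predictor coefficients a_1..a_p.

    Convention: x[n] is predicted as sum_k a_k * x[n - k].
    Returns (coefficients, final prediction-error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused so that a[k] matches a_k
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for stage i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_lpcc(a, n_ceps):
    """Convert predictor coefficients a_1..a_p into cepstral coefficients c_1..c_n.

    Uses the standard recursion for the all-pole model 1 / (1 - sum_k a_k z^-k);
    the log-gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


# Usage on a synthetic first-order signal x[n] = 0.9 ** n:
# the order-2 predictor should come out close to [0.9, 0.0].
frame = [0.9 ** n for n in range(200)]
coeffs, residual_energy = levinson_durbin(autocorrelation(frame, 2), 2)
cepstrum = lpc_to_lpcc(coeffs, 12)
```

For a first-order model the recursion reduces to c_n = a^n / n, which is a quick way to sanity-check an implementation.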
Stuttered speech is speech rich in dysfluencies, and stuttering is more prevalent in males than in females. It has been associated with insufficient air pressure or poor articulation, although its root causes are more complex. Its primary features include prolongations and repetitions, while its secondary features include anxiety, fear, and shame. This study used LPC analysis and synthesis algorithms to reconstruct stuttered speech. The results were evaluated using cepstral distance, Itakura-Saito distance, mean square error, and likelihood ratio, and these measures indicated perfect reconstruction quality. Automatic speech recognition (ASR) was used for further testing: all the reconstructed speech samples were perfectly recognized, whereas only three of the original speech samples were. Keywords: stuttered speech, speech reconstruction, LPC analysis, LPC synthesis, objective quality measure
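The analysis/synthesis loop can be illustrated with a short pure-Python sketch. It is not the study's implementation (its frame length, LPC order and excitation are not given here); it assumes one frame, order 8, and the exact prediction residual as the synthesis excitation. With that excitation the all-pole filter 1/A(z) undoes the inverse filter A(z), so the reconstruction matches the input up to floating-point round-off, which the mean square error confirms. A vocoder-style excitation (pulse train or noise) would not reconstruct exactly.

```python
import math
import random


def autocorrelation(x, max_lag):
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a_1..a_p with x[n] ~ sum_k a_k x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]


def lpc_analysis(x, a):
    """Inverse filter A(z): e[n] = x[n] - sum_k a_k x[n-k], zero initial state."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - k - 1] for k in range(p) if n - k - 1 >= 0)
            for n in range(len(x))]


def lpc_synthesis(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] + sum_k a_k y[n-k], zero initial state."""
    p = len(a)
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[k] * y[n - k - 1] for k in range(p) if n - k - 1 >= 0))
    return y


def mean_square_error(x, y):
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


# Synthetic frame: a sinusoid plus a little seeded noise (not real speech)
rng = random.Random(0)
frame = [math.sin(0.3 * n) + 0.05 * rng.gauss(0.0, 1.0) for n in range(160)]
a = levinson_durbin(autocorrelation(frame, 8), 8)
residual = lpc_analysis(frame, a)
reconstructed = lpc_synthesis(residual, a)
error = mean_square_error(frame, reconstructed)  # round-off only
```

The other measures named above (cepstral distance, Itakura-Saito distance, likelihood ratio) compare spectral envelopes rather than samples and are not shown here.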
The level crossing (LX), or railway crossing, is an intersection between a public road and a railway line and can be controlled actively or passively. Sound recognition can be used to control a level crossing actively. This study proposes a system that uses sound to control an LX, with Mel Frequency Cepstral Coefficients (MFCC) as the feature extractor and a Recurrent Neural Network (RNN) as the classifier. The proposed system shows great potential to help reduce the loss of lives and property at LXs.
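The MFCC front end mentioned above can be sketched in pure Python as pre-emphasis, framing with a Hamming window, a power spectrum, a triangular mel filterbank, a logarithm and a DCT-II. The parameter values (8 kHz sampling, 25 ms frames with a 10 ms hop, 26 filters, 13 coefficients, 0.97 pre-emphasis) are common defaults assumed for illustration, not the study's settings, and the RNN classifier that would consume these vectors is not shown. The naive DFT keeps the sketch dependency-free; an FFT would be used in practice.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale, over FFT bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):   # rising edge
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum (O(n_fft^2)); frame must not exceed n_fft samples."""
    padded = frame + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        re = sum(padded[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        im = -sum(padded[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft))
        spec.append((re * re + im * im) / n_fft)
    return spec


def dct_ii(values, n_out):
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
            for k in range(n_out)]


def mfcc(signal, sample_rate=8000, frame_len=200, hop=80, n_fft=256, n_filters=26, n_ceps=13):
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    emphasized = [signal[0]] + [signal[n] - 0.97 * signal[n - 1] for n in range(1, len(signal))]
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(emphasized[start:start + frame_len], window)]
        spec = power_spectrum(frame, n_fft)
        # floor the filter energies so that empty filters do not produce log(0)
        energies = [max(sum(f * p for f, p in zip(row, spec)), 1e-10) for row in bank]
        features.append(dct_ii([math.log(e) for e in energies], n_ceps))
    return features


# Usage: a 50 ms synthetic 440 Hz tone yields 3 frames of 13 coefficients each
signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(400)]
feats = mfcc(signal)
```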
Stuttering, or stammering, is the disruption of the normal flow of speech by dysfluencies, which can be repetitions or prolongations of a phoneme or syllable. Stuttering cannot be permanently cured, although it may go into remission or stutterers can learn to shape their speech into fluent speech with appropriate speech pathology treatment. In this study, Linear Prediction Coefficients (LPC), Linear Prediction Cepstral Coefficients (LPCC) and Line Spectral Frequencies (LSF) were used for feature extraction, while a Multilayer Perceptron (MLP) was used as the classifier. The samples were obtained from UCLASS (University College London Archive of Stuttered Speech) release 1. The LPCC-MLP system had the highest overall sensitivity and precision and the lowest overall misclassification rate. However, it struggled with F3: its sensitivity to F3 was negligible, its precision was moderate, and its misclassification rate, although categorized as negligible, was above 10%.
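The metrics above can be computed from a multi-class confusion matrix. The sketch below assumes one-vs-rest definitions: sensitivity = TP/(TP+FN), precision = TP/(TP+FP), per-class misclassification rate = (FP+FN)/N, and overall misclassification rate = 1 − accuracy. The study's exact per-class definition is not stated here, and the matrix and labels in the usage are hypothetical, not UCLASS results.

```python
def class_metrics(confusion, labels):
    """Per-class sensitivity, precision and one-vs-rest misclassification rate.

    confusion[i][j] counts samples whose true class is labels[i] and whose
    predicted class is labels[j].
    """
    n = len(labels)
    total = sum(sum(row) for row in confusion)
    results = {}
    for c, label in enumerate(labels):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # samples of this class that were missed
        fp = sum(confusion[r][c] for r in range(n)) - tp  # other classes predicted as this one
        results[label] = {
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "misclassification": (fp + fn) / total,
        }
    return results


def overall_misclassification(confusion):
    """Fraction of all samples that fall off the diagonal (1 - accuracy)."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 1.0 - correct / total


# Hypothetical 3-class example
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 5, 5]]
metrics = class_metrics(cm, ["A", "B", "C"])
```

In this toy matrix class "C" has perfect precision but only 50% sensitivity, which shows how the two metrics can diverge for a single class.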
Obstacle detection can be considered central to the design of mobile robots. It enables mobile robots equipped with sensors to traverse and maneuver freely in an environment while avoiding damage from collisions with obstacles in their path. Several systems with different approaches have been developed to prevent robots from colliding with obstacles. The approaches to sensor selection, path planning, and navigation determine how such a system operates and differ from one system to another. This paper presents a low-cost ultrasonic distance sensor for obstacle detection to enhance anti-collision in mobile robot navigation. The system is programmed in C/C++ using the Arduino software (IDE) and implemented on the ATmega2560 microcontroller of the Arduino board. The ultrasonic sensor detects an obstacle and sends the collected data to the controller, which directs the motor driver to stop or move the robot while it follows a visible predefined path (a black line) on the ground, detected by an IR sensor placed beneath the robot. Experimental results with varied obstacle positions show good performance, with 96.4% accuracy at a distance of 50 cm from the obstacle.
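The distance computation behind such a sensor is simple time-of-flight arithmetic. The sketch below is written in Python for illustration only, whereas the system described runs C/C++ on the ATmega2560 (where the echo pulse width would typically be read with Arduino's `pulseIn()`). It assumes a sensor that reports the echo's round-trip time as a pulse width in microseconds and uses the common approximation v ≈ 331.3 + 0.606·T m/s for the speed of sound in air. The 20 cm stop threshold and the function names are hypothetical.

```python
def speed_of_sound_cm_per_us(temp_c=20.0):
    """Approximate speed of sound in dry air: about 331.3 + 0.606 * T m/s."""
    metres_per_second = 331.3 + 0.606 * temp_c
    return metres_per_second / 10_000.0  # 1 m/s = 100 cm per 1e6 us


def echo_to_distance_cm(echo_us, temp_c=20.0):
    """Convert a round-trip echo time in microseconds to a one-way distance in cm."""
    # The pulse travels to the obstacle and back, so halve the round trip.
    return echo_us * speed_of_sound_cm_per_us(temp_c) / 2.0


def drive_command(echo_us, stop_distance_cm=20.0, temp_c=20.0):
    """Hypothetical decision rule: stop if an obstacle is closer than the threshold,
    otherwise keep following the line."""
    if echo_to_distance_cm(echo_us, temp_c) < stop_distance_cm:
        return "stop"
    return "follow_line"


# Usage: an echo of about 2915 us corresponds to roughly 50 cm at 20 °C
distance = echo_to_distance_cm(2915)
```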
Stuttering is a motor-speech disorder that shares features with other motor control disorders such as dystonia, Parkinson’s disease and Tourette’s syndrome. Stuttering results from complex interactions among motor, language, emotional and genetic factors. This study used Line Spectral Frequencies (LSF) for feature extraction and three classifiers for identification: Multilayer Perceptron (MLP), Recurrent Neural Network (RNN) and Radial Basis Function (RBF). The UCLASS (University College London Archive of Stuttered Speech) release 1 was used as the database. The recordings were from speakers aged 12 years 11 months to 19 years 5 months who were referred to clinics in London for assessment of their stuttering. The performance metrics used to interpret the results are sensitivity, accuracy, precision and misclassification rate. With RBF, only M1 and M2 had sensitivity below 100%. The sensitivity for M1 was between 40% and 60% and was therefore categorized as moderate, while that for M2 was between 60% and 80%, classed as substantial. Overall, RBF outperformed MLP and RNN on all the performance metrics considered.
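For readers unfamiliar with the RBF classifier, here is a minimal pure-Python sketch of one common variant: a hidden layer of Gaussian units with fixed, caller-supplied centres and a linear output layer trained with the LMS (delta) rule against one-hot targets. The study's centre selection, width and training procedure are not given here, so all of these are assumptions. In the study the inputs would be LSF vectors; the toy usage below uses hypothetical 2-D points.

```python
import math
import random


class RBFNetwork:
    """Gaussian RBF hidden layer with fixed centres and a linear output layer (LMS-trained)."""

    def __init__(self, centres, width, n_classes, learning_rate=0.1, seed=0):
        self.centres = centres
        self.width = width
        self.n_classes = n_classes
        self.lr = learning_rate
        rng = random.Random(seed)
        # one weight per hidden unit plus a bias weight, for each output class
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(len(centres) + 1)]
                        for _ in range(n_classes)]

    def hidden(self, x):
        """Gaussian activation of each centre, followed by a constant bias input."""
        acts = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * self.width ** 2))
                for c in self.centres]
        return acts + [1.0]

    def outputs(self, x):
        h = self.hidden(x)
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.weights]

    def fit(self, samples, labels, epochs=200):
        for _ in range(epochs):
            for x, y in zip(samples, labels):
                h = self.hidden(x)
                out = [sum(w * hi for w, hi in zip(row, h)) for row in self.weights]
                for k in range(self.n_classes):
                    err = (1.0 if k == y else 0.0) - out[k]  # one-hot target minus output
                    for j in range(len(h)):
                        self.weights[k][j] += self.lr * err * h[j]

    def predict(self, x):
        out = self.outputs(x)
        return max(range(self.n_classes), key=lambda k: out[k])


# Usage on two hypothetical clusters, with one centre placed at each cluster
samples = [(0.0, 0.0), (0.2, -0.1), (-0.3, 0.1), (3.0, 3.0), (2.8, 3.1), (3.2, 2.9)]
labels = [0, 0, 0, 1, 1, 1]
net = RBFNetwork(centres=[(0.0, 0.0), (3.0, 3.0)], width=1.0, n_classes=2)
net.fit(samples, labels)
```

Fixing the centres keeps the output layer linear in its weights, which is why a simple least-mean-squares update suffices; centres are often chosen by clustering the training data instead.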