In speech pathology, new assistive technologies using ASR and machine learning approaches are being developed for detecting speech disorder events. Classically-trained ASR models tend to remove disfluencies from spoken utterances, due to their focus on producing clean and readable text output. However, diagnostic systems need to be able to track speech disfluencies, such as stuttering events, in order to determine the severity level of stuttering. To achieve this, ASR systems must be adapted to recognise full verbatim utterances, including pseudo-words and non-meaningful part-words. This work proposes a training regime to address this problem and preserve a full verbatim output of stuttered speech. We use a lightly-supervised approach with task-oriented lattices to recognise the stuttered speech of children performing a standard reading task. This approach improved the WER by 27.8% relative to a baseline that uses word lattices generated from the original prompt. The improved results preserved 63% of stuttering events (including sound, word, part-word and phrase repetition, and revision). This work also proposes a separate correction layer on top of the ASR that detects prolongation events, which are poorly recognised by the ASR. This increases the percentage of preserved stuttering events to 70%.
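The WER improvement reported above is the standard word error rate, the word-level edit distance between reference and hypothesis divided by the reference length. A minimal sketch (not the authors' evaluation code) shows why a verbatim reference penalises an ASR system that drops part-word repetitions:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[-1][-1] / len(ref)

# A verbatim reference keeps the part-word repetitions; a "clean" ASR drops them.
print(wer("the c- c- cat sat", "the cat sat"))  # 2 deletions / 5 words = 0.4
```

Against a verbatim reference transcript, every disfluency the recogniser normalises away counts as a deletion error, which is why the training regime targets verbatim output.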
A substantial amount of research has been done in the field of speech signal processing in recent years. In particular, there has been increasing interest in automatic speech recognition (ASR) technology. ASR began with simple systems that responded to a limited number of sounds and has evolved into sophisticated systems that respond fluently to natural language. This systematic review of automatic speech recognition is provided to help other researchers with the most significant topics published in the last six years. This research will also help in identifying recent major ASR challenges in real-world environments. In addition, it discusses current research gaps in ASR. This review covers articles available in five research databases and was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol. The search strategy yielded 45 articles related to the study's scope for the period 2015-2020. The results presented in this review shed light on research trends in the area of ASR and also suggest new research directions.
Stuttering is a common speech disfluency that may persist into adulthood if not treated in its early stages. Techniques from spoken language understanding may be applied to provide automated diagnoses of stuttering from voice recordings; however, there are several difficulties, including the lack of training data involving young children and the high dimensionality of these data. This study investigates how automatic speech recognition (ASR) could help clinicians by providing a tool that automatically recognises stuttering events and provides a useful written transcription of what was said. In addition, to enhance the performance of ASR and to alleviate the lack of stuttering data, this study examines the effect of augmenting the language model with artificially generated data. The performance of the ASR tool with and without language model augmentation is compared. Following language model augmentation, the ASR tool's recall improved from 38% to 62.2% and its precision from 56.58% to 71%. When mis-recognised events are more coarsely classified as stuttering/non-stuttering events, the performance improves to 73% in recall and 84% in precision. Although the obtained results are not perfect, they map to fairly robust stutter/non-stutter decision boundaries.
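The recall and precision figures above are event-level metrics. A minimal sketch of how such metrics can be computed, assuming (hypothetically) that each stuttering event is identified by a type and a position in the transcript:

```python
def event_metrics(detected, annotated):
    """Event-level precision/recall: a detected event is a true positive
    if it exactly matches an annotated (type, position) pair."""
    tp = len(detected & annotated)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(annotated) if annotated else 0.0
    return precision, recall

# Hypothetical events as (event type, word index) pairs.
annotated = {("repetition", 2), ("prolongation", 5), ("revision", 9)}
detected = {("repetition", 2), ("revision", 9), ("repetition", 7)}
p, r = event_metrics(detected, annotated)
print(p, r)  # 2/3 ≈ 0.667 each: one spurious detection, one missed event
```

Collapsing the event types into a single stutter/non-stutter label, as the abstract describes, turns near-miss type confusions into true positives, which is why the coarser evaluation scores higher.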
Stuttering is a common problem in childhood that may persist into adulthood if not treated in its early stages. Techniques from spoken language understanding may be applied to provide automated diagnosis of stuttering from children's speech. The main challenges, however, lie in the lack of training data and its high dimensionality. This study investigates the applicability of machine learning approaches for detecting stuttering events in transcripts. Two machine learning approaches were applied, namely HELM and CRF. The performance of these two approaches is compared, and the effect of data augmentation is examined for both. Experimental results show that CRF outperforms HELM by 2.2% in the baseline experiments. Data augmentation helps improve system performance, especially for rarely occurring events. In addition to the annotated augmented data, this study also contributes annotated human transcriptions of real stuttered children's speech to help expand research in this field.
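The abstract does not specify how the augmented transcripts were generated, but the idea of synthesising stuttering events in fluent text can be sketched with a hypothetical augmentation routine that injects whole-word and part-word repetitions at a configurable rate:

```python
import random

def augment_with_repetitions(transcript, rate=0.2, seed=0):
    """Hypothetical augmentation: inject word and part-word repetitions
    into a fluent transcript to simulate stuttering events.
    (Illustrative only; not the authors' augmentation method.)"""
    rng = random.Random(seed)
    out = []
    for word in transcript.split():
        if rng.random() < rate:
            if rng.random() < 0.5:
                out.append(word)            # whole-word repetition: "the the"
            else:
                out.append(word[0] + "-")   # part-word repetition: "c- cat"
        out.append(word)
    return " ".join(out)

print(augment_with_repetitions("once upon a time there was a boy", rate=0.3))
```

Augmentation of this kind is most useful for the rarely occurring event types the abstract mentions, since it lets the model see many more labelled examples than the natural data contains.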
Technological evolution in the electronics field has led to the great development of Wireless Sensor Networks (WSNs) for a variety of applications. Sensor nodes are deployed in hazardous environments and are operated by isolated battery sources. Network connectivity depends entirely on power availability, which impacts the network lifetime. Hence, power must be used wisely to prolong the network lifetime. Sensor nodes that fail due to power loss must be detected quickly to maintain the network. In a WSN, classifiers are used to detect faults by checking the data generated by the sensor nodes. In this paper, six classifiers (Support Vector Machine, Convolutional Neural Network, Multilayer Perceptron, Stochastic Gradient Descent, Random Forest and Probabilistic Neural Network) are analysed. Six different faults (offset fault, gain fault, stuck-at fault, out-of-bounds fault, spike fault and data loss) are injected into the data generated by the sensor nodes. The faulty data are checked by the classifiers. The simulation results show that the Random Forest classifier detected the most faults and outperformed all the other classifiers.
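The six fault types named above are standard sensor-fault models. A minimal sketch of how they can be injected into a clean reading series for training such classifiers (parameter magnitudes are illustrative, not the paper's values):

```python
import numpy as np

def inject_fault(x, kind, rng=None):
    """Inject one of the six sensor-fault types into a 1-D reading series.
    Fault magnitudes here are illustrative assumptions."""
    rng = rng or np.random.default_rng(0)
    y = x.astype(float).copy()
    if kind == "offset":             # constant additive bias
        y += 5.0
    elif kind == "gain":             # multiplicative scaling of readings
        y *= 1.5
    elif kind == "stuck_at":         # output frozen at one value
        y[:] = y[0]
    elif kind == "out_of_bounds":    # readings pushed outside the valid range
        y += 100.0
    elif kind == "spike":            # sporadic large transients
        idx = rng.choice(len(y), size=max(1, len(y) // 10), replace=False)
        y[idx] += 50.0
    elif kind == "data_loss":        # missing samples
        idx = rng.choice(len(y), size=max(1, len(y) // 10), replace=False)
        y[idx] = np.nan
    return y

clean = np.linspace(20.0, 25.0, 100)       # e.g. temperature readings
faulty = inject_fault(clean, "stuck_at")
print(faulty[:3])  # [20. 20. 20.]
```

Labelled (clean, faulty) pairs generated this way give each classifier supervised examples of every fault category.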
Most consumers depend on online reviews posted on e-commerce websites when deciding whether or not to buy a service or a product. However, the presence of fraudulent (deceptive) reviews remains a fundamental problem that is not fully addressed. Deceptive reviews present wrong and misleading opinions that are harmful to consumers and to e-commerce. Such reviews are created by fraudsters who intentionally write them to target and deceive potential consumers, often targeting businesses with a well-built reputation for their own promotion. Therefore, developing a deceptive review detection system is essential for identifying and classifying online product reviews as truthful or fake/deceptive. The main objective of this research work is to analyse and identify deceptive online reviews of electronic products in the Amazon and Yelp domains. For this purpose, two experiments were conducted individually. The first was executed on standard Yelp product reviews. The second was performed on Amazon product review datasets, which we created and labeled using a deceptiveness score calculated from features extracted from the review text with the Linguistic Inquiry and Word Count (LIWC) tool. These features were authenticity, negative words, comparing words, negation words, analytical thinking, and positive words, as well as the rating value given by the user. A recurrent neural network with bidirectional long short-term memory (RNN-BLSTM) was applied to both datasets to conduct the evaluation; the model learns word embeddings of the review text. Finally, we evaluated the RNN-BLSTM model's performance on the Yelp and Amazon datasets and compared the results. Testing accuracy was 89.6% for both datasets. From our experimental results, we observed that combining the LIWC features with word embeddings of the review text provided better accuracy than other existing methods.
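The abstract does not give the authors' scoring formula, but the idea of a deceptiveness score built from LIWC features and the user rating can be sketched with hypothetical, illustrative weights:

```python
def deceptiveness_score(features, rating):
    """Hypothetical scoring rule in the spirit of the paper: combine
    LIWC-derived features (values in [0, 1]) with the user's star rating.
    The weights and threshold are illustrative assumptions, not the
    authors' formula."""
    score = 0.0
    score += 1.0 - features["authenticity"]               # low authenticity is suspicious
    score += features["positive_words"] * (rating == 5)   # gushing 5-star text
    score += features["negative_words"] * (rating == 1)   # bashing 1-star text
    score -= features["analytical_thinking"]              # analytic text reads as credible
    return score

# Hypothetical review: low authenticity, very positive text, 5-star rating.
review = {"authenticity": 0.2, "positive_words": 0.8,
          "negative_words": 0.0, "analytical_thinking": 0.3}
label = "deceptive" if deceptiveness_score(review, rating=5) > 0.5 else "truthful"
print(label)  # deceptive: 0.8 + 0.8 - 0.3 = 1.3 > 0.5
```

In the paper, a score of this kind labels the Amazon reviews, and the RNN-BLSTM then learns to predict the label from the review text alone.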
This work presents a statistical method to translate human voices across age groups, based on commonalities in the voices of blood relations. The age-translated voices were naturalized by extracting blood-relation features (e.g., pitch, duration, energy) using Mel Frequency Cepstral Coefficients (MFCC), for social compatibility of the voice-impaired. The system was demonstrated using standard English and an Indian language. The voice samples for resynthesis were derived from 12 families, with member ages ranging from 8 to 80 years. The voice-age translation, performed using the pitch-synchronous overlap-and-add (PSOLA) approach by modulation of the extracted voice features, was validated by a perception test. The translated and resynthesized voices were correlated using the Linde-Buzo-Gray (LBG) and Kekre's Fast Codebook Generation (KFCG) algorithms. For translated voice targets, a strong correlation (θ >∼93% and θ >∼96%) was found with blood relatives, whereas a weak correlation range (θ <∼78% and θ <∼80%) was found between different families and between different genders within the same families. The study further subcategorized the sampling and synthesis of the voices into similar or dissimilar gender groups, using a support vector machine (SVM) to choose between available voice samples. Finally, accuracies of ∼96%, ∼93%, and ∼94% were obtained in identifying the gender of the voice sample, the age group of the samples, and the correlation between the original and converted voice samples, respectively. The results obtained were close to the natural voice sample features and are envisaged to facilitate a near-natural voice for the speech-impaired.
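The LBG algorithm mentioned above is the classic split-and-refine vector-quantization method: start from the global centroid of the feature vectors, double the codebook by perturbing each codeword, and refine with k-means steps. A minimal sketch (a standard textbook version, not the authors' implementation) applied to toy MFCC-like frames:

```python
import numpy as np

def lbg_codebook(vectors, size, eps=0.01, iters=20):
    """Linde-Buzo-Gray codebook generation (size assumed a power of two):
    split each codeword into a perturbed pair, then refine by k-means."""
    codebook = vectors.mean(axis=0, keepdims=True)   # global centroid
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):                       # k-means refinement
            dists = np.linalg.norm(vectors[:, None] - codebook[None], axis=2)
            nearest = dists.argmin(axis=1)
            for k in range(len(codebook)):
                members = vectors[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

# Toy "MFCC frames": two clusters; a 2-codeword book recovers their means.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(0.0, 0.1, (50, 13)),
                    rng.normal(5.0, 0.1, (50, 13))])
book = lbg_codebook(frames, size=2)
print(np.round(sorted(book[:, 0]), 1))  # ≈ [0., 5.]
```

In the paper's setting, codebooks of this kind summarise each speaker's feature space, and correlating the codebooks of translated and target voices yields the θ values reported above.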