Purpose: Problems in speech recognition are often apparent in telecommunication situations. For ecologically valid assessments of such conditions, it is important to quantify the impact of real environments, including acoustic conditions at a far-end communication device and all paths of transmission degradation. This study presents an automated matrix sentence test procedure based on automatic speech recognition (ASR) integrated in a Voice over Internet Protocol (VoIP) infrastructure and compares the individual effects of transmission degradations with results from laboratory measurements. Method: Speech recognition thresholds (SRTs) were measured in 16 normal-hearing subjects in four test conditions: (a) a laboratory condition guided by a human experimenter, (b) a laboratory condition with reduced bandwidth, (c) the same condition with additionally reduced headset quality to simulate typical communication systems, and (d) an automated, ASR-controlled adaptive test procedure over a real VoIP infrastructure. Errors of the ASR system were analyzed to show possible effects on measurement outcome. Results: Measured SRTs showed a highly significant correlation (r = .93) between the fully automatic and laboratory conditions, with a constant bias of about 1 dB, indicating a linear shift of the data without affecting the distribution around the mean. The individual impact of the different system degradations on SRTs could be quantified. Conclusions: This study provides a proof of concept for automated ASR-based SRT measurements over VoIP systems, as it produced results comparable to traditional laboratory settings for this group of 16 normal-hearing subjects. This makes VoIP services a promising candidate for speech audiometric testing in real communication systems.
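Adaptive SRT procedures like the one described above typically adjust the signal-to-noise ratio trial by trial, converging on the SNR at which a target fraction of sentences is repeated correctly. The abstract does not specify the adaptive rule used, so the following is only a minimal sketch of a simple 1-up/1-down track (which converges near the 50%-correct point); the function names and the simulated listener are illustrative assumptions, not the authors' implementation.

```python
def adaptive_srt(respond, start_snr=0.0, step_db=2.0, n_trials=20):
    """Toy 1-up/1-down adaptive SNR track.

    `respond(snr)` returns True if the (simulated) listener repeats the
    sentence correctly at that SNR. The track lowers the SNR after a
    correct response and raises it after an incorrect one, oscillating
    around the 50%-correct point.
    """
    snr = start_snr
    track = []
    for _ in range(n_trials):
        track.append(snr)
        if respond(snr):
            snr -= step_db   # correct -> make the task harder
        else:
            snr += step_db   # incorrect -> make it easier
    # Estimate the SRT as the mean SNR over the second half of the track,
    # after the procedure has settled around the threshold.
    tail = track[len(track) // 2:]
    return sum(tail) / len(tail)

# Simulated listener with a hard intelligibility threshold at -7 dB SNR.
srt = adaptive_srt(lambda snr: snr > -7.0, start_snr=0.0)
```

With the deterministic listener above, the track settles into an oscillation between -8 and -6 dB, so the estimate lands at -7 dB; a real matrix sentence test would instead score word-level responses (here delivered by the ASR system) and often uses smaller, response-dependent step sizes.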
The automation of medical documentation is a highly desirable process, especially as it could avert significant temporal and monetary expenses in healthcare. With the help of complex modelling and high computational capability, Automatic Speech Recognition (ASR) and deep learning have made several promising attempts to this end. However, a factor that significantly determines the efficiency of these systems is the volume of speech that is processed in each medical examination. In the course of this study, we found that over half of the speech recorded during follow-up examinations of patients treated with intra-vitreal injections was not relevant for medical documentation. In this paper, we evaluate the application of Convolutional and Long Short-Term Memory (LSTM) neural networks for the development of a speech classification module aimed at identifying speech relevant for medical report generation. In this regard, various topology parameters are tested and the effect of different speaker attributes on model performance is analyzed. The results indicate that Convolutional Neural Networks (CNNs) are more successful than LSTM networks, achieving a validation accuracy of 92.41%. Furthermore, on evaluation of the robustness of the model to gender, accent, and unknown speakers, the neural network generalized satisfactorily.
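The classification module described above maps a speech segment to a relevant/not-relevant decision for report generation. The abstract does not give the network topology, so the following is only a schematic NumPy sketch of the general idea behind a CNN on spectrogram input: a 1-D convolution over time, global average pooling, and a logistic output. All weights here are random stand-ins for a trained model, and all names (`conv1d_relu`, `classify`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, kernels):
    """Valid 1-D convolution over the time axis, followed by ReLU.

    x: (time, features) spectrogram; kernels: (n_kernels, width, features).
    Returns (time - width + 1, n_kernels) feature maps.
    """
    n_k, width, _ = kernels.shape
    out = np.empty((x.shape[0] - width + 1, n_k))
    for t in range(out.shape[0]):
        window = x[t:t + width]                       # (width, features)
        out[t] = np.maximum((kernels * window).sum(axis=(1, 2)), 0.0)
    return out

def classify(spectrogram, kernels, w, b):
    """Global-average-pool the convolutional features over time, then
    apply a logistic output: the probability that the segment is
    relevant for the medical report."""
    feats = conv1d_relu(spectrogram, kernels).mean(axis=0)
    return 1.0 / (1.0 + np.exp(-(feats @ w + b)))

# Random weights stand in for a trained model.
spec = rng.standard_normal((100, 40))                 # 100 frames x 40 mel bands
kernels = rng.standard_normal((8, 5, 40)) * 0.1       # 8 kernels, width 5
w, b = rng.standard_normal(8), 0.0
p_relevant = classify(spec, kernels, w, b)            # value in (0, 1)
```

Pooling over time is one reason such CNNs can be robust to speaker attributes like gender and accent: the decision depends on aggregated local spectro-temporal patterns rather than on the exact temporal position of any one event, which fits the robustness findings reported above.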