Purpose - This paper describes the development of a mobile Voice User Interface (VUI) for Korean users with dysarthria using currently available speech recognition technology, through systematic user needs analysis and the application of usability testing feedback to prototype system designs.
Design/methodology/approach - Four usability surveys are conducted during the development of the prototype system. Based on the two surveys on user needs and user experiences with existing VUI systems at the prototype design stage, the target platforms and target applications are determined. Furthermore, a set of basic words is selected by the prospective users, which enables the system to be not only custom designed for dysarthric speakers but also individualized for each user. Reflecting users' requests concerning general VUI usage and their UI design preferences, gathered through evaluation of the initial prototype, we develop the final prototype: an individualized voice keyboard for mobile devices based on an isolated word recognition engine with word prediction (a sketch of such prediction follows this abstract).
Findings - The results show that target user participation in system development is effective for improving the usability of and satisfaction with the system, as the system is developed considering the ideas and feedback obtained from different prospective users at each development stage.
Originality/value - We have developed an automatic speech recognition-based mobile VUI system that is not only custom designed for dysarthric speakers but also individualized for each user, focussing on usability through four usability surveys. This voice keyboard system has the potential to serve as an assistive and alternative input method for people with speech impairment, including mild to moderate dysarthria, and for people with physical disabilities.
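The paper names word prediction as a component of the final voice keyboard but the abstract gives no implementation details. The following Python sketch illustrates one plausible scheme under our own assumptions: simple prefix matching over a per-user basic word set ranked by usage frequency. The WordPredictor class, word list, and counts are hypothetical, not from the paper.

```python
# Minimal sketch of per-user word prediction for a voice keyboard.
# Assumption: each user maintains a personalized word list with usage
# counts; the words and counts below are illustrative only.
from typing import List, Tuple

class WordPredictor:
    def __init__(self, user_words: List[Tuple[str, int]]):
        # (word, usage_count) pairs; more frequent words rank earlier.
        self.user_words = sorted(user_words, key=lambda w: -w[1])

    def predict(self, prefix: str, k: int = 3) -> List[str]:
        """Return up to k of the user's words that start with prefix."""
        return [w for w, _ in self.user_words if w.startswith(prefix)][:k]

# Hypothetical personalized word set (romanized Korean for readability).
predictor = WordPredictor([("annyeong", 12), ("anju", 3), ("bap", 7)])
print(predictor.predict("an"))  # -> ['annyeong', 'anju']
```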
An unsupervised competitive neural network for efficient clustering of Gaussian probability density function (GPDF) data of continuous density hidden Markov models (CDHMMs) is proposed in this paper. The proposed network, called the divergence-based centroid neural network (DCNN), employs the divergence measure as its distance measure and exploits the statistical characteristics of the observation densities in the HMM for speech recognition problems. While conventional clustering algorithms for vector quantization (VQ) codebook design utilize only the mean values of the observation densities in the HMM, the DCNN utilizes both the means and the covariances. Compared with other conventional unsupervised neural networks, the DCNN allocates more code vectors to regions where GPDF data are densely distributed and fewer to regions where they are sparse. When applied to Korean monophone recognition as a tool to reduce the codebook size, the DCNN reduced the number of GPDFs used for code vectors by 65.3% while preserving recognition accuracy. Experimental results with a divergence-based k-means algorithm and a divergence-based self-organizing map algorithm are also presented for performance comparison.
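The distance at the heart of the DCNN is a divergence between Gaussian densities that uses both means and covariances. A common choice in HMM codebook work is the symmetric Kullback-Leibler divergence; the Python sketch below computes it for diagonal-covariance Gaussians. This is our reading of "divergence measure," not necessarily the paper's exact formulation, and the function and variable names are our own.

```python
# Symmetric KL divergence between two diagonal-covariance Gaussians,
# usable as the DCNN's distance in place of Euclidean distance on means.
# Sketch only; the paper's exact divergence formulation may differ.
import numpy as np

def sym_kl_divergence(mu1, var1, mu2, var2):
    """J(p, q) for p = N(mu1, diag(var1)) and q = N(mu2, diag(var2)).

    J = 0.5 * sum( var1/var2 + var2/var1 - 2
                   + (mu1 - mu2)**2 * (1/var1 + 1/var2) )
    """
    mu1, var1 = np.asarray(mu1, float), np.asarray(var1, float)
    mu2, var2 = np.asarray(mu2, float), np.asarray(var2, float)
    diff2 = (mu1 - mu2) ** 2
    return 0.5 * np.sum(var1 / var2 + var2 / var1 - 2.0
                        + diff2 * (1.0 / var1 + 1.0 / var2))

# Two toy single-Gaussian observation densities:
print(sym_kl_divergence([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [2.0, 1.0]))
```

In a divergence-based k-means or DCNN update, this J would replace the Euclidean distance when assigning each GPDF to its nearest code vector.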
In emergency dispatching at the 119 Command & Dispatch Center, inconsistencies between the 'standard emergency aid system' and the 'dispatch protocol,' both of which are mandatory to follow, cause inefficiency in the dispatcher's performance. If an emergency dispatch system uses automatic speech recognition (ASR) to process the dispatcher's protocol speech during case registration, it can instantly extract and provide the information required by the 'standard emergency aid system,' making the rescue command more efficient. For this purpose, we have developed a Korean large vocabulary continuous speech recognition system with a 400,000-word vocabulary for use in the emergency dispatch system. The 400,000 words cover vocabulary from news, SNS, blogs, and the emergency rescue domain. The acoustic model is constructed using 1,300 hours of telephone call (8 kHz) speech, whereas the language model is constructed using a 13 GB text corpus. From the transcribed corpus of 6,600 real telephone calls, call logs labeled with the emergency rescue command class and the identified major symptom are extracted in connection with the rescue activity log and the National Emergency Department Information System (NEDIS). ASR is applied to the emergency dispatcher's repetition utterances about the patient information, and the emergency patient information is extracted based on the Levenshtein distance between the ASR result and the template information (a sketch of this matching follows this abstract). Experimental results show a word error rate of 9.15% for speech recognition and an emergency response detection performance of 95.8% for the emergency dispatch system.
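The abstract describes extracting patient information by comparing ASR output with template information via Levenshtein distance. As a minimal sketch of the idea (the paper publishes no code), the Python below picks the template entry with the smallest edit distance to the recognized string; the template list, distance threshold, and function names are hypothetical.

```python
# Minimal sketch: choose the template entry closest (by Levenshtein
# distance) to the ASR output. Templates and threshold are illustrative.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (ca != cb)))     # substitution
        prev = cur
    return prev[-1]

def match_template(asr_text: str, templates: list, max_dist: int = 3):
    """Return the closest template entry, or None if none is close enough."""
    best = min(templates, key=lambda t: levenshtein(asr_text, t))
    return best if levenshtein(asr_text, best) <= max_dist else None

# Hypothetical symptom templates (English stand-ins for Korean entries):
print(match_template("chest pian", ["chest pain", "headache", "fracture"]))
```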