In segment-based speech recognition systems, the quality of the segmentation step is a major factor affecting accuracy. This paper proposes methods to reduce missing segments caused by boundary insertion errors in segment graphs, which, in the case of Thai, could be generated from a probabilistic segmentation with limited speech resources. Acoustic discontinuities and manners of articulation are used to verify the boundaries of the segment graph. Segments are added to the graph where boundaries may have been falsely detected. With the proposed insertion error eliminations, the best phonetic recognition accuracy achieved shows a 13.66% error reduction.
I. INTRODUCTION
Segment-based speech recognition [1] is an approach to the automatic speech recognition problem where acoustic features are extracted from the speech signal according to a hypothesized underlying speech unit, called a "segment", rather than from a fixed-length frame as in the more widely adopted frame-based approach, such as Hidden Markov Model (HMM)-based speech recognition. This technique has many advantages over the frame-based approach. For example, the segment-based approach makes fewer conditional independence assumptions between observations, it can be easily designed to support the use of heterogeneous feature vectors and classifiers [2], and it is easier to integrate with speech-specific knowledge such as phonetic boundaries, one of the important cues for phonetic contrasts. In English, MIT's SUMMIT [1], a segment-based speech recognition system, has been shown to be successful in [...]
[...] could be implemented using various methods, including dynamic programming techniques to search the weighted finite state transducer composed from the segment graph and a pronunciation graph derived from the grammar of the recognition task of interest.
It is obvious that the quality of the segment graph, which could be judged by how many correctly hypothesized segments reside in the graph, is a major factor affecting recognition accuracy, since segmentation errors are propagated to the recognition process. In many languages, probabilistic segmentations that construct segment graphs from the results of first-pass frame-based phonetic recognition have been proven to perform well. For Thai, segment graphs from such highly accurate segmentation algorithms are still prone to errors. This is partially due to the lack of speech resources that can be utilized to train acoustic models in speech recognition research. With well-tuned HMMs, a phonetic recognition accuracy of only approximately 50% was achieved when clean speech utterances in the training set of the LOTUS corpus [4], the only publicly available large-vocabulary Thai speech corpus, were used to train the acoustic models and a bigram language model was also used to constrain the search.
This paper aims at improving the quality of the segment graph obtained from a typical HMM-based phonetic recognition by adjusting segment availability in the graphs so tha...
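The idea of adding segments where a boundary may have been falsely inserted can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the millisecond boundary representation, and the `suspect` set are all hypothetical, and the sketch assumes that some external verification step (for example, one based on acoustic discontinuities and manners of articulation) has already flagged the doubtful boundaries.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a segment is (start_ms, end_ms).
Segment = Tuple[int, int]


def base_segments(boundaries: List[int]) -> Set[Segment]:
    """Segments between each pair of adjacent hypothesized boundaries."""
    b = sorted(boundaries)
    return {(b[i], b[i + 1]) for i in range(len(b) - 1)}


def add_bridging_segments(boundaries: List[int], suspect: Set[int]) -> Set[Segment]:
    """Add one segment spanning each interior boundary flagged as a possible
    insertion error, so the segment that the false boundary split in two
    becomes available to the search again.

    The original segments are kept, so the recognizer can still choose
    either interpretation.
    """
    b = sorted(boundaries)
    segments = base_segments(b)
    for i in range(1, len(b) - 1):  # the first and last boundaries are never bridged
        if b[i] in suspect:
            segments.add((b[i - 1], b[i + 1]))
    return segments


# Example: the boundary at 200 ms is suspected to be an insertion error.
graph = add_bridging_segments([0, 120, 200, 350], suspect={200})
print(sorted(graph))
```

This simplified version bridges only one suspect boundary at a time. Runs of consecutive suspect boundaries would need longer bridging segments, which the sketch does not attempt.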
Abstract. Segment-based speech recognition has been shown to be a competitive alternative to state-of-the-art HMM-based techniques. Its accuracy relies heavily on the quality of the segment graph in which the recognizer searches for the most likely recognition hypotheses. To increase the inclusion rate of actual segments in the graph, it is important to recover segments that the segment-based segmentation algorithm may have missed. One aspect of this research focuses on determining the segments that are missing because of missed detections of segment boundaries. Acoustic discontinuities, together with manner-distinctive features, are used to recover the missing segments. Another improvement to our segment-based framework tackles the limited amount of training speech data, which prevents the use of more complex covariance matrices for the acoustic models. Feature dimensionality reduction in the form of Principal Component Analysis (PCA) is applied to enable the training of full covariance matrices, and it results in improved segment-based phoneme recognition. Furthermore, because the segment-based approach allows the integration of phonetic knowledge, we incorporate into the scoring of the segment graphs the probability of each segment being a sound unit of a specific manner of articulation. Our experiment shows that, with the proposed improvements, our segment-based framework increases the phoneme recognition accuracy by approximately 25% of that obtained from the baseline segment-based speech recognition.
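One way to see why dimensionality reduction can make full covariance matrices trainable on limited data is to count the free parameters of a single Gaussian. A full covariance matrix is symmetric, so it has d(d+1)/2 free entries for a d-dimensional feature vector, while a diagonal one has only d. The sketch below does that arithmetic; the dimensions 60 and 20 are purely illustrative assumptions and are not taken from the paper.

```python
def gaussian_free_parameters(dim: int, full_covariance: bool) -> int:
    """Number of free parameters in one Gaussian: mean plus covariance.

    A full covariance matrix is symmetric, so only dim * (dim + 1) / 2
    of its entries are free; a diagonal covariance has dim entries.
    """
    mean_params = dim
    cov_params = dim * (dim + 1) // 2 if full_covariance else dim
    return mean_params + cov_params


# Illustrative (assumed) dimensions: 60 before reduction, 20 after PCA.
for dim in (60, 20):
    diag = gaussian_free_parameters(dim, full_covariance=False)
    full = gaussian_free_parameters(dim, full_covariance=True)
    print(f"dim={dim}: diagonal={diag}, full={full}")
```

Because the full-covariance count grows quadratically with the dimension, projecting the features onto fewer principal components shrinks the number of parameters that must be estimated per model far more than it does for diagonal covariances.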
Modern speech recognition techniques rely on large amounts of speech data whose acoustic characteristics match the operating environment to train their acoustic models. Gathering training data from loudspeakers playing recorded speech utterances is far more practical than gathering it from human speakers. This paper presents results from speech recognition experiments that provide practical insights into the effects of using utterances re-recorded from loudspeakers. A clean-speech corpus of sixty human speakers was built using two different microphones, and their playbacks were re-recorded. Results show that, with minimal lexical constraints, accuracy degraded for the playback-trained system, even with no mismatches between training and test data. However, mismatches did not affect cases with tighter high-level constraints, such as number recognition and limited-vocabulary word recognition. A procedure to reduce the mismatches caused by constructing a corpus from playbacks was introduced. The procedure was shown to bring the accuracy of a playback-trained system 48% closer to that of the system trained with speech in a matched environment.