Krerksak Likitsupin scite author profile

Krerksak Likitsupin

4 Publications

10 Citation Statements Received

41 Citation Statements Given

Affiliations

Chulalongkorn University

Publications

Improving Segment-based Speech Recognition by Recovering Missing Segments in Segment Graphs - A Thai Case Study

Likitsupin, Suchato, Punyabukkana, et al. 2008

In segment-based speech recognition systems, the quality of the segmentation step is a major factor highly affecting their accuracies. This paper proposes methods to reduce missing segments caused by boundary insertion errors in segment graphs, which, in the case of Thai, could be generated from a probabilistic segmentation with limited speech resources. Acoustic discontinuities and manners of articulation are used to verify boundaries of the segment graph. Segments are added to the graph in the case of possible falsely detected boundaries. With the proposed insertion error eliminations, the best phonetic recognition accuracy achieved shows a 13.66% error reduction.

I. INTRODUCTION

Segment-based speech recognition [1] is an approach to the automatic speech recognition problem where acoustic [...] the speech signal according to a hypothesized underlying speech unit, called "Segment", rather than from a fixed-length frame as in a more widely-adopted frame-based approach, such as the Hidden Markov Model (HMM)-based speech recognition. This technique has many advantages over the frame-based approach. For example, the segment-based approach makes fewer conditional independent assumptions between observations, it can be easily designed to support the use of heterogeneous feature vectors and classifiers [2], and it is easier to be integrated with speech-specific knowledge such as phonetic boundaries, one of important cues for phonetic contrasts. In English, MIT's SUMMIT [1], a segment-based speech recognition system has shown to be successful in [...]

[...] could be implemented using various methods including using dynamic programming techniques to search the composed weighted finite state transducer between the segment graph and a pronunciation graph derived from the grammar of the recognition task.

It is obvious that the quality of the segment graph, which could be judged based upon how many correctly hypothesized segments residing in the graph, is a major factor that highly affects the recognition accuracy since segmentation errors are propagated to the recognition process. In many languages, probabilistic segmentations that construct segment graphs from the result of first-pass frame-based phonetic recognition results have been proven to yield good performances. For Thai, segment graphs for such highly accurate segmentation algorithms are still prone to errors. This is partially due to the lack of speech resources that can be utilized to train acoustic models in speech recognition researches. With well-tuned HMMs, a phonetic recognition accuracy of approximately only 50% was achieved when clean speech utterances in the training set of LOTUS corpus [4], the only publicly available large-vocabulary Thai speech corpus, were used to train the acoustic models and a bigram language model was also used to constrain the search.

This paper aims at improving the quality of the segment graph obtained from a typical HMM-based phonetic recognition by adjusting segment availability in the graphs so tha...

Acoustic-Phonetic Approaches for Improving Segment-Based Speech Recognition for Large Vocabulary Continuous Speech

Likitsupin, Punyabukkana, Wutiwiwatchai, et al. 2016

Abstract. Segment-based speech recognition has been shown to be a competitive alternative to the state-of-the-art HMM-based techniques. Its accuracy relies heavily on the quality of the segment graph from which the recognizer searches for the most likely recognition hypotheses. In order to increase the inclusion rate of actual segments in the graph, it is important to recover possible missing segments left out by the segment-based segmentation algorithm. One aspect of this research focuses on determining the segments that are missing due to missed detection of segment boundaries. Acoustic discontinuities, together with manner-distinctive features, are utilized to recover the missing segments. Another improvement to our segment-based framework tackles the restriction of having a limited amount of training speech data, which prevents the use of more complex covariance matrices for the acoustic models. Feature dimensionality reduction in the form of Principal Component Analysis (PCA) is applied to enable the training of full covariance matrices, and it results in improved segment-based phoneme recognition. Furthermore, to benefit from the fact that the segment-based approach allows the integration of phonetic knowledge, we incorporate into the scoring of the segment graphs the probability of each segment being a sound unit of a specific manner of articulation. Our experiment shows that, with the proposed improvements, our segment-based framework increases the phoneme recognition accuracy by approximately 25% over the one obtained from the baseline segment-based speech recognition.

The CU-MFEC corpus for Thai and English spelling speech recognition

Kertkeidkachorn, Chanjaradwichai, Suri, et al. 2012

Effects of acoustic mismatches on speech recognition accuracies due to playback-recorded speech corpus

Suchato, Chanjaradwichai, Kertkeidkachorn, et al. 2012

Modern speech recognition techniques rely on large amounts of speech data whose acoustic characteristics match the operating environments to train their acoustic models. Gathering training data from loudspeakers playing recorded speech utterances is far more practical than gathering it from human speakers. This paper presents results from speech recognition experiments providing practical insights into the effects caused by utterances re-recorded from loudspeakers. A clean-speech corpus of sixty human speakers was built using two different microphones, and their playbacks were re-recorded. Results show that, with minimal lexical constraints, accuracies degraded for the playback-trained system, even with no mismatches between training and test data. However, mismatches did not affect cases with tighter high-level constraints, such as number and limited-vocabulary word recognition. A procedure to reduce mismatches caused by constructing a corpus from playbacks was introduced. The procedure was shown to make the accuracy of a playback-trained system 48% closer to that of the system trained with speech in a matched environment.
