This paper presents a bilingual acoustic modeling approach for transcribing Mandarin-English code-mixed lectures with a highly unbalanced language distribution. Specialized terms related to the lecture content were produced in the guest language, English (about 15%), and embedded in utterances produced in the host language, Mandarin (about 85%). The code-mixed nature of the target corpus and the very small proportion of English data make the task difficult. State mapping and merging approaches, together with three stages of model adaptation, are used to address this problem. Significant improvements in recognition accuracy were obtained in experiments on a real bilingual code-mixed lecture corpus recorded at National Taiwan University. The code-mixing situation considered here is in fact very natural in the everyday spoken language of many people in today's globalized world.
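The abstract does not say how guest-language states are mapped onto host-language states. One common way to do this kind of cross-lingual state mapping is to measure a distance between the state output distributions and merge each sparse English state into its nearest Mandarin state when they are close enough. The sketch below assumes each state is summarized by a single diagonal-covariance Gaussian, uses the symmetric Kullback-Leibler divergence as the distance, and applies a merge threshold; the function names, state names, and threshold are hypothetical illustrations, not the paper's method.

```python
import math
from typing import Dict, List, Tuple

# Assumption: each HMM state is summarized by one diagonal-covariance
# Gaussian, stored as (mean vector, variance vector).
Gaussian = Tuple[List[float], List[float]]


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for two diagonal-covariance Gaussians."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrized KL divergence, used here as the state distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def map_states(guest: Dict[str, Gaussian],
               host: Dict[str, Gaussian],
               threshold: float) -> Dict[str, str]:
    """Map each guest-language state to its nearest host-language state.

    A guest state is merged into the host state only when their distance is
    below `threshold`; otherwise it keeps its own identity (maps to itself).
    """
    mapping: Dict[str, str] = {}
    for g_name, g in guest.items():
        best_name, best_dist = min(
            ((h_name, symmetric_kl(g, h)) for h_name, h in host.items()),
            key=lambda item: item[1],
        )
        mapping[g_name] = best_name if best_dist < threshold else g_name
    return mapping


if __name__ == "__main__":
    # Toy 2-D states with hypothetical names (language_phone_state).
    host = {"zh_a_2": ([0.0, 1.0], [1.0, 1.0]),
            "zh_i_2": ([3.0, -1.0], [1.0, 2.0])}
    guest = {"en_aa_2": ([0.1, 0.9], [1.1, 1.0]),
             "en_th_2": ([10.0, 10.0], [0.5, 0.5])}
    print(map_states(guest, host, threshold=1.0))
    # → {'en_aa_2': 'zh_a_2', 'en_th_2': 'en_th_2'}
```

The threshold is the design knob: a guest state that is acoustically close to some host state shares its (better-trained) parameters, while a state with no close match, such as a phone absent from Mandarin, keeps its own model.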
We propose a method to enhance multi-stream Gabor and MFCC features using data-driven hierarchical phoneme clusters to yield more discriminative posteriors. We take into account different hierarchy structures and, in addition, perform mean and variance normalization. A relative improvement of 11.5% over the conventional MFCC Tandem system was achieved in experiments conducted on Mandarin broadcast news. We analyze the complementarity between Gabor and MFCC features for different types of phonemes, and investigate the benefits of using hierarchical phoneme clusters.
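The abstract mentions mean and variance normalization without stating exactly which features it is applied to. The sketch below assumes a simple Tandem-style recipe: frame-level phoneme posteriors from one utterance are log-compressed, then each dimension is normalized to zero mean and unit variance over the utterance. The function names and the flooring constant are hypothetical; the decorrelation step (e.g. PCA) that Tandem systems often use, the Gabor/MFCC stream combination, and the hierarchical clustering are not modeled.

```python
import math
from typing import List


def mean_variance_normalize(frames: List[List[float]],
                            eps: float = 1e-8) -> List[List[float]]:
    """Per-dimension mean and variance normalization over one utterance.

    Each feature dimension is shifted to zero mean and scaled to unit
    variance, using statistics computed over all frames of the utterance.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
            for d in range(dim)]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
            for f in frames]


def tandem_features(posteriors: List[List[float]],
                    floor: float = 1e-10) -> List[List[float]]:
    """Log-compress frame-level posteriors, then apply MVN.

    Assumption: a common Tandem recipe also decorrelates the log posteriors
    (e.g. with PCA) before normalization; that step is omitted here.
    """
    logs = [[math.log(max(p, floor)) for p in frame] for frame in posteriors]
    return mean_variance_normalize(logs)


if __name__ == "__main__":
    print(mean_variance_normalize([[1.0, 2.0], [3.0, 6.0]]))
    # → approximately [[-1.0, -1.0], [1.0, 1.0]]
```

The log step turns the peaky posterior distribution into something closer to Gaussian, which suits the GMM-HMM back end of a Tandem system; MVN then removes per-utterance offsets and scale differences between dimensions.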