Jiahong Yuan scite author profile

Speaker identification on the SCOTUS corpus

This paper reports the results of our experiments on speaker identification in the SCOTUS corpus, which includes oral arguments from the Supreme Court of the United States. Our main findings are as follows: 1) a combination of Gaussian mixture models and monophone HMM models attains near-100% text-independent identification accuracy on utterances that are longer than one second; 2) a sampling rate of 11025 Hz achieves the best performance (higher sampling rates are harmful), and a sampling rate as low as 2000 Hz still achieves more than 90% accuracy; 3) a distance score based on likelihood numbers was used to measure the variability of phones among speakers; we found that the most variable phone is UH (as in good), and that the velar nasal NG is more variable than the other two nasal sounds, M and N; 4) our models achieved "perfect" forced alignment on very long speech segments (40 minutes). These findings and their significance are discussed.

Disfluencies and Fine-Tuning Pre-Trained Language Models for Detection of Alzheimer’s Disease

Yuan, Bian, Cai, et al. 2020

Perception of intonation in Mandarin Chinese

Yuan 2011

There is a tendency across languages to use a rising pitch contour to convey question intonation and a falling pitch contour to convey a statement. In a lexical tone language such as Mandarin Chinese, rising and falling pitch contours are also used to differentiate lexical meaning. How, then, does the multiplexing of the F0 channel affect the perception of question and statement intonation in a lexical tone language? This study investigated the effects of lexical tones and focus on the perception of intonation in Mandarin Chinese. The results show that lexical tones and focus impact the perception of sentence intonation. Question intonation was easier for native speakers to identify on a sentence with a final falling tone and more difficult to identify on a sentence with a final rising tone, suggesting that tone identification intervenes in the mapping of F0 contours to intonational categories and that tone and intonation interact at the phonological level. In contrast, there is no evidence that the interaction between focus and intonation goes beyond the psychoacoustic level. The results provide insights that will be useful for further research on tone and intonation interactions in both acoustic modeling studies and neurobiological studies.

F0 declination in English and Mandarin Broadcast News Speech

Yuan, Liberman 2014, Speech Communication

This study investigates F0 declination in broadcast news speech in English and Mandarin Chinese. The results demonstrate a strong relationship between utterance length and declination slope. Shorter utterances have steeper declination even after excluding the initial rising and final lowering effects. Both topline and baseline show declination, but they are independent. The topline and baseline have different patterns in Mandarin Chinese, whereas in English their patterns are similar. Mandarin Chinese has more and steeper declination than English, as well as wider pitch range and more F0 fluctuations.

Mechanisms of Question Intonation in Mandarin

Yuan 2006

This study investigates mechanisms of question intonation in Mandarin Chinese. Three mechanisms of question intonation have been proposed: an overall higher phrase curve, higher strengths of sentence-final tones, and a tone-dependent mechanism that flattens the falling slope of the final falling tone and steepens the rising slope of the final rising tone. The phrase curve and strength mechanisms were revealed by a computational modeling study and verified by the acoustic analyses as well as the perception experiments. The tone-dependent mechanism was suggested by a result from the perceptual study (question intonation is easier to identify if the sentence-final tone is falling and harder to identify if it is rising) and was revealed by the acoustic analyses of the final Tone2 and Tone4.

On the Role of Style in Parsing Speech with Neural Models

Tran, Yuan, Liu, et al. 2019

The differences in written text and conversational speech are substantial; previous parsers trained on treebanked text have given very poor results on spontaneous speech. For spoken language, the mismatch in style also extends to prosodic cues, though it is less well understood. This paper reexamines the use of written text in parsing speech in the context of recent advances in neural language processing. We show that neural approaches facilitate using written text to improve parsing of spontaneous speech, and that prosody further improves over this state-of-the-art result. Further, we find an asymmetric degradation from read vs. spontaneous mismatch, with spontaneous speech more generally useful for training parsers.

Automatic detection of “g-dropping” in American English using forced alignment

Yuan, Liberman 2011

This study investigated the use of forced alignment for automatic detection of "g-dropping" in American English (e.g., walkin'). Two acoustic models were trained, one for -in' and the other for -ing. The models were added to the Penn Phonetics Lab Forced Aligner, and forced alignment chooses the more probable pronunciation from the two alternatives. The agreement rates between the forced alignment method and native English speakers ranged from 79% to 90%, which were comparable to the agreement rates among the native speakers (79%-96%). The two pronunciation variants differed not only in their nasal codas but also, and even more so, in their vowel quality. This is shown both by the KL-divergence between the two models and by the poor performance of native Mandarin speakers on classification of "g-dropping".

Highly accurate phonetic segmentation using boundary correction models and system fusion

Stolcke, Ryant, Mitra, et al. 2014

Accurate phone-level segmentation of speech remains an important task for many subfields of speech research. We investigate techniques for boosting the accuracy of automatic phonetic segmentation based on HMM acoustic-phonetic models. In prior work [25] we were able to improve on state-of-the-art alignment accuracy by employing special phone boundary HMM models, trained on phonetically segmented training data, in conjunction with a simple boundary-time correction model. Here we present further improved results by using more powerful statistical models for boundary correction that are conditioned on phonetic context and duration features. Furthermore, we find that combining multiple acoustic front-ends gives additional gains in accuracy, and that conditioning the combiner on phonetic context and side information helps. Overall, we reduce segmentation errors on the TIMIT corpus by almost one half, raising boundary accuracy from 93.9% to 96.8% with a 20-ms tolerance.
