Improving Mandarin Tone Recognition Based on DNN by Combining Acoustic and Articulatory Features Using Extended Recognition Networks

Lin, Ju; Li, Wei; Gao, Yingming; Xie, Yanlu; Chen, Nancy F.; Siniscalchi, Sabato Marco; Zhang, Jinsong; Lee, Chin‐Hui

doi:10.1007/s11265-018-1334-2

Cited by 18 publications

(7 citation statements)

References 33 publications

“…The experimental results have shown that at full data resolution, F0 contours both in Hertz and in semitones can achieve high tone recognition rates (86% and 97%, respectively). Although similar recognition rates were already shown in a previous study using the same corpus [29], the performance is not trivial, as these tones were produced in fluent connected speech in many different tonal contexts and two syllable positions [28], yet in the present study no contextual or positional information is used as input features during training and testing, contrary to the common practice in speech technology applications [30], [31] and [32]. This means that, despite the variability, tones produced in contexts by speakers of both genders still have enough in common to allow a pattern recognition algorithm (SVM) to accurately recognize the tonal categories.…”

Section: Discussion (supporting)

confidence: 63%

Intermediate features are not useful for tone perception

Chen, Xu

2020

Speech Prosody 2020

Many theories assume that speech perception is done by first extracting features like the distinctive features, tonal features or articulatory gestures before recognizing phonetic units such as segments and tones. But it is unclear how exactly extracted features can lead to effective phonetic recognition. In this study we explore this issue by using support vector machine (SVM), a supervised machine learning model, to simulate the recognition of Mandarin tones from F0 in continuous speech. We tested how well a five-level system or a binary distinctive features system can identify Mandarin tones by training the SVM model with F0 trajectories with reduced temporal and frequency resolutions. At full resolution, the recognition rates were 97% and 86% based on the semitone and Hertz scales, respectively. At reduced temporal resolution, there was no clear decline in recognition rate until two points per syllable. At reduced frequency resolution, the recognition rate dropped rapidly: by the level with 5 bands, the accuracy was around 40% based on both Hertz and semitone scales. These results suggest that intermediate featural representations provide no benefit for tone recognition, and are unlikely to be critical for tone perception.
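The resolution reductions described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 100 Hz semitone reference, and the band edges are assumptions made for illustration, and the SVM training step is omitted. It only shows converting F0 from Hertz to semitones, keeping fewer points per syllable, and collapsing values into a small number of equal-width bands.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 values in Hz to semitones relative to ref_hz.

    ref_hz is an assumed reference; the abstract does not state one.
    """
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def resample(contour, n_points):
    """Keep n_points evenly spaced samples (reduced temporal resolution)."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    last = len(contour) - 1
    return [contour[round(i * last / (n_points - 1))] for i in range(n_points)]


def quantize(contour, n_bands, lo, hi):
    """Map each value to one of n_bands equal-width bands spanning [lo, hi]
    (reduced frequency resolution). Values outside the range are clamped."""
    width = (hi - lo) / n_bands
    return [min(n_bands - 1, max(0, int((v - lo) / width))) for v in contour]


# Hypothetical falling contour for one syllable, in Hz.
f0 = [200.0, 190.0, 180.0, 170.0, 160.0, 150.0, 140.0, 130.0, 120.0, 100.0]
semitones = hz_to_semitones(f0)
two_points = resample(semitones, 2)          # first and last sample only
five_bands = quantize(semitones, 5, 0.0, 12.0)
print(two_points, five_bands)
```

Feeding such reduced contours to a classifier would reproduce the kind of comparison the abstract reports, but the exact preprocessing used in the study may differ.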

“…. Previous results showed that this method had many advantages in mispronunciation detection [30], which attested the reliability of the grouping method mentioned above.…”

Section: Data (supporting)

confidence: 54%

The Production of Chinese Affricates /ts/ and /tsʰ/ by Native Urdu Speakers

Du, Zhang

2019

Interspeech 2019

Self Cite

Previous studies have shown that learners with different native language backgrounds share common difficulties in learning Chinese affricates but show different error patterns. However, few studies have investigated this issue for native Urdu speakers. To examine the production of the Chinese affricates /ts/ and /tsʰ/ by native Urdu speakers, speech materials produced by two groups of subjects with different Chinese proficiency were selected from the BLCU-SAIT speech corpus. The error rates and error types in their production of /ts/ and /tsʰ/ are discussed after transcription and data analysis. The results show that, although Urdu has no counterparts of the Chinese affricates /ts/ and /tsʰ/, the errors and acquisition patterns of these two affricates are, to some extent, affected by individual differences of their roles in Urdu, in addition to universal similarities and differences between the two languages. The findings of this study shed some light on second language learning and teaching.

“…Motivated by the success of deep learning technology, some deep learning models have been applied to tone recognition. [7,8] applies DNN to tone recognition on female corpus and some good results are achieved. More recently, [9] employs Convolutional Neural Network (CNN) for speech evaluation of the hearing-impaired population.…”

Section: Introduction (mentioning)

confidence: 99%

Improving Mandarin Tone Recognition Using Convolutional Bidirectional Long Short-Term Memory with Attention

2018

Self Cite

Automatic tone recognition is useful for Mandarin spoken language processing. However, the complex F0 variations caused by tone co-articulation and by interactions among tones make tone recognition in continuous Chinese speech rather difficult. This paper explored the application of Bidirectional Long Short-Term Memory (BLSTM), which is capable of modeling time series, to Mandarin tone recognition in order to handle tone variations in continuous speech. In addition, we introduced an attention mechanism to guide the model to select suitable context information. The experimental results showed that the proposed CNN-BLSTM with attention mechanism performed best, achieving a tone error rate (TER) of 9.30%, a 17.6% relative error reduction from the DNN baseline system with a TER of 11.28%. This demonstrated that the proposed model handled the complex F0 variations more effectively than the other models.
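The relative error reduction quoted in this abstract follows from the two TER values it reports. As a small worked check (the function name is an illustrative choice, not taken from the paper):

```python
def relative_error_reduction(baseline_ter, new_ter):
    """Fraction of the baseline error rate removed by the new system."""
    return (baseline_ter - new_ter) / baseline_ter


# TER values reported in the abstract: DNN baseline 11.28%, CNN-BLSTM with attention 9.30%.
rer = relative_error_reduction(11.28, 9.30)
print(f"{rer:.1%}")  # 17.6%
```

The absolute reduction is 1.98 percentage points; dividing by the 11.28% baseline gives the 17.6% relative figure.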
