Improving Mandarin Tone Recognition Using Convolutional Bidirectional Long Short-Term Memory with Attention

Yang, Longfei; Xie, Yanlu; Zhang, Jinsong

doi:10.21437/interspeech.2018-2561

Cited by 6 publications

(7 citation statements)

References 11 publications

“…Frame-based approach feeds a sequence of frames directly to the classifier, which outputs a sequence of labels. To capture context information, RNN [3,13] and CNN [3] are frequently used. Also, frame-based frameworks often use techniques such as pooling [12], attention [3], or Connectionist Temporal Classification (CTC) [13] to perform frame-level alignment to correctly output a series of tone labels.…”

Section: Related Work (mentioning)

confidence: 99%

“…To capture context information, RNN [3,13] and CNN [3] are frequently used. Also, frame-based frameworks often use techniques such as pooling [12], attention [3], or Connectionist Temporal Classification (CTC) [13] to perform frame-level alignment to correctly output a series of tone labels. This approach allows a single training pass to cover both the alignment and classification task, and does not require a pretrained ASR model.…”

Section: Related Work (mentioning)

confidence: 99%

“…T1 to T4 refer to the lexical tones. T1 is a flat tone with a relatively high frequency, T2 has a rising frequency, T3 first dips and then rises, and T4's frequency is falling [1,2,3,4,5]. In addition to these four tones, there's also a neutral tone, which itself doesn't possess a particular pitch contour pattern.…”

Section: Introduction 1 (background; mentioning)

confidence: 99%

“…Thus, the pitch contours of a syllable can be affected by the context bi-directionally [5]. This phenomenon is called tonal coarticulation [3,5,6,8,9].…”

Section: Introduction 1 (background; mentioning)

confidence: 99%

End-to-End Mandarin Tone Classification with Short Term Context Information

Tang; Li

2021, preprint

In this paper, we propose an end-to-end Mandarin tone classification method for continuous speech utterances that uses both the spectrogram and short-term context information as inputs. Both Mel-spectrograms and context segment features are used to train the tone classifier. We first divide the spectrogram frames into syllable segments using forced-alignment results produced by an ASR model. Then we extract the short-term segment features to capture the context information across multiple syllables. Feeding both the Mel-spectrogram and the short-term context segment features into an end-to-end model could significantly improve the performance. Experiments are performed on a large-scale open-source Mandarin speech dataset to evaluate the proposed method. Results show that this method improves the classification accuracy from 79.5% to 88.7% on the AISHELL3 database.
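
The segmentation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `(start, end)` frame-index boundary format, and the `width` parameter are all assumptions made for illustration, and the abstract does not say how many neighbouring syllables form the "short-term context".

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # one spectrogram frame (e.g. a vector of Mel-band energies)


def split_into_syllables(
    frames: Sequence[Frame], boundaries: Sequence[Tuple[int, int]]
) -> List[List[Frame]]:
    """Slice a frame sequence into per-syllable segments.

    `boundaries` holds hypothetical (start, end) frame indices, end exclusive,
    such as a forced aligner might produce.
    """
    return [list(frames[start:end]) for start, end in boundaries]


def with_context(
    segments: Sequence[List[Frame]], index: int, width: int = 1
) -> List[Frame]:
    """Return the frames of syllable `index` plus up to `width` neighbouring
    syllables on each side (assumed context definition, clipped at the edges)."""
    lo = max(0, index - width)
    hi = min(len(segments), index + width + 1)
    context: List[Frame] = []
    for segment in segments[lo:hi]:
        context.extend(segment)
    return context
```

For example, with ten frames and boundaries `[(0, 3), (3, 7), (7, 10)]`, `with_context(segments, 1, 1)` returns all ten frames, because the middle syllable has one neighbour on each side.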

“…Consequently, the design and implementation of Multi-class Support Vector Machine in the recognition of SY context dependent tone is presented in this paper to engender and provide arguments for the use of context dependent tone segment for SY ASR. In language such as SY, tones are associated with syllable (Yang and Zhang, 2018). SY has seven possible syllable structures, these include consonant -vowel , , digraph-vowel nasal , digraph-vowel , vowel , vowel nasal and syllabic nasal .…”

mentioning

confidence: 99%

Standard Yorùbá context dependent tone identification using Multi-Class Support Vector Machine (MSVM)

Sosimi; Adegbola; Fakinlede

2019, Journal of Applied Sciences and Environmental Management

Most state-of-the-art large vocabulary continuous speech recognition systems employ context dependent (CD) phone units; however, CD phone units are not efficient in capturing the long-term spectral dependencies of tone in most tone languages. Standard Yorùbá (SY) is a language composed of syllables with tones and requires a different method for acoustic modeling. In this paper, a context dependent tone acoustic model was developed. The tone unit is assumed to be the syllable; the amplitude magnified difference function (AMDF) was used to derive the utterance-wide F0 contour, followed by automatic syllabification and tri-syllable forced alignment with the Speech Phonetization Alignment and Syllabification (SPPAS) tool. For classification of the context dependent (CD) tone, the slope and intercept of the F0 values were extracted from each segmented unit. A supervised clustering scheme was used to partition CD tri-tones by category, and the features were normalized based on some statistics to derive the acoustic feature vectors. A multi-class support vector machine (MSVM) was used for tri-tone training. From the experimental results, it was observed that the word recognition accuracy obtained from the MSVM tri-tone system based on dynamic programming tone embedded features was comparable with phone features. The best parameter tuning was obtained for 10-fold cross validation, and overall accuracy was 97.5678%. In terms of word error rate (WER), the MSVM CD tri-tone system outperforms the hidden Markov model tri-phone system with WER of 44.47%.
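
The slope-and-intercept features mentioned in this abstract can be sketched as an ordinary least-squares line fit over the pitch values of one segment. This is only an illustration under stated assumptions: the abstract does not say how the fit was computed, and the function name `fit_line` and the use of the frame index as the time axis are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(f0: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of f0[i] ~ slope * i + intercept.

    The frame index i stands in for time (an assumption); at least two
    pitch values are needed for the slope to be defined.
    """
    n = len(f0)
    if n < 2:
        raise ValueError("need at least two pitch values to fit a line")
    mean_x = (n - 1) / 2  # mean of the indices 0 .. n-1
    mean_y = sum(f0) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(f0))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A steadily rising contour such as `[100.0, 110.0, 120.0, 130.0]` yields a slope of 10 per frame and an intercept of 100, so a rising tone and a falling tone differ in the sign of the first feature.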

TIA: A Teaching Intonation Assessment Dataset in Real Teaching Situations

Liu; Zhang; et al.

2024, ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
