Interspeech 2018 2018
DOI: 10.21437/interspeech.2018-2561
|View full text |Cite
|
Sign up to set email alerts
|

Improving Mandarin Tone Recognition Using Convolutional Bidirectional Long Short-Term Memory with Attention

Abstract: Automatic tone recognition is useful for Mandarin spoken language processing. However, the complex F0 variations from the tone co-articulations and the interplay effects among tonality make it rather difficult to perform tone recognition of Chinese continuous speech. This paper explored the application of Bidirectional Long Short-Term Memory (BLSTM), which had the capability of modeling time series, to Mandarin tone recognition to handle the tone variations in continuous speech. In addition, we introduced atte… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2

Citation Types

1
6
0

Year Published

2019
2019
2024
2024

Publication Types

Select...
4
2

Relationship

0
6

Authors

Journals

citations
Cited by 6 publications
(7 citation statements)
references
References 11 publications
1
6
0
Order By: Relevance
“…Frame-based approach feeds a sequence frames directly to the classifier, which outputs a sequence of labels. To capture context information, RNN [3,13] and CNN [3] are frequently used. Also, frame-based frameworks often use techniques such as pooling [12], attention [3], or Connectionist Temporal Classification (CTC) [13] to perform frame-level alignment to correctly output a series of tone labels.…”
Section: Related Workmentioning
confidence: 99%
See 3 more Smart Citations
“…Frame-based approach feeds a sequence frames directly to the classifier, which outputs a sequence of labels. To capture context information, RNN [3,13] and CNN [3] are frequently used. Also, frame-based frameworks often use techniques such as pooling [12], attention [3], or Connectionist Temporal Classification (CTC) [13] to perform frame-level alignment to correctly output a series of tone labels.…”
Section: Related Workmentioning
confidence: 99%
“…To capture context information, RNN [3,13] and CNN [3] are frequently used. Also, frame-based frameworks often use techniques such as pooling [12], attention [3], or Connectionist Temporal Classification (CTC) [13] to perform frame-level alignment to correctly output a series of tone labels. This approach allows a single training pass to cover both the alignment and classification task, and does not require a pretrained ASR model.…”
Section: Related Workmentioning
confidence: 99%
See 2 more Smart Citations
“…Consequently, the design and implementation of Multi-class Support Vector Machine in the recognition of SY context dependent tone is presented in this paper to engender and provide arguments for the use of context dependent tone segment for SY ASR. In language such as SY, tones are associated with syllable (Yang and Zhang, 2018). SY has seven possible syllable structures, these include consonant -vowel , , digraph-vowel nasal , digraph-vowel , vowel , vowel nasal and syllabic nasal .…”
mentioning
confidence: 99%