2013 IEEE Workshop on Automatic Speech Recognition and Understanding
DOI: 10.1109/asru.2013.6707740

Models of tone for tonal and non-tonal languages

Abstract: Conventional wisdom in automatic speech recognition asserts that pitch information is not helpful in building speech recognizers for non-tonal languages and contributes only modestly to performance in speech recognizers for tonal languages. To maintain consistency between different systems, pitch is therefore often ignored, trading the slight performance benefits for greater system uniformity/simplicity. In this paper, we report results that challenge this conventional approach. We present new models of tone …

Cited by 30 publications (22 citation statements)
References 15 publications
“…We could confirm gains as reported in [24] by including tonal features in our architecture. While training new DBNF networks on the augmented features worked best, integrating the tonal features at the bottleneck level or via glue units improved the resulting acoustic model as well.…”
Section: Discussion (supporting)
confidence: 83%
“…Here, we investigate how to integrate fundamental frequency variation (FFV) features [23] into multi-lingual architectures. Recent work demonstrated their suitability for automatic speech recognition, especially when used as input features for neural networks, on a larger version of the Vietnamese corpus used here [24].…”
Section: Extension To Language-specific Input Features (mentioning)
confidence: 99%
“…Therefore, the FBANK feature is used in this study for DNN based acoustic modeling. A recent study showed that with DNN acoustic modeling, the tonal feature (fundamental frequency contour and its related features) helps to improve the performance of ASR system even if the language is not a tonal language (Metze et al, 2013). Therefore, the combination of FBANK and tonal features is also used in this study (hereafter described as the FBANK+tonal feature type).…”
Section: Four Types Of Acoustic Features (mentioning)
confidence: 98%
“…In our approach, we are learning frame-level speaker-specific embedding features. In other work [95,156], the acoustic features are also augmented by additional information such as prosodic features and signal-to-noise ratios at the frame-level. Nevertheless, our proposed approach is the first approach that uses manifold information as DNN inputs.…”
Section: Connections To Speaker Adaptation (mentioning)
confidence: 99%