Pitch Range Estimation with Multi features and MTL-DNN Model

Zhang, Qi; Cao, Chong; Li, Tiantian; Xie, Yanlu; Zhang, Jinsong

doi:10.1109/icsp.2018.8652462

Cited by 3 publications

(3 citation statements)

References 5 publications


“…Moreover, training multiple related tasks jointly in one model is more efficient than training them in isolation. The MTL method has been commonly used in speech technologies [53,54]. In this refined model, we adopted the MTL-LSTM, of which the structure is shown in Figure 2, to estimate the three parameters (i.e., mean, ceiling, and floor) for F0 range jointly.…”

Section: Refined Model Setup (mentioning)
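The quote describes estimating three F0-range parameters (mean, ceiling, floor) jointly with one multi-task model. A minimal sketch of what "jointly" means for the training objective follows. It assumes a per-task mean squared error and equal task weights; neither the loss nor the weighting is stated in the quote, and the task key names are hypothetical. The shared LSTM itself is omitted. Only the combined loss that would be minimized through it is shown.

```python
from typing import Dict, Optional, Sequence

# The three F0-range parameters named in the quote; key names are illustrative.
TASKS = ("mean", "ceiling", "floor")


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def joint_mtl_loss(
    preds: Dict[str, Sequence[float]],
    targets: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-task MSE losses (equal weights unless given).

    In multi-task training, one shared network emits all three outputs and
    this single scalar is minimized, so gradients from every task update the
    shared layers together instead of training three isolated models.
    """
    if weights is None:
        weights = {task: 1.0 for task in TASKS}
    return sum(weights[task] * mse(preds[task], targets[task]) for task in TASKS)


if __name__ == "__main__":
    # Hypothetical predictions and targets (Hz) for two utterances.
    preds = {"mean": [100.0, 200.0], "ceiling": [150.0, 300.0], "floor": [80.0, 120.0]}
    targets = {"mean": [110.0, 190.0], "ceiling": [150.0, 300.0], "floor": [70.0, 120.0]}
    print(joint_mtl_loss(preds, targets))  # 100.0 + 0.0 + 50.0
```

Raising one entry in `weights` is one simple way to emphasize a task; whether the cited model used such weighting is not stated here.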


Estimation of the Underlying F0 Range of a Speaker from the Spectral Features of a Brief Speech Input

et al. 2022

Self Cite


From a very brief speech input, human listeners can estimate a speaker's pitch range and normalize their pitch perception accordingly. Spectral features, which inherently involve both articulatory and phonatory characteristics, were speculated to play a role in this process, but few were reported to correlate directly with a speaker's F0 range. To mimic this human auditory capability and validate the speculation, in a preliminary study we proposed an LSTM-based method to estimate a speaker's F0 range from a 300 ms-long speech input, which turned out to outperform the conventional method. In two further experiments, this study improved the method and verified its validity in estimating the speaker-specific underlying F0 range. After incorporating a novel measurement of F0 range and a multi-task training approach, Experiment 1 showed that the refined model gave more accurate estimates than the initial model. Based on a Japanese-Chinese bilingual parallel speech corpus, Experiment 2 found no significant difference between the F0 ranges the model estimated from the Chinese speech and from the Japanese speech produced by the same set of speakers, whereas the conventional method showed a significant difference. The results indicate that the proposed spectrum-based method captures the speaker-specific underlying F0 range, which is independent of the linguistic content.
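To make "a 300 ms-long speech input" concrete for a sequence model, the sketch below splits such a segment into overlapping analysis frames, each of which would then become one spectral feature vector fed to the LSTM. The 16 kHz sample rate, 25 ms window, and 10 ms hop are assumptions (common speech front-end defaults), not settings reported in the abstract, and the feature extraction step is deliberately left out.

```python
import math
from typing import List, Sequence


def num_frames(duration_ms: float, win_ms: float = 25.0, hop_ms: float = 10.0) -> int:
    """Count the full analysis windows that fit in a segment (no padding)."""
    if duration_ms < win_ms:
        return 0
    return 1 + math.floor((duration_ms - win_ms) / hop_ms)


def frame_signal(
    samples: Sequence[float],
    sample_rate: int = 16000,  # assumed sample rate
    win_ms: float = 25.0,      # assumed window length
    hop_ms: float = 10.0,      # assumed hop length
) -> List[Sequence[float]]:
    """Split a waveform into overlapping frames; each frame would then be
    turned into a spectral feature vector (not shown)."""
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [samples[s:s + win] for s in range(0, len(samples) - win + 1, hop)]


if __name__ == "__main__":
    segment = [0.0] * (16000 * 300 // 1000)  # 300 ms of silence at 16 kHz
    frames = frame_signal(segment)
    print(len(frames), num_frames(300.0))  # 28 28
```

Under these assumptions a 300 ms input yields only about 28 frames, which illustrates how little material the model has to work with.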



“…Inspired by these findings, some studies have proposed to estimate pitch range from spectral information with deep learning model recently.[5,6] For example, Zhang et al.[6] have utilized multi-task learning deep neural networks with multi-feature input to estimate pitch-range targets. Different from Ref.…”

Section: Introduction (mentioning)


“…Different from Ref. [6], Zhang et al.[5] proposed to deploy LSTM model with only one feature as input, whose experimental results showed that they could still achieve a reliable pitch-level estimation result with low (<2.5%) mean absolute percentage error (MAPE) rate when the speech segments are as brief as 300 ms (about 1-1.5 syllables). Both studies have demonstrated that one can estimate pitch range directly from the spectral structure by means of deep learning methods.…”

Section: Introduction (mentioning)
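The quoted accuracy figure is a mean absolute percentage error. Using its standard definition, MAPE = (100 / n) Σ |y_i − ŷ_i| / |y_i|, where y_i is a reference pitch-range value and ŷ_i the model's estimate, a minimal sketch is below. The example values are hypothetical and are not taken from the cited studies.

```python
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a reference value is zero")
    total = sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


if __name__ == "__main__":
    # Hypothetical reference vs. estimated F0 values (Hz) for two speakers.
    print(mape([200.0, 100.0], [204.0, 98.0]))  # approximately 2.0
```

For a reference value of 200 Hz, a 2.5% error corresponds to about 5 Hz, which gives a feel for the "<2.5%" figure in the quote.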


A Study on the Robustness of Pitch-Range Estimation from Brief Speech Segments

Peng

Wei

et al. 2020

Int. J. As. Lang. Proc.

Self Cite


Pitch-range estimation from brief speech segments could bring benefits to many tasks like automatic speech recognition and speaker recognition. To estimate pitch range, previous studies have proposed to utilize deep-learning-based models with spectrum information as input. They demonstrated that such method works and could still achieve reliable estimation results when the speech segment is as brief as 300 ms. In this study, we evaluated the robustness of this method. We take the following scenarios into account: (1) a large number of training speakers; (2) different language backgrounds; and (3) monosyllabic utterances with different tones. Experimental results showed that: (1) The use of a large number of training speakers improved the estimation accuracies. (2) The mean absolute percentage error (MAPE) rate evaluated on the L2 speakers is similar to that on the native speakers. (3) Different tonal information will affect the LSTM-based model, but this influence is limited compared to the baseline method which calculates pitch-range targets from the distribution of F0 values. These experimental results verified the efficiency of the LSTM-based pitch-range estimation method.
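The baseline mentioned in this abstract derives pitch-range targets from the distribution of F0 values. The abstract does not say which statistics are used, so the sketch below assumes one plausible choice: the mean of voiced frames as the level, and the 5th and 95th percentiles as floor and ceiling. It also assumes unvoiced frames are coded as 0 Hz. The function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def f0_range_from_distribution(f0_hz: Sequence[float]) -> Dict[str, float]:
    """Hypothetical distribution-based pitch-range targets.

    Assumes unvoiced frames are coded as 0 Hz and uses the 5th/95th
    percentiles of voiced F0 as floor/ceiling (an illustrative choice).
    """
    voiced = [f for f in f0_hz if f > 0]  # drop unvoiced frames
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    cuts = statistics.quantiles(voiced, n=20)  # 19 cut points: 5th ... 95th percentile
    return {"mean": statistics.fmean(voiced), "floor": cuts[0], "ceiling": cuts[-1]}


if __name__ == "__main__":
    track = [0.0] + [float(x) for x in range(100, 201)]  # one unvoiced frame, then 100..200 Hz
    print(f0_range_from_distribution(track))
```

Because such statistics depend on which F0 values happen to occur, a short segment dominated by one lexical tone can shift them noticeably, which is consistent with the abstract's observation that tonal information affects this baseline more than the LSTM-based model.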


A Study on the Robustness of Pitch Range Estimation from Brief Speech Segments

Peng

Wei

et al. 2019

2019 International Conference on Asian Language Processing (IALP)

