2018 14th IEEE International Conference on Signal Processing (ICSP), 2018
DOI: 10.1109/icsp.2018.8652462

Pitch Range Estimation with Multi features and MTL-DNN Model

Cited by 3 publications (3 citation statements)
References: 5 publications

“…Moreover, training multiple related tasks jointly in one model is more efficient than training them in isolation. The MTL method has been commonly used in speech technologies [53,54]. In this refined model, we adopted the MTL-LSTM, of which the structure is shown in Figure 2, to estimate the three parameters (i.e., mean, ceiling, and floor) for F0 range jointly.…”
Section: Refined Model Setup (mentioning, confidence: 99%)

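The quoted passage describes joint (multi-task) estimation of three F0-range parameters (mean, ceiling, and floor) by one network. The sketch below illustrates that idea of hard parameter sharing under stated assumptions only. It is not the cited authors' MTL-LSTM: a single dense `tanh` layer stands in for the shared LSTM layers, the names `MultiTaskRegressor` and `joint_loss` are hypothetical, and the joint loss is assumed to be a weighted sum of per-task squared errors.

```python
import math
import random


def dense(x, weights, bias):
    """Affine layer: y_j = sum_i W[j][i] * x_i + b_j."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class MultiTaskRegressor:
    """Hypothetical MTL model: one shared hidden layer feeding three scalar heads.

    Assumption: a dense tanh layer replaces the shared LSTM for brevity.
    """

    TASKS = ("mean", "ceiling", "floor")

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Shared parameters, used by every task
        self.w_shared = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b_shared = [0.0] * n_hidden
        # One task-specific linear head (1 x n_hidden weights, 1 bias) per target
        self.heads = {
            t: ([[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]], [0.0]) for t in self.TASKS
        }

    def forward(self, x):
        # Shared representation computed once, then read by all three heads
        h = [math.tanh(a) for a in dense(x, self.w_shared, self.b_shared)]
        return {t: dense(h, w, b)[0] for t, (w, b) in self.heads.items()}


def joint_loss(pred, target, weights=None):
    """Assumed joint objective: weighted sum of per-task squared errors."""
    weights = weights or {t: 1.0 for t in pred}
    return sum(weights[t] * (pred[t] - target[t]) ** 2 for t in pred)


if __name__ == "__main__":
    model = MultiTaskRegressor(n_in=4, n_hidden=8)
    pred = model.forward([0.2, -0.1, 0.5, 0.3])
    print(pred, joint_loss(pred, {"mean": 0.1, "ceiling": 0.4, "floor": -0.2}))
```

Sharing the hidden layer is what makes the estimation joint: when trained (training code is omitted here), gradients from all three heads would update the same shared weights, while each head keeps its own output weights.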
“…Inspired by these findings, some studies have recently proposed estimating pitch range from spectral information with deep learning models [5,6]. For example, Zhang et al. [6] have utilized multi-task learning deep neural networks with multi-feature input to estimate pitch-range targets. Different from Ref.…”
Section: Introduction (mentioning, confidence: 99%)

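The quote mentions pitch-range targets without defining how they are obtained. As a hypothetical illustration only, the sketch below derives three such targets from a frame-level F0 track. It assumes unvoiced frames are coded as 0 Hz and takes the ceiling and floor as the 95th and 5th percentiles of the voiced F0 values; the function name `pitch_range_targets` and these percentile choices are assumptions, and the cited studies may define the targets differently.

```python
import statistics


def pitch_range_targets(f0_hz, low_pct=5, high_pct=95):
    """Hypothetical pitch-range targets from an F0 contour (Hz).

    Assumptions: unvoiced frames are 0 Hz; floor/ceiling are percentiles
    of the voiced frames; mean is the arithmetic mean of voiced F0.
    """
    if not (1 <= low_pct < high_pct <= 99):
        raise ValueError("percentiles must satisfy 1 <= low_pct < high_pct <= 99")
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    # 99 cut points: cuts[k - 1] is the k-th percentile (interpolated within the data range)
    cuts = statistics.quantiles(voiced, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(voiced),
        "ceiling": cuts[high_pct - 1],
        "floor": cuts[low_pct - 1],
    }


if __name__ == "__main__":
    # Toy contour with two unvoiced frames; gives mean 120.0, ceiling 138.0, floor 102.0
    print(pitch_range_targets([0, 100, 110, 120, 130, 140, 0]))
```

Percentiles rather than the raw maximum and minimum are used here so that a single octave error in F0 tracking does not dominate the ceiling or floor; this is a design assumption of the sketch.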
“…Different from Ref. [6], Zhang et al. [5] proposed deploying an LSTM model with only one feature as input; their experimental results showed that a reliable pitch-level estimate could still be achieved, with a low (<2.5%) mean absolute percentage error (MAPE), when the speech segments are as brief as 300 ms (about 1-1.5 syllables). Both studies have demonstrated that one can estimate pitch range directly from the spectral structure by means of deep learning methods.…”
Section: Introduction (mentioning, confidence: 99%)
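MAPE, the metric behind the <2.5% figure quoted above, averages the absolute error of each estimate relative to its reference value and expresses the result as a percentage: MAPE = (100 / n) · Σ |y_i − ŷ_i| / |y_i|. A minimal sketch, assuming plain lists of reference and estimated pitch values in the same unit (the function name `mape` is hypothetical):

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        # Relative error is undefined for a zero reference value
        raise ValueError("MAPE is undefined when a reference value is zero")
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Errors of 2% and 3% average to approximately 2.5 (percent)
    print(mape([200.0, 100.0], [196.0, 103.0]))
```

Because each error is divided by its own reference value, MAPE is scale-free, so the same threshold can be compared across speakers with very different pitch levels.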