2019
DOI: 10.1109/taslp.2019.2945489

Statistical Regression Models for Noise Robust F0 Estimation Using Recurrent Deep Neural Networks

Abstract: The fundamental frequency (F0) in a speech signal, which corresponds to pitch, is one of the key features involved in a variety of speech processing tasks. Therefore, accurate F0 estimation has remained an important problem to be solved over decades. However, this problem is difficult, especially in low signal-to-noise ratio (SNR) conditions with unknown noise. In this work, we propose new approaches to noise-robust F0 estimation using recurrent neural networks (RNNs). Recent F0 estimation studies exploit deep…

Cited by 7 publications (4 citation statements)
References 35 publications
“…The experimental results are summarized in Table 1. Although the performance of UDS-based pitch estimation is not as high as the speech-based pitch estimation (typically, GPE rate < 20% for clean speech signal [42]), the superiority of the UDS signal in terms of pitch estimation is clearly found for all metrics. Such results suggest that UDS provides more useful information for pitch estimation.…”
Section: Performance Of Pitch Estimation and V/uv Decisions
mentioning
confidence: 94%
“…Although the samples for each modality were not recorded simultaneously (because of changes in the shapes of the mouth region by attaching the EMG electrodes), the differences in speech signals among the modalities were minimized by using the common utterance set and asking the subjects to pronounce each word in a consistent manner. The performance of pitch estimation was evaluated using the two standard metrics: the gross pitch error (GPE) rate and the fine pitch error (FPE) [42]. The GPE frames are defined as voiced frames where the error between the estimated pitch period and the ground truth is greater than 0.625 ms.…”
Section: Performance Of Pitch Estimation and V/uv Decisions
mentioning
confidence: 99%
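
The GPE definition quoted in the statement above can be sketched in a few lines of Python. This is only an illustrative sketch, not the evaluation code of any cited paper: the function names and the list-based inputs are hypothetical, and the quoted text does not define FPE, so the version below assumes the common convention of taking the standard deviation of the period errors on voiced frames that are not gross errors.

```python
from statistics import pstdev

# Threshold quoted in the citation statement: 0.625 ms on the pitch period.
GPE_THRESHOLD_MS = 0.625


def pitch_errors(est_periods_ms, ref_periods_ms, voiced):
    """Per-frame pitch-period errors (ms), restricted to voiced frames."""
    return [
        est - ref
        for est, ref, is_voiced in zip(est_periods_ms, ref_periods_ms, voiced)
        if is_voiced
    ]


def gpe_rate(errors_ms, threshold_ms=GPE_THRESHOLD_MS):
    """Fraction of voiced frames whose absolute error exceeds the threshold."""
    if not errors_ms:
        return 0.0
    gross = sum(1 for e in errors_ms if abs(e) > threshold_ms)
    return gross / len(errors_ms)


def fpe(errors_ms, threshold_ms=GPE_THRESHOLD_MS):
    """ASSUMPTION: FPE as the standard deviation of the non-gross errors (ms).

    The quoted statement names FPE but does not define it.
    """
    fine = [e for e in errors_ms if abs(e) <= threshold_ms]
    return pstdev(fine) if fine else 0.0


if __name__ == "__main__":
    # Made-up frames for illustration only; the last frame is unvoiced.
    est = [5.0, 5.1, 6.0, 4.0]
    ref = [5.0, 5.0, 5.0, 0.0]
    voiced = [True, True, True, False]
    errs = pitch_errors(est, ref, voiced)
    print(f"GPE rate: {gpe_rate(errs):.3f}, FPE: {fpe(errs):.3f} ms")
```

Only voiced frames enter either metric, so voicing-decision errors would need to be scored separately.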
“…Periodicity estimation with statistical pitch estimators has been treated inconsistently in recent literature on neural pitch estimation. Some studies omit the evaluation of periodicity or voicing [25], [27]. Others demonstrate binary voicing classification that, at best, slightly outperforms DSP-based baselines [33], [43].…”
Section: Estimators
mentioning
confidence: 99%
“…Notable exceptions to the candidate-generation/candidate-selection paradigm that do not produce a sequence of scores for subsequent decoding include the self-supervised SPICE [33] and the sinusoidal regression method by Kato et al. [27]. While these methods are interesting, they are significantly more complicated than state-of-the-art neural methods trained in supervised classification paradigm, without substantial gains in performance or speed.…”
mentioning
confidence: 99%