2013
DOI: 10.1109/tasl.2013.2277941

Source/Filter Factorial Hidden Markov Model, With Application to Pitch and Formant Tracking

Abstract: Tracking vocal tract formant frequencies (f_p) and estimating the fundamental frequency (f_0) are two tracking problems that have been tackled in many speech processing works, often independently, with applications to articulatory parameters estimations, speech analysis/synthesis or linguistics. Many works assume an auto-regressive (AR) model to fit the spectral envelope, hence indirectly estimating the formant tracks from the AR parameters. However, directly estimating the formant frequencies, or equivalent…
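
The indirect route mentioned in the abstract, fitting an AR model to the spectral envelope and reading formant candidates off the AR parameters, can be sketched as follows. This is a minimal illustration of the conventional autocorrelation (Yule-Walker) method and is not the paper's source/filter FHMM. The function names `ar_coefficients` and `formant_candidates`, the sampling rate argument `fs`, and the model order are assumptions made for this example.

```python
import numpy as np

def ar_coefficients(frame, order):
    """Fit A(z) = 1 + a1 z^-1 + ... + ap z^-p with the autocorrelation method."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Biased autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = -r[1:] (Yule-Walker).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))

def formant_candidates(frame, fs, order):
    """Return (frequencies in Hz, bandwidths in Hz) from the poles of 1/A(z)."""
    a = ar_coefficients(frame, order)
    poles = np.roots(a)
    # Keep one pole from each complex-conjugate pair.
    poles = poles[np.imag(poles) > 0]
    freqs = np.angle(poles) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bandwidths[idx]
```

In practice the frame would usually be pre-emphasized and windowed before the fit, and candidates with very large bandwidths would be discarded. Those steps are left out here to keep the sketch short.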

Cited by 8 publications (7 citation statements)
References 29 publications
“…The estimation and tracking of VTRs from speech signals is a challenging problem that has many applications in various areas: in acoustic and phonetic analysis [1], [2], in voice morphing [3], in speech recognition [4], [5], in speech and singing voice synthesis [6], [7], in voice activity detection [8], and in designing hearing aids [9], [10]. Many algorithms of varying complexity have been proposed in the literature for tracking formants in speech signals [11]-[15]. A dynamic programming (DP)-based tracking algorithm with a heuristic cost function on the initial formant candidates estimated using conventional linear prediction (LP) analysis was used in [11], [12].…”
Section: Introduction (mentioning)
confidence: 99%
“…This two-stage approach has a detection stage, where an initial estimate of the VTRs is obtained, followed by a tracking stage. An integrated approach towards tracking was adopted in [13]-[15] using state-space methods such as Kalman filtering (KF) and the factorial hidden Markov model (FHMM). In both approaches, analysis of the signal for the accurate estimation (or modeling) of the vocal tract system is an important and necessary computational block.…”
Section: Introduction (mentioning)
confidence: 99%
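
The two-stage scheme described in this statement (detect candidates per frame, then track them with dynamic programming) can be illustrated with a small Viterbi-style search for a single formant track. This is only a sketch under stated assumptions: the cost function (distance to a nominal frequency plus a weighted frame-to-frame jump), the function name `dp_track`, and the parameters `nominal_hz` and `jump_weight` are hypothetical and are not the heuristic costs used in the cited works.

```python
import numpy as np

def dp_track(candidates, nominal_hz, jump_weight=1.0):
    """Pick one candidate frequency per frame by dynamic programming.

    candidates: list of non-empty sequences of candidate frequencies (Hz),
                one sequence per frame.
    Cost (assumed for illustration): |f - nominal_hz| per frame plus
    jump_weight * |f_t - f_{t-1}| between consecutive frames.
    """
    cands = [np.asarray(c, dtype=float) for c in candidates]
    cost = np.abs(cands[0] - nominal_hz)      # best cost ending at each candidate
    back = []                                 # back-pointers, one array per transition
    for prev, cur in zip(cands[:-1], cands[1:]):
        local = np.abs(cur - nominal_hz)
        trans = jump_weight * np.abs(cur[:, None] - prev[None, :])
        total = cost[None, :] + trans         # shape (n_cur, n_prev)
        best_prev = np.argmin(total, axis=1)
        cost = local + total[np.arange(len(cur)), best_prev]
        back.append(best_prev)
    # Backtrack from the cheapest final candidate.
    idx = int(np.argmin(cost))
    path = [idx]
    for bp in reversed(back):
        idx = int(bp[idx])
        path.append(idx)
    path.reverse()
    return np.array([c[i] for c, i in zip(cands, path)])
```

For example, `dp_track([[500, 1500], [520, 1480], [1500, 510], [530, 2500]], nominal_hz=500)` follows the low-frequency candidates regardless of their position within each frame's list. A full tracker would follow several formants jointly and enforce their ordering, which this single-track sketch does not attempt.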
“…Accurate tracking of formants in speech signals has potential applications in acoustic-phonetic analysis of speech signals, speech enhancement, formant-based speech synthesis, pronunciation correction [1][2][3][4][5]. Many algorithms of varying complexity have been proposed in the literature for tracking formants in speech signals [6][7][8][9][10]. A dynamic programming (DP) based tracking with a heuristic cost function on the initial formant candidates estimated using a conventional LP analysis is used in [6,7].…”
Section: Introduction (mentioning)
confidence: 99%
“…A dynamic programming (DP) based tracking with a heuristic cost function on the initial formant candidates estimated using a conventional LP analysis is used in [6,7]. An integrated approach towards tracking is adopted in [8][9][10] using state-space methods such as Kalman filtering (KF) and factorial hidden Markov model (FHMM). Most of these algorithms use an underlying linear prediction (LP) based modeling of speech signals, except in [10] which uses a nonnegative matrix factorization (NMF) based source-filter modeling of speech signals.…”
Section: Introduction (mentioning)
confidence: 99%