Approximate Kalman filtering for the harmonic plus noise model

Parra, Lucas C.; Jain, Udit

doi:10.1109/aspaa.2001.969546

Cited by 9 publications

(6 citation statements)

References 8 publications


“…Their method outperforms several standard pitch tracking algorithms for speech, suggesting potential practical benefits of an approximate Bayesian treatment. For monophonic speech, a Kalman filter based pitch tracker is proposed by [16] that tracks parameters of a harmonic plus noise model (HNM). They propose the use of Laplace approximation around the predicted mean instead of the extended Kalman filter (EKF).…”

Section: A Music Transcription (mentioning)

confidence: 99%


A generative model for music transcription

Cemgil

Kappen

Barber

2006

IEEE Trans. Audio Speech Lang. Process.


Abstract: In this paper we present a graphical model for polyphonic music transcription. Our model, formulated as a Dynamical Bayesian Network, embodies a transparent and computationally tractable approach to this acoustic analysis problem. An advantage of our approach is that it places emphasis on explicitly modelling the sound generation procedure. It provides a clear framework in which high level (cognitive) prior information on music structure can be coupled with low level (acoustic physical) information in a principled manner to perform the analysis. The model is a special case of the generally intractable switching Kalman filter model. Where possible, we derive exact polynomial time inference procedures, and otherwise efficient approximations. We argue that our generative model based approach is computationally feasible for many music applications and is readily extensible to more general auditory scene analysis scenarios.



“…The sinusoidal model [30] is often a good approximation that provides a compact representation for the periodic component. The transient component can be modelled as a correlated Gaussian noise process [16], [20]. Our signal model is also in the same spirit, but we will define it in state Fig.…”

Section: A Modelling a Single Note (mentioning)

confidence: 99%
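The decomposition described in this excerpt (a sinusoidal periodic component plus a correlated Gaussian noise component) can be illustrated with a minimal sketch. It is not taken from the cited or citing papers: the function name `synth_harmonic_plus_noise`, the AR(1) choice for the correlated noise, and all parameter values are assumptions made for illustration only.

```python
import numpy as np

def synth_harmonic_plus_noise(f0, amps, fs=8000.0, n_samples=800,
                              noise_std=0.05, ar_coeff=0.9, seed=0):
    """Return (t, harmonic, noise): harmonics of f0 and AR(1) Gaussian noise.

    Hypothetical helper for illustration; AR(1) is just one simple way to
    obtain a correlated Gaussian noise process.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples) / fs
    # Periodic part: the k-th harmonic sits at k * f0 with amplitude amps[k-1].
    harmonic = np.zeros(n_samples)
    for k, a in enumerate(amps, start=1):
        harmonic += a * np.cos(2.0 * np.pi * k * f0 * t)
    # Noise part: e[n] = ar_coeff * e[n-1] + w[n], with w[n] white Gaussian.
    white = rng.normal(0.0, noise_std, n_samples)
    noise = np.zeros(n_samples)
    noise[0] = white[0]
    for n in range(1, n_samples):
        noise[n] = ar_coeff * noise[n - 1] + white[n]
    return t, harmonic, noise

t, harmonic, noise = synth_harmonic_plus_noise(f0=200.0, amps=[1.0, 0.5, 0.25])
signal = harmonic + noise  # observed signal = periodic part + correlated noise
```

With the assumed defaults the frame holds a whole number of periods of the 200 Hz fundamental, so the strongest bin of the spectrum of `harmonic` falls exactly on the fundamental.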


“…However, the pitch values in a sequence are usually highly correlated, which motivates the development of the Bayesian methods to optimally use the correlations. The Bayesian methods incorporate prior distributions, and can be used to derive the minimum mean square error (MMSE) estimator and the maximum a posteriori (MAP) estimator [6], e.g., [7].…”

Funding note from the citing paper: This work was funded by the Villum Foundation, the Cluster of Excellence 1077 "Hearing4all" by the German Research Foundation (DFG), and the Danish Council for Independent Research, grant ID: DFF 1337-00084.

Section: Introduction (mentioning)

confidence: 99%

Pitch estimation and tracking with harmonic emphasis on the acoustic spectrum

Karimian-Azari

Mohammadiha

Jensen

et al. 2015

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


In this paper, we use unconstrained frequency estimates (UFEs) from a noisy harmonic signal and propose two methods to estimate and track the pitch over time. We assume that the UFEs are multivariate-normally-distributed random variables, and derive a maximum likelihood (ML) pitch estimator by maximizing the likelihood of the UFEs over short time intervals. As the main contribution of this paper, we propose two state-space representations to model the pitch continuity, and, accordingly, we propose two Bayesian methods, namely a hidden Markov model and a Kalman filter. These methods are designed to optimally use the correlations in the consecutive pitch values, where the past pitch estimates are used to recursively update the prior distribution for the pitch variable. We perform experiments using synthetic data as well as a noisy speech recording, and show that the Bayesian methods provide more accurate estimates than the corresponding ML methods.
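The idea of recursively updating a prior on the pitch from past estimates can be sketched with a scalar Kalman filter. This is a simplified illustration under stated assumptions, not the state-space model of the paper above: it assumes a random-walk pitch model, treats each frame's noisy pitch estimate as a direct observation, and uses hypothetical names (`kalman_track_pitch`) and variances chosen only for the demo.

```python
import numpy as np

def kalman_track_pitch(observations, process_var=4.0, obs_var=100.0,
                       init_mean=None, init_var=1e4):
    """Scalar Kalman filter with an assumed random-walk model for pitch (Hz)."""
    obs = np.asarray(observations, dtype=float)
    mean = obs[0] if init_mean is None else float(init_mean)
    var = float(init_var)
    means = np.empty_like(obs)
    variances = np.empty_like(obs)
    for i, z in enumerate(obs):
        # Predict: pitch[t] = pitch[t-1] + process noise, so variance grows.
        var_pred = var + process_var
        # Update: blend the prediction and the new observation via the gain.
        gain = var_pred / (var_pred + obs_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var_pred
        means[i] = mean
        variances[i] = var
    return means, variances

# Synthetic demo: a slowly varying pitch contour observed with 10 Hz noise.
rng = np.random.default_rng(0)
true_pitch = 200.0 + 20.0 * np.sin(np.linspace(0.0, np.pi, 200))
noisy = true_pitch + rng.normal(0.0, 10.0, true_pitch.size)
tracked, _ = kalman_track_pitch(noisy)
```

The ratio of `process_var` to `obs_var` sets how quickly the tracker follows changes: a larger ratio trusts each new frame more, a smaller one smooths more heavily at the cost of lag.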


“…denote the eigenvalues of the matrix […], in which […] is the sample covariance matrix and […] is an […] matrix whose columns are orthogonal to […] such that […] defines a complete orthonormal basis, which satisfies […] (6). In the Appendix, it is shown that in a single snapshot case, i.e., […], the likelihood function is given by […] (7), where […] is the measurement vector in the single snapshot case. The model under hypothesis […], presented in (2), is equivalent to (4), with a single snapshot, i.e., […].…”

Section: A: Harmonic Noise (mentioning)

confidence: 99%

“…In the single snapshot case where […], the sample covariance matrix is given by […] and the matrix […] can be rewritten as […] (24). Since the matrix […] is of rank one, its eigenvalues are equal to zero except the first one, […], given by […] (25). According to (6), […], and thus,…”

Section: Appendix Derivation Of the Likelihood Function In (7) For Th (mentioning)

confidence: 99%

Generalized likelihood ratio test for voiced/unvoiced decision using the harmonic plus noise model

Fisher

Tabrikian

Dubnov

2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).


Abstract: In this paper, a novel method for voiced-unvoiced decision within a pitch tracking algorithm is presented. Voiced-unvoiced decision is required for many applications, including modeling for analysis/synthesis, detection of model changes for segmentation purposes and signal characterization for indexing and recognition applications. The proposed method is based on the generalized likelihood ratio test (GLRT) and assumes colored Gaussian noise with unknown covariance. Under the voiced hypothesis, a harmonic plus noise model is assumed. The derived method is combined with a maximum a-posteriori probability (MAP) scheme to obtain a pitch and voicing tracking algorithm. The performance of the proposed method is tested using several speech databases for different levels of additive noise and phone speech conditions. Results show that the GLRT is robust to speaker and environmental conditions and performs better than existing algorithms. Index Terms: Generalized likelihood ratio test (GLRT), harmonic model, likelihood ratio test (LRT), maximum a-posteriori probability, noisy speech, pitch tracking, voice activity detection (VAD), voiced-unvoiced decision.
