Narrowband to wideband conversion of speech using GMM based transformation

Park, Kun-Youl; Kim, Hyung Soon

doi:10.1109/icassp.2000.862114

Cited by 94 publications

(5 citation statements)

References 3 publications


“…The first attempts at music audio bandwidth extension used nonlinear devices [16] and spectral band replication [17]. Other approaches relied on data-driven techniques, such as Gaussian mixture models [18], [19], Hidden Markov Models [20], and shallow [21], [22] and deep neural networks [23]-[25]. Nevertheless, these methods often yielded suboptimal quality due to their limited modeling capabilities.…”

Section: A. Audio Bandwidth Extension and Super-resolution (mentioning)

confidence: 99%

BEHM-GAN: Bandwidth Extension of Historical Music Using Generative Adversarial Networks

Moliner

Välimäki

2023

IEEE/ACM Trans. Audio Speech Lang. Process.


Audio bandwidth extension aims to expand the spectrum of bandlimited audio signals. Although this topic has been broadly studied during recent years, the particular problem of extending the bandwidth of historical music recordings remains an open challenge. This paper proposes a method for the bandwidth extension of historical music using generative adversarial networks (BEHM-GAN) as a practical solution to this problem. The proposed method works with the complex spectrogram representation of audio and, thanks to a dedicated regularization strategy, can effectively extend the bandwidth of out-of-distribution real historical recordings. The BEHM-GAN is designed to be applied as a second step after denoising the recording to suppress any additive disturbances, such as clicks and background noise. We train and evaluate the method using solo piano classical music. The proposed method outperforms the compared baselines in both objective and subjective experiments. The results of a formal blind listening test show that BEHM-GAN significantly increases the perceptual sound quality in early-20th-century gramophone recordings. For several items, there is a substantial improvement in the mean opinion score after enhancing historical recordings with the proposed bandwidth-extension algorithm. This study represents a relevant step toward data-driven music restoration in real-world scenarios.


“…Systems based on dictionary learning to map low-frequency patterns to high-frequency components have been proposed in [16,17]. Classic machine learning methods have also been explored for BWE, such as Gaussian mixture models (GMMs) [18], hidden Markov models (HMM) [19,20], or non-negative matrix factorization (NMF) [21,22].…”

Section: Signal Processing Approaches (mentioning)

confidence: 99%

Efficient bandwidth extension of musical signals using a differentiable harmonic plus noise model

Grumiaux

Lagrange

2023

J AUDIO SPEECH MUSIC PROC.


The task of bandwidth extension addresses the generation of missing high frequencies of audio signals based on knowledge of the low-frequency part of the sound. This task applies to various problems, such as audio coding or audio restoration. In this article, we focus on efficient bandwidth extension of monophonic and polyphonic musical signals using a differentiable digital signal processing (DDSP) model. Such a model is composed of a neural network part with relatively few parameters trained to infer the parameters of a differentiable digital signal processing model, which efficiently generates the output full-band audio signal. We first address bandwidth extension of monophonic signals, and then propose two methods to explicitly handle polyphonic signals. The benefits of the proposed models are first demonstrated on monophonic and polyphonic synthetic data against a baseline and a deep-learning-based ResNet model. The models are next evaluated on recorded monophonic and polyphonic data, for a wide variety of instruments and musical genres. We show that all proposed models surpass a higher complexity deep learning model for an objective metric computed in the frequency domain. A MUSHRA listening test confirms the superiority of the proposed approach in terms of perceptual quality.


“…and is fed into the GMM-based Bayesian estimator [11] under the MMSE criterion. The joint vector of the HF and LF vectors is referred to as…”

Section: HF Spectral Envelope Estimator Based on Gaussian Mixture Model (mentioning)

confidence: 99%

“…By using Mel-scale filters and cepstrum analysis, MFCC provides more certainty about the HF components, which is quantified as the ratio of mutual information between the HF and LF parameters to the discrete entropy of HF parameters. Then, the joint probability density function of the HF and LF feature vectors is approximated by a Gaussian mixture model (GMM), and the HF spectral envelope is estimated according to the minimum mean square error (MMSE) criterion [11,12]. The method based on MFCC and GMM can effectively reduce the spectral distortion of the extended speech compared to the method based on line spectral frequency parameters [10] and also achieves a good extension performance for audio signals.…”

Section: Introduction (mentioning)

confidence: 99%

Audio bandwidth extension based on temporal smoothing cepstral coefficients

Liu

Bao

2014

J AUDIO SPEECH MUSIC PROC.


In this paper, we propose a wideband (WB) to super-wideband audio bandwidth extension (BWE) method based on temporal smoothing cepstral coefficients (TSCC). A temporal relationship of audio signals is included into feature extraction in the bandwidth extension frontend to make the temporal evolution of the extended spectra smoother. In the bandwidth extension scheme, a Gammatone auditory filter bank is used to decompose the audio signal, and the energy of each frequency band is long-term smoothed using minima controlled recursive averaging (MCRA) in order to suppress transient components. The resulting 'steady-state' spectrum is processed by frequency weighting, and the temporal smoothing cepstral coefficients are obtained by means of the power-law loudness function and cepstral normalization. The extracted temporal smoothing cepstral coefficients are fed into a Gaussian mixture model (GMM)-based Bayesian estimator to estimate the high-frequency (HF) spectral envelope, while the fine structure is restored by spectral translation. Evaluation results show that the temporal smoothing cepstral coefficients exploit the temporal relationship of audio signals and provide higher mutual information between the low- and high-frequency parameters, without increasing the dimension of input vectors in the frontend of bandwidth extension systems. In addition, the proposed bandwidth extension method is applied into the G.729.1 wideband codec and outperforms the Mel frequency cepstral coefficient (MFCC)-based method in terms of log spectral distortion (LSD), cosh measure, and differential log spectral distortion. Further, the proposed method improves the smoothness of the reconstructed spectrum over time and also gains a good performance in the subjective listening tests.
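The citation statements above describe modeling the joint density of low-frequency (LF) and high-frequency (HF) feature vectors with a GMM and estimating the HF part under the MMSE criterion. The sketch below illustrates that general idea only: it is not the implementation from any of the cited papers. It assumes a joint GMM with full covariances has already been trained (for example by EM) on stacked vectors [LF; HF]; the function name `gmm_mmse_estimate` and all parameter names are hypothetical.

```python
import numpy as np

def gmm_mmse_estimate(x, weights, means, covs, d_lf):
    """MMSE estimate E[y | x] under a joint GMM over z = [x; y].

    x       : (d_lf,) LF feature vector (assumed already extracted)
    weights : (K,) mixture weights, summing to 1
    means   : (K, d_lf + d_hf) joint component means
    covs    : (K, d_lf + d_hf, d_lf + d_hf) full joint covariances
    d_lf    : dimension of the LF part of each joint vector
    """
    K = len(weights)
    log_post = np.empty(K)
    cond_means = np.empty((K, means.shape[1] - d_lf))
    for k in range(K):
        mu_x, mu_y = means[k, :d_lf], means[k, d_lf:]
        S_xx = covs[k, :d_lf, :d_lf]   # LF-LF covariance block
        S_yx = covs[k, d_lf:, :d_lf]   # HF-LF cross-covariance block
        diff = x - mu_x
        sol = np.linalg.solve(S_xx, diff)  # S_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(S_xx)
        # Log of weight times the marginal Gaussian density of x.
        log_post[k] = np.log(weights[k]) - 0.5 * (
            diff @ sol + logdet + d_lf * np.log(2.0 * np.pi)
        )
        # Per-component conditional mean of y given x.
        cond_means[k] = mu_y + S_yx @ sol
    # Normalize in the log domain to get component posteriors p(k | x).
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    # The MMSE estimate is the posterior-weighted sum of conditional means.
    return post @ cond_means
```

In a bandwidth extension setting, `x` would hold the LF features of one frame and the return value would be the estimated HF envelope parameters for that frame; which features to use and how many components to train are design choices left open here.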

