A Novel NMF-HMM Speech Enhancement Algorithm Based on Poisson Mixture Model

Xiang, Yang; Shi, Liming; Højvang, Jesper Lisby; Rasmussen, Morten Højfeldt; Christensen, Mads Græsbøll

doi:10.1109/icassp39728.2021.9414620

Cited by 8 publications

(5 citation statements)

References 29 publications


“…Unlike SC, mixture models (such as PMMs) assume a single latent per data point for data generation, i.e., data is modeled in terms of a single compositional feature rather than a combination of multiple features. PMMs assume the observable data to be subject to Poisson distributed noise, and they have been applied in the context of image and audio signal processing applications before (e.g., [81][82][83]). The here used PMM-based denoising approach is described in more detail in S1 Text.…”

Section: Algorithms (mentioning)

confidence: 99%
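
As a rough illustration of the Poisson mixture model (PMM) described in the statement above, the following is a minimal expectation-maximization sketch in NumPy. It assumes each data point is generated by a single latent component with independent Poisson-distributed dimensions. The function name `fit_pmm`, the initialization scheme, and the `eps` smoothing constant are assumptions made for this example and are not taken from the cited works.

```python
import numpy as np

def fit_pmm(X, n_components, n_iter=100, seed=0, eps=1e-10):
    """EM for a Poisson mixture (illustrative sketch, not the cited method).

    X: (N, D) array of non-negative counts.
    Returns mixing weights pi (K,), Poisson rates lam (K, D),
    and responsibilities r (N, K).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Assumed initialization: rates from randomly chosen data points (+1 avoids zeros).
    lam = X[rng.choice(N, n_components, replace=False)] + 1.0
    pi = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: log pi_c + sum_d (x_d log lam_cd - lam_cd).
        # The log(x_d!) term does not depend on c and cancels on normalization.
        log_r = (np.log(pi + eps)[None, :]
                 + X @ np.log(lam + eps).T
                 - lam.sum(axis=1)[None, :])
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means and component frequencies.
        Nk = r.sum(axis=0) + eps
        lam = (r.T @ X) / Nk[:, None]
        pi = Nk / Nk.sum()
    return pi, lam, r
```

Under these assumptions, a simple denoised estimate of each data point is the posterior-weighted rate, `r @ lam`.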

Zero-shot denoising of microscopy images recorded at high-resolution limits

Salwig, Drefs, Lücke. 2024. PLoS Comput Biol.


Conventional and electron microscopy visualize structures in the micrometer to nanometer range, and such visualizations contribute decisively to our understanding of biological processes. Due to different factors in recording processes, microscopy images are subject to noise. Especially at their respective resolution limits, a high degree of noise can negatively affect both image interpretation by experts and further automated processing. However, the deteriorating effects of strong noise can be alleviated to a large extent by image enhancement algorithms. Because of the inherent high noise, a requirement for such algorithms is their applicability directly to noisy images or, in the extreme case, to just a single noisy image without a priori noise level information (referred to as blind zero-shot setting). This work investigates blind zero-shot algorithms for microscopy image denoising. The strategies for denoising applied by the investigated approaches include: filtering methods, recent feed-forward neural networks which were amended to be trainable on noisy images, and recent probabilistic generative models. As datasets we consider transmission electron microscopy images including images of SARS-CoV-2 viruses and fluorescence microscopy images. A natural goal of denoising algorithms is to simultaneously reduce noise while preserving the original image features, e.g., the sharpness of structures. However, in practice, a tradeoff between both aspects often has to be found. Our performance evaluations, therefore, focus not only on noise removal but set noise removal in relation to a metric which is instructive about sharpness. For all considered approaches, we numerically investigate their performance, report their denoising/sharpness tradeoff on different images, and discuss future developments. We observe that, depending on the data, the different algorithms can provide significant advantages or disadvantages in terms of their noise removal vs. sharpness preservation capabilities, which may be very relevant for different virological applications, e.g., virological analysis or image segmentation.


“…In this paper, we propose a novel NMF-HMM speech enhancement method based on the Kullback-Leibler (KL) divergence, expanding on our preliminary work [47]. Our preliminary work has briefly verified the effectiveness of an NMF-HMM for speech enhancement [47,48], but the effect of the parameters for the model was not considered. This is very important to optimize the algorithm performance.…”

Section: Open Access (mentioning)

confidence: 99%

A speech enhancement algorithm based on a non-negative hidden Markov model and Kullback-Leibler divergence

Xiang, Shi, Højvang, et al. 2022. J AUDIO SPEECH MUSIC PROC.

Self Cite


In this paper, we propose a supervised single-channel speech enhancement method that combines Kullback-Leibler (KL) divergence-based non-negative matrix factorization (NMF) and a hidden Markov model (NMF-HMM). With the integration of the HMM, the temporal dynamics information of speech signals can be taken into account. This method includes a training stage and an enhancement stage. In the training stage, the sum of the Poisson distribution, leading to the KL divergence measure, is used as the observation model for each state of the HMM. This ensures that a computationally efficient multiplicative update can be used for the parameter update of this model. In the online enhancement stage, a novel minimum mean square error estimator is proposed for the NMF-HMM. This estimator can be implemented using parallel computing, reducing the time complexity. Moreover, compared to the traditional NMF-based speech enhancement methods, the experimental results show that our proposed algorithm improved the short-time objective intelligibility and perceptual evaluation of speech quality by 5% and 0.18, respectively.
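
The KL-divergence NMF with multiplicative updates mentioned in the abstract above can be illustrated with the standard Lee-Seung update rules. The sketch below covers only the plain NMF part and does not include the HMM or the proposed MMSE estimator. The function names `kl_divergence` and `kl_nmf`, the random initialization, and the `eps` smoothing constant are assumptions for this example, not the authors' implementation.

```python
import numpy as np

def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))

def kl_nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (F x T) as W (F x rank) @ H (rank x T)
    using multiplicative updates for the KL divergence (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    history = []
    for _ in range(n_iter):
        # Update activations: H <- H * (W^T (V / WH)) / (W^T 1)
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        # Update basis: W <- W * ((V / WH) H^T) / (1 H^T)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history
```

In a speech enhancement setting, `V` would typically be a magnitude or power spectrogram. Because the updates only multiply by non-negative factors, `W` and `H` stay non-negative without an explicit projection step.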


“…To summarize, the proposed β-PVAE includes a training and an enhancement stage for the SE application, which is similar to PVAE [26]. In the training stage, C-VAE and N-VAE are separately pre-trained by self-supervision using (4) and (5). After that, we apply (8) to train NS-VAE.…”

Section: β-VAE-based Speech Enhancement (mentioning)

confidence: 99%

“…During the past decades, many single-channel SE algorithms have been developed, including signal subspace methods [4], non-negative matrix factorization methods [5], [6], and codebook-based methods [7]. In recent years, deep neural networks (DNN) have shown great potential for SE [2], [8]- [14] because DNNs can use a non-linear process to model complex high-dimensional signals, which is more reasonable in practical applications [15].…”

Section: Introduction (mentioning)

confidence: 99%

A deep representation learning speech enhancement method using β-VAE

Xiang, Højvang, Rasmussen, et al. 2022. Preprint.

Self Cite


In previous work, we proposed a variational autoencoder-based (VAE) Bayesian permutation training speech enhancement (SE) method (PVAE) which indicated that the SE performance of the traditional deep neural network-based (DNN) method could be improved by deep representation learning (DRL). Building on that work, in this paper we propose to use β-VAE to further improve PVAE's representation learning ability. More specifically, our β-VAE can improve PVAE's capacity for disentangling different latent variables from the observed signal without the trade-off between disentanglement and signal reconstruction. This trade-off is widespread in previous β-VAE algorithms. Unlike the previous β-VAE algorithms, the proposed β-VAE strategy can also be used to optimize the DNN's structure. This means that the proposed method can not only improve PVAE's SE performance but also reduce the number of PVAE training parameters. The experimental results show that the proposed method can acquire better speech and noise latent representations than PVAE. It also obtains a higher scale-invariant signal-to-distortion ratio, speech quality, and speech intelligibility.

