Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2005
DOI: 10.1145/1076034.1076148
Relation between PLSA and NMF and implications

Cited by 200 publications (144 citation statements); references 6 publications. Citing publications span 2008–2021.
“…We choose LDA for extracting latent topic labels among radiology report documents because LDA has been shown to be more flexible and to learn more coherent topics over large sets of documents [43]. Furthermore, pLSA can be regarded as a special case of LDA [13], and NMF as a semi-equivalent model of pLSA [12,10]. LDA offers a hierarchy of extracted topics, and the number of topics can be chosen by evaluating each model's perplexity score (Equation 1), a common measure of how well a probabilistic model generalizes, obtained from its log-likelihood on a held-out test set.…”
Section: Document Topic Learning with Latent Dirichlet Allocation (mentioning)
confidence: 99%
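The excerpt above selects the number of LDA topics by held-out perplexity. A minimal sketch of that selection loop, assuming scikit-learn's LatentDirichletAllocation (Equation 1 is not reproduced on this page; the toy corpus and candidate topic counts below are placeholders, not the cited study's data):

```python
# Sketch: pick the LDA topic count with the lowest held-out perplexity.
# Placeholder documents stand in for the radiology reports in the excerpt.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs_train = ["chest x-ray shows clear lungs", "mri of the brain is normal"]
docs_test = ["ct of the chest shows no acute findings"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(docs_train)
X_test = vectorizer.transform(docs_test)

best_k, best_ppl = None, float("inf")
for k in (2, 5, 10):  # candidate topic counts (illustrative)
    lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X_train)
    ppl = lda.perplexity(X_test)  # lower = better generalization on held-out data
    if ppl < best_ppl:
        best_k, best_ppl = k, ppl
print(f"chosen number of topics: {best_k} (perplexity {best_ppl:.1f})")
```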
“…Given this connection, one can also establish a relation between LDA and certain settings of low-rank matrix factorization. Specifically, Gaussier and Goutte [8] and Ding et al. [4] have noted that pLSA corresponds to a specific instance of the problem of non-negative matrix factorization. pLSA can thus be reduced to a low-rank matrix factorization problem.…”
Section: Is LDA ≈ Random Matrix Factorization (mentioning)
confidence: 99%
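To make the pLSA/NMF correspondence concrete: factorizing a count matrix X ≈ WH under the generalized KL objective and then renormalizing the factors yields pLSA-style probabilities. The sketch below assumes scikit-learn's NMF with beta_loss="kullback-leibler" and a random toy matrix; it illustrates the relation noted in [8] and [4], not either paper's code:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(20, 8)).astype(float)  # toy word-by-document counts

# KL-divergence NMF (multiplicative updates), the objective under which
# NMF and pLSA coincide after normalization.
nmf = NMF(n_components=3, beta_loss="kullback-leibler", solver="mu",
          init="random", random_state=0, max_iter=500)
W = nmf.fit_transform(X)  # words x topics
H = nmf.components_       # topics x documents

# Renormalize: columns of W become P(w|z), rows of H become P(d|z),
# and the leftover per-topic mass gives the mixing weights P(z).
col, row = W.sum(axis=0), H.sum(axis=1)
P_w_given_z = W / col
P_d_given_z = (H.T / row).T
P_z = (col * row) / (col * row).sum()
```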
“…([2] refers to its algorithm as probabilistic latent semantic analysis (PLSA). Under proper normalization and for the KL objective function used in this paper, NMF and PLSA are numerically equivalent [3], so the results in [2] are equally relevant to NMF and PLSA.) NMF works well for separating sounds when the building blocks for different sources are sufficiently distinct.…”
Section: Introduction (mentioning)
confidence: 99%
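As an illustration of the separation setting in this excerpt (not the cited paper's method): KL-objective NMF is commonly applied to a magnitude spectrogram, with per-source estimates recovered by soft masking. The spectrogram below is random stand-in data; in practice it would be the magnitude of an STFT of the mixture:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
S = rng.random((257, 100))  # stand-in magnitude spectrogram: freq bins x frames

nmf = NMF(n_components=2, beta_loss="kullback-leibler", solver="mu",
          init="random", random_state=0, max_iter=500)
W = nmf.fit_transform(S)  # spectral "building blocks", one column per component
H = nmf.components_       # per-frame activations of each component

# Soft-mask reconstruction: each component keeps its share of the mixture.
V = W @ H + 1e-12
components = [np.outer(W[:, k], H[k]) / V * S for k in range(W.shape[1])]
```

Separation quality here hinges on the learned building blocks being distinct across sources, which is exactly the condition the excerpt states.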