2021
DOI: 10.3390/math9222840

Assessing Methods for Evaluating the Number of Components in Non-Negative Matrix Factorization

Abstract: Non-negative matrix factorization (NMF) is a relatively new method of matrix decomposition which factors an m × n data matrix X into an m × k matrix W and a k × n matrix H, so that X ≈ W × H. Importantly, all values in X, W, and H are constrained to be non-negative. NMF can be used for dimensionality reduction, since the k columns of W can be considered components into which X has been decomposed. The question arises: how does one choose k? In this paper, we first assess methods for estimating k in the context of NM…
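
As an illustration of the factorization the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes NumPy and uses the classical multiplicative update rules for the Frobenius-norm objective; the function names `nmf_multiplicative` and `reconstruction_error` and the parameters `n_iter`, `eps`, and `seed` are hypothetical choices made for this example.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative m x n matrix X into W (m x k) and H (k x n).

    Illustrative sketch using multiplicative updates; all names and
    defaults here are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Strictly positive random start keeps every later update non-negative.
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # Each update rescales entries by a non-negative ratio;
        # eps guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def reconstruction_error(X, W, H):
    """Frobenius norm of the residual X - W H."""
    return float(np.linalg.norm(X - W @ H, "fro"))
```

Running the sketch for several values of k and comparing the reconstruction error is one simple starting point for the question the abstract raises about choosing k.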

Cited by 8 publications (5 citation statements)
References 54 publications (105 reference statements)
“…When coupled with CCA, both regression models gave comparable results. Additionally, while the selection of the optimal k in either CCA or NMF plays a central role in algorithm performance 30,38, given that the computational goal is to model drug response, we implemented feature selection on post-integration embeddings to include subspace features that correlate with drug response (see Methods). Through this, scIDUC was able to quickly select only a few meaningful features for model training and prediction without searching for optimal inner dimensions in an unsupervised fashion.…”
Section: Results (mentioning)
confidence: 99%
“…The first relies on a modified version of BIC for NMF [35], given below: where denotes the NMF reconstructed matrix of a given modality. Alternatively, intNMF uses kneedle [36] to identify where the reduction in loss flattens with the addition of further topics.…”
Section: Methods (mentioning)
confidence: 99%
“…The number of topics is then selected based on either Bayesian information criterion (BIC) or the knee point of the loss as function of topic number. The first relies on a modified version of BIC for NMF [35], given below:…”
Section: Rank Selection (mentioning)
confidence: 99%
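
The knee-point idea mentioned in the statement above can be sketched as follows. This is a simple stand-in heuristic (largest gap below the chord joining the first and last points of the normalized loss curve), not the kneedle algorithm the citing paper refers to; the function name `knee_point` and the example loss values are hypothetical.

```python
def knee_point(ks, losses):
    """Return the k at which a decreasing loss curve bends most sharply.

    Illustrative heuristic only: both axes are rescaled to [0, 1] and the
    point lying furthest below the straight line from the first to the
    last point is taken as the knee.
    """
    if len(ks) != len(losses) or len(ks) < 3:
        raise ValueError("need at least three (k, loss) pairs")
    k_first, k_last = ks[0], ks[-1]
    loss_first, loss_last = losses[0], losses[-1]
    # Avoid dividing by zero when the loss does not change at all.
    span = (loss_first - loss_last) or 1.0
    xs = [(k - k_first) / (k_last - k_first) for k in ks]
    ys = [(loss - loss_last) / span for loss in losses]
    # After rescaling, the chord runs from (0, 1) to (1, 0), i.e. y = 1 - x.
    gaps = [1.0 - x - y for x, y in zip(xs, ys)]
    best = max(range(len(ks)), key=gaps.__getitem__)
    return ks[best]


# Made-up losses that drop quickly and then flatten.
example_ks = [1, 2, 3, 4, 5, 6, 7, 8]
example_losses = [100, 60, 30, 12, 10, 9, 8.5, 8]
chosen_k = knee_point(example_ks, example_losses)
```

On these made-up values the heuristic selects k = 4, the point after which additional components reduce the loss only marginally.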
“…This models the unknown number of archetypes as a latent variable that can be directly optimized over. This method was selected due to its simplicity to implement and its superior performance compared to multiple other methods for selecting the number of factors in NMF [62]. For N-NMF, the minimum number of factors was set as k = 2 (due to the normalization constraint), and the maximum (a requirement of the approach) was chosen to be k = 30.…”
Section: Choosing the Number Of Archetypes (mentioning)
confidence: 99%
“…This approach models the unknown number of archetypes as a latent variable that can be directly optimized over. This method was selected due to its simplicity and superior performance compared to alternatives [30]. When training the archetype model on normal tissue transcriptomic data (GTEx), the minimum number of factors was set to k = 2 (due to the normalization constraint), with the maximum set at k = 30.…”
Section: Study Of Site-specific Adaptation Of Metastatic Breast Cancer (mentioning)
confidence: 99%