2021
DOI: 10.1038/s41587-021-00867-x

Iterative single-cell multi-omic integration using online learning

Abstract: Integrating large single-cell gene expression, chromatin accessibility and DNA methylation datasets requires general and scalable computational approaches. Here we describe online integrative nonnegative matrix factorization (iNMF), an algorithm for integrating large, diverse, and continually arriving single-cell datasets. Our approach scales to arbitrarily large numbers of cells using fixed memory, iteratively incorporates new datasets as they are generated, and allows many users to simultaneously analyze a s…

Citations: cited by 65 publications (73 citation statements)
References: 34 publications
“…We assessed and benchmarked the performance of MultiMAP against several popular approaches for integrating single-cell multi-omics, including Seurat [10], LIGER [11], iNMF [12], Conos [29], and GLUER [30].…”
Section: Results (mentioning)
confidence: 99%
“…Most data integration methods project multiple measurements of information into a common low-dimensional representation to assemble multiple modalities into an integrated embedding space. Recently published methods employ different algorithms to project multiple datasets into an embedding space, including canonical correlation analysis (CCA) [10], nonnegative matrix factorization (NMF) [11, 12], or neural network models [13]. These methods have demonstrated utility, yet suffer from shortcomings, including challenges with scaling and being limited to consideration of features shared across data sets (e.g., the same genes).…”
Section: Introduction (mentioning)
confidence: 99%
“…Therefore, developing an on-line framework of GraphFP that can cluster and annotate the single-cell time series scRNA-seq data in different batches in a serial fashion should be an interesting topic. The newly developed single-cell data analysis tools such as the on-line integration method online iNMF [40] and the cell type annotation method scArches based on transfer learning [41] can be adopted.…”
Section: Discussion (mentioning)
confidence: 99%
“…Many single cell data alignment methods have been recently developed. The majority of them, with a few notable exceptions such as the recent iNMF (9), are targeted towards small-size and medium-size datasets. These existing methods can be summarized into four categories: (i) reference-based methods, such as Scmap-cluster (10) and scAlign (11), which align new inquiry datasets based on a well-annotated reference dataset; (ii) clustering-based methods, such as Harmony (12), DESC (13), which remove batch effects and align samples in an embedding space by iteratively optimizing a clustering objective function; (iii) matching-based methods, such as MNN (14) and Scanorama (15), which apply a mutually nearest neighbors strategy to identify overlapped cells across datasets and (iv) projection-based methods that use a statistical model to project individual cells from different datasets into a lower dimensional space, including Seurat (16, 17) that applies canonical correlation analysis for projection, LIGER (18) that uses latent factors from non-negative matrix factorization for projection, and scVI (19, 20) and others (21–23) that use variational techniques for projection.…”
Section: Introduction (mentioning)
confidence: 99%
“…The ZINB-based methods such as scVI may be less efficient in capturing complex expression features for multiple datasets. Although some existing recent methods (9) can be scaled up to large-size datasets, they still have potential to inaccurately align cells due to complicated parametric models. Therefore, it is in urgent need to develop effective alignment methods that are also computationally efficient.…”
Section: Introduction (mentioning)
confidence: 99%