Iterative single-cell multi-omic integration using online learning

Gao, Chao; Liu, Jialin; Kriebel, April R.; Preissl, Sebastian; Luo, Chongyuan; Castanon, Rosa; Sandoval, Justin P.; Rivkin, Angeline; Nery, Joseph R.; Behrens, M. Margarita; Ecker, Joseph R.; Ren, Bing; Welch, Joshua D.

doi:10.1038/s41587-021-00867-x

Cited by 65 publications

(73 citation statements)

References 34 publications

“…We assessed and benchmarked the performance of MultiMAP against several popular approaches for integrating single-cell multi-omics, including Seurat [ 10 ], LIGER [ 11 ], iNMF [ 12 ], Conos [ 29 ], and GLUER [ 30 ].…”

Section: Results (mentioning)

confidence: 99%

“…Most data integration methods project multiple measurements of information into a common low-dimensional representation to assemble multiple modalities into an integrated embedding space. Recently published methods employ different algorithms to project multiple datasets into an embedding space, including canonical correlation analysis (CCA) [ 10 ], nonnegative matrix factorization (NMF) [ 11 , 12 ], or neural network models [ 13 ]. These methods have demonstrated utility, yet suffer from shortcomings, including challenges with scaling and being limited to consideration of features shared across data sets (e.g., the same genes).…”

Section: Introduction (mentioning)

confidence: 99%

MultiMAP: dimensionality reduction and integration of multimodal data

et al. 2021

Multimodal data is rapidly growing in many fields of science and engineering, including single-cell biology. We introduce MultiMAP, a novel algorithm for dimensionality reduction and integration. MultiMAP can integrate any number of datasets, leverages features not present in all datasets, is not restricted to a linear mapping, allows the user to specify the influence of each dataset, and is extremely scalable to large datasets. We apply MultiMAP to single-cell transcriptomics, chromatin accessibility, methylation, and spatial data and show that it outperforms current approaches. On a new thymus dataset, we use MultiMAP to integrate cells along a temporal trajectory. This enables quantitative comparison of transcription factor expression and binding site accessibility over the course of T cell differentiation, revealing patterns of expression versus binding site opening kinetics.

“…Therefore, developing an on-line framework of GraphFP that can cluster and annotate the single-cell time series scRNA-seq data in different batches in a serial fashion should be an interesting topic. The newly developed single-cell data analysis tools such as the on-line integration method online iNMF [ 40 ] and the cell type annotation method scArches based on transfer learning [ 41 ] can be adopted.…”

Section: Discussion (mentioning)

confidence: 99%

Dynamic inference of cell developmental complex energy landscape from time series single-cell transcriptomic data

2022

Time series single-cell RNA sequencing (scRNA-seq) data are emerging. However, dynamic inference of an evolving cell population from time series scRNA-seq data is challenging owing to the stochasticity and nonlinearity of the underlying biological processes. This calls for the development of mathematical models and methods capable of reconstructing cellular dynamic transition processes and uncovering the nonlinear cell-cell interactions. In this study, we present GraphFP, a model and dynamic inference framework based on a nonlinear Fokker-Planck equation on a graph, with the aim of reconstructing the cell state-transition complex potential energy landscape from time series single-cell transcriptomic data. The free energy of our model explicitly takes into account the cell-cell interactions in a nonlinear quadratic term. We then recast the model inference problem in the form of a dynamic optimal transport framework and solve it efficiently with the adjoint method of optimal control. We evaluated GraphFP on the time series scRNA-seq data set of embryonic murine cerebral cortex development. We illustrated that it 1) reconstructs cell state potential energy, which is a measure of cellular differentiation potency, 2) faithfully charts the probability flows between paired cell states over the dynamic processes of cell differentiation, and 3) accurately quantifies the stochastic dynamics of cell type frequencies on the probability simplex in continuous time. We also illustrated that GraphFP is robust to the resolution of cluster labelling, as well as to parameter choices. Meanwhile, GraphFP provides a model-based approach to delineate the cell-cell interactions that drive cell differentiation. GraphFP software is available at https://github.com/QiJiang-QJ/GraphFP.

“…Many single cell data alignment methods have been recently developed. The majority of them, with a few notable exceptions such as the recent iNMF ( 9 ), are targeted towards small-size and medium-size datasets. These existing methods can be summarized into four categories: (i) reference-based methods, such as Scmap-cluster ( 10 ) and scAlign ( 11 ), which align new inquiry datasets based on a well-annotated reference dataset; (ii) clustering-based methods, such as Harmony ( 12 ), DESC ( 13 ), which remove batch effects and align samples in an embedding space by iteratively optimizing a clustering objective function; (iii) matching-based methods, such as MNN ( 14 ) and Scanorama ( 15 ), which apply a mutually nearest neighbors strategy to identify overlapped cells across datasets and (iv) projection-based methods that use a statistical model to project individual cells from different datasets into a lower dimensional space, including Seurat ( 16 , 17 ) that applies canonical correlation analysis for projection, LIGER ( 18 ) that uses latent factors from non-negative matrix factorization for projection, and scVI ( 19 , 20 ) and others ( 21–23 ) that use variational techniques for projection.…”

Section: Introduction (mentioning)

confidence: 99%

“…The ZINB-based methods such as scVI may be less efficient in capturing complex expression features for multiple datasets. Although some existing recent methods ( 9 ) can be scaled up to large-size datasets, they still have potential to inaccurately align cells due to complicated parametric models. Therefore, it is in urgent need to develop effective alignment methods that are also computationally efficient.…”

Section: Introduction (mentioning)

confidence: 99%

Effective and scalable single-cell data alignment with non-linear canonical correlation analysis

Chen; Zhou 2021, Nucleic Acids Research

Data alignment is one of the first key steps in single cell analysis for integrating multiple datasets and performing joint analysis across studies. Data alignment is challenging in extremely large datasets, however, as the majority of the current single cell data alignment methods are not computationally efficient. Here, we present VIPCCA, a computational framework based on non-linear canonical correlation analysis for effective and scalable single cell data alignment. VIPCCA leverages both deep learning for effective single cell data modeling and variational inference for scalable computation, thus enabling powerful data alignment across multiple samples, multiple data platforms, and multiple data types. VIPCCA is accurate for a range of alignment tasks including alignment between single cell RNAseq and ATACseq datasets and can easily accommodate millions of cells, thereby providing researchers unique opportunities to tackle challenges emerging from large-scale single-cell atlases.
