2021
DOI: 10.1093/nar/gkab1147

Effective and scalable single-cell data alignment with non-linear canonical correlation analysis

Abstract: Data alignment is one of the first key steps in single cell analysis for integrating multiple datasets and performing joint analysis across studies. Data alignment is challenging in extremely large datasets, however, as the majority of the current single cell data alignment methods are not computationally efficient. Here, we present VIPCCA, a computational framework based on non-linear canonical correlation analysis for effective and scalable single cell data alignment. VIPCCA leverages both deep learning for eff…

Cited by 16 publications (15 citation statements)
References 51 publications
“…The rapid accumulation of large-scale single-cell datasets requires integration algorithms to efficiently handle datasets containing millions of cells without loss of accuracy. For a comprehensive comparison, we first benchmarked Portal and existing representative methods, including Harmony [21], Seurat v3 [22], online iNMF [23], VIPCCA [24], scVI [25], fastMNN [26], Scanorama [27] and BBKNN [28], in terms of integration performance following a recent benchmarking study [30]. Using a number of scRNA-seq datasets from diverse tissue types with curated cell cluster annotations, including mouse spleen, marrow, and bladder [7], we quantitatively evaluated the integration performance of each method.…”
Section: Results (mentioning)
confidence: 99%
“…scANVI [71] is another VAE-based method with similar pros and cons of scVI, as it is an extension of scVI that incorporates cell type information into its model. Recently, VIPCCA [24] was proposed to leverage VAE-based networks to perform nonlinear canonical correlation analysis (CCA) efficiently. However, we empirically found that it favors the removal of batch effects over the preservation of biological information (Figs.…”
Section: Discussion (mentioning)
confidence: 99%
“…Second, the fast-evolving technology of single-cell omics provides opportunities to integrate omics profiles from different modalities for the same individuals. Extending DR-SC by integrating multiple different omics techniques, such as through the canonical correlation analysis framework [80] which is a natural extension of PCA towards multiple modality analysis, will also likely achieve higher statistical performance. Third, DR-SC essentially performs unsupervised clustering, but with the availability of labels for some cells/spots, it would be interesting to perform semi-supervised clustering of those data.…”
Section: Discussion (mentioning)
confidence: 99%
“…Sequence compositions of nucleic acids and proteins are significantly associated with genome evolution and adaptation across all kingdoms of life [17]. Machine/deep learning methods have worked well in predicting viral hosts based on the amino acid [18] or dinucleotide (DNT) [19] composition in the sequence alignment of large datasets [20]. Moreover, language representation methods have learned the language of viral evolution and escape based on represented amino acids [21] or statistically represented proteins [22].…”
Section: Introduction (mentioning)
confidence: 99%