2019
DOI: 10.1038/s41598-019-56911-z

Using transfer learning from prior reference knowledge to improve the clustering of single-cell RNA-Seq data

Abstract: In many research areas, scientists are interested in clustering objects within small datasets while making use of prior knowledge from large reference datasets. We propose a method to apply the machine learning concept of transfer learning to unsupervised clustering problems and show its effectiveness in the field of single-cell RNA sequencing (scRNA-Seq). The goal of scRNA-Seq experiments is often the definition and cataloguing of cell types from the transcriptional output of individual cells. To improve the c…

Cited by 28 publications (22 citation statements)
References: 76 publications
“…One interesting idea to use complex models on small datasets is to leverage larger, already annotated, datasets to learn the embedding, using techniques from the field of transfer learning or domain adaptation. Embeddings learned by PCA and non-negative matrix factorisation (NMF) on datasets such as the Human Cell Atlas (HCA) have successfully been used in both scATAC-seq [21] and scRNA-seq [22, 23] on new unseen datasets and cell types, as well as used for denoising the new dataset [24]. Similarly, the embeddings learned by (denoising) AEs on one dataset have been shown to be useful on other datasets, both for clustering [25, 26, 27, 28] and surface protein prediction [29].…”
Section: Introduction (mentioning)
confidence: 99%
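
The idea in the statement above, learning an embedding on a large reference and reusing it on a small new dataset, can be sketched in a few lines. This is only an illustration under stated assumptions, not the method of the cited paper or of any of the citing works: it uses plain PCA via NumPy's SVD, the expression matrices are synthetic (two made-up cell types), and the helper names `fit_pca` and `project` are hypothetical.

```python
import numpy as np

def fit_pca(reference, n_components):
    """Learn PCA axes from a (cells x genes) reference matrix."""
    gene_means = reference.mean(axis=0)
    centered = reference - gene_means
    # Rows of vt are orthonormal principal axes in gene space,
    # ordered by the variance they explain in the reference.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return gene_means, vt[:n_components]

def project(target, gene_means, components):
    """Embed a new (cells x genes) matrix using the reference-derived axes."""
    return (target - gene_means) @ components.T

# Synthetic data (an assumption for illustration): two cell types,
# a large reference of 500 cells and a small target of 20 cells.
rng = np.random.default_rng(0)
n_genes = 50
centers = rng.normal(0.0, 3.0, size=(2, n_genes))
ref_labels = rng.integers(0, 2, size=500)
reference = centers[ref_labels] + rng.normal(size=(500, n_genes))
target_labels = np.array([0] * 10 + [1] * 10)
target = centers[target_labels] + rng.normal(size=(20, n_genes))

# The embedding is learned on the reference only, then reused on the target.
gene_means, components = fit_pca(reference, n_components=2)
embedding = project(target, gene_means, components)
print(embedding.shape)  # (20, 2)
```

The low-dimensional `embedding` could then be passed to any clustering routine. The target cells never influence the learned axes, which is the sense in which knowledge is transferred from the reference.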
“…Transfer learning (reusing the information learned from a model developed for one task as the starting point for a model on a second different, but related, task) has been shown to reduce the amount of data required for training while improving overall model performance for diverse applications (reviewed in [13]). In biology, transfer learning has been successful in several areas, including: reconstructing gene regulatory networks [14-16]; modeling gene expression from single-cell data [17-20]; or predicting genomic features, including accessible regions [21], chromatin interactions [22], and TFBSs [23, 24].…”
Section: Introduction (mentioning)
confidence: 99%
“…In this application, we use gene signatures from CoGAPS for projection and transfer learning. Other transfer learning methods have been developed to relate features in a target scRNA-seq dataset to a reference atlas, often relying on non-linear methods for feature identification [77,78]. In contrast to these other approaches, our projectR software is robust for transfer learning from single-cell data (e.g., PCA, clustering, and other forms of linear matrix factorization) and may capture additional features of cell state transitions based upon all of these methodologies [7,30].…”
Section: Discussion (mentioning)
confidence: 99%