2020
DOI: 10.1101/2020.11.22.393165
Preprint

Identification of new marker genes from plant single-cell RNA-seq data using interpretable machine learning methods

Abstract: An essential step of single-cell RNA sequencing analysis is to classify specific cell types with marker genes in order to dissect the biological functions of each individual cell. In this study, we integrated five published scRNA-seq datasets from the Arabidopsis root containing over 25,000 cells and 17 cell clusters. We have compared the performance of seven machine learning methods in classifying these cell types, and determined that the random forest and support vector machine methods performed best. Using …

Cited by 5 publications (2 citation statements)
References 59 publications

“…Nevertheless, other aspects of gene regulation such as transcription factors and their binding motifs are conserved across plants. The potential for classifying cell types in less-well characterized plants on the basis of existing single-cell data from Arabidopsis is well illustrated by a recent study that tested several of the 111 identified marker genes for trichoblast cells in five other plant species ( Yan et al, 2020 ). As the efficacy of machine learning relies critically on data volume, the results will only improve as more and more datasets become available.…”
Section: Cell-type Annotation Without the Deep Prior Knowledge Gather...
Citation type: mentioning
confidence: 99%
“…Therefore, RFE is not an ideal choice for feature selection in scRNA-seq data. Random forest and its application on scRNA-seq data have been examined thoroughly in other studies [ 8 , 9 , 10 , 11 ]. Moreover, some penalized regression methods are worth exploring to compare their performances for feature selection in scRNA-seq data because these methods were developed primarily for tackling the challenges of “large-p-small-n” problems [ 12 ].…”
Section: Introduction
Citation type: mentioning
confidence: 99%