2017
DOI: 10.1371/journal.pone.0182507

A comprehensive simulation study on classification of RNA-Seq data

Abstract: RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging, powerful method for diagnosis, disease classification and monitoring at the molecular level, as well as for providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g. mi…

Cited by: 35 publications (21 citation statements)
References: 49 publications (62 reference statements)
“…We tested this hypothesis by applying machine learning algorithms on two groups of heifers that were bred in 2015 (year one) and 2016 (year two). Parallel random forest emerged as the algorithm with over 90% efficiency of classification [in] nearly all trials executed, which confirms the potential of accurate classification of samples using RNA-seq data under the case-control framework [69,70]. The results show that while not one single gene emerges as a potential biomarker, the accumulated information of transcript abundance from multiple genes can be powerful for the identification of fertility potential in cattle.…”
Section: Discussion; citation type: mentioning; confidence: 53%
“…Pigs from all groups were pooled (n = 15) and Spearman's correlation coefficients were calculated to determine associations between the expression of differentially expressed genes with concentrations of cardiometabolic risk indicators (LDL cholesterol, HDL cholesterol, and hsCRP) and atherosclerotic lesion severity. A random forest algorithm, produced by the R package MLSeq, was used to identify genes differentially expressed by the WD, with the top variable importance score based on ability to classify the pooled group of pigs by the presence of atherosclerosis in the proximal LAD coronary artery (26). Raw gene count data underwent a deseq normalization and voom transformation, and a random forest model was run with a k-fold cross-validation (CV = 10) and repeated 1000 times.…”
Section: Methods; citation type: mentioning; confidence: 99%
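The preprocessing described in the Methods excerpt above (normalize raw counts, transform them, then evaluate with repeated k-fold cross-validation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MLSeq's implementation: it assumes counts are stored as a genes × samples list of lists, uses DESeq-style median-of-ratios size factors, then applies voom's log2 counts-per-million formula, log2((r + 0.5) / (R + 1) × 10^6), to the size-factor-normalized counts, and generates plain repeated k-fold splits. voom's precision weights, MLSeq's exact order of steps and the random forest itself are not reproduced, and all function names are hypothetical.

```python
import math
import random
from statistics import median


def deseq_size_factors(counts):
    """Median-of-ratios size factors, following the DESeq approach.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero in any sample are skipped because their geometric
    mean is zero; at least one zero-free gene is required.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if min(gene) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor per sample = exp(median log ratio to the gene-wise geometric mean).
    return [math.exp(median(r)) for r in log_ratios]


def log_cpm(sample_counts):
    """voom-style log2 counts per million for one sample (no precision weights)."""
    lib_size = sum(sample_counts)
    return [math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in sample_counts]


def normalize_and_transform(counts):
    """Return a samples x genes matrix of log-CPM values from normalized counts."""
    size_factors = deseq_size_factors(counts)
    n_samples = len(size_factors)
    # Divide each sample's counts by its size factor, transposing to per-sample rows.
    per_sample = [[gene[j] / size_factors[j] for gene in counts] for j in range(n_samples)]
    return [log_cpm(row) for row in per_sample]


def repeated_kfold(n_samples, k=10, repeats=1000, seed=0):
    """Yield (train, test) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        # Deal shuffled indices round-robin into k folds of near-equal size.
        folds = [idx[i::k] for i in range(k)]
        for i in range(k):
            test = sorted(folds[i])
            train = sorted(x for f in folds[:i] + folds[i + 1:] for x in f)
            yield train, test


if __name__ == "__main__":
    counts = [[10, 20, 30], [5, 10, 15], [100, 200, 300], [0, 4, 8]]  # genes x samples
    print(deseq_size_factors(counts))  # factors for samples 2 and 3 are 2x and 3x sample 1
    features = normalize_and_transform(counts)  # 3 samples x 4 genes
    splits = list(repeated_kfold(15, k=10, repeats=2))
    print(len(splits))  # 20
```

Returning a samples × genes matrix keeps each row as one sample's feature vector, the orientation most classifiers expect; with n = 15 and k = 10, five folds hold two samples and five hold one.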
“…The confusion matrixes were generated using MLSeq (Zararsiz et al. 2017). Details can be found in the Supplemental Methods, and the detailed script can be found in Supplemental Code.…”
Section: Confusion Matrix Generation; citation type: mentioning; confidence: 99%
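A confusion matrix cross-tabulates true classes against predicted classes, and accuracy, sensitivity and specificity follow directly from its cells. The sketch below is a hypothetical, library-free illustration of that arithmetic for a case-control setting; it is not the MLSeq routine or the Supplemental Code the excerpt refers to, and the labels and predictions are made up for the example.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes, both ordered by `labels`."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def sensitivity_specificity(matrix):
    """For a 2x2 matrix ordered [negative, positive]: TP/(TP+FN) and TN/(TN+FP)."""
    (tn, fp), (fn, tp) = matrix
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    labels = ["control", "case"]  # assumed order: negative class first
    y_true = ["control", "control", "case", "case", "case"]
    y_pred = ["control", "case", "case", "case", "control"]
    m = confusion_matrix(y_true, y_pred, labels)
    print(m)  # [[1, 1], [1, 2]]
    print(accuracy(m))  # 0.6
```

The order of `labels` decides which class `sensitivity_specificity` treats as positive; here `case` is assumed to be the positive class.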