2019
DOI: 10.1038/s41598-019-46059-1

Uncovering biomarker genes with enriched classification potential from Hallmark gene sets

Abstract: Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call “candidate genes”, by evaluating the ability of gene combinations to classify samples from a dataset, which we call “classification potential”. Our algorithm, Gene Oracle,…
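
As a rough illustration of “classification potential”, the sketch below scores a gene subset by the held-out accuracy of a classifier that sees only those genes' expression values. This is an assumption-laden stand-in, not Gene Oracle's implementation: Gene Oracle trains a neural network, whereas here a simple nearest-centroid classifier is swapped in so the example needs only the Python standard library. The function name `classification_potential`, the 70/30 split, and the toy data are all hypothetical.

```python
import random
from statistics import mean


def classification_potential(X, y, genes, train_frac=0.7, seed=0):
    """Hold-out accuracy of a nearest-centroid classifier that sees only the
    expression columns listed in `genes`.

    Hypothetical stand-in: Gene Oracle trains a neural network instead.
    X: list of samples, each a list of expression values (one per gene).
    y: list of sample-type labels, aligned with X.
    """
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    cut = int(len(order) * train_frac)
    train, test = order[:cut], order[cut:]

    # Per-class centroid: mean expression of each selected gene.
    centroids = {}
    for label in sorted(set(y[i] for i in train)):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [mean(r[g] for r in rows) for g in genes]

    def predict(sample):
        # Assign the class whose centroid is closest (squared Euclidean distance).
        sub = [sample[g] for g in genes]
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, centroids[c])),
        )

    correct = sum(predict(X[i]) == y[i] for i in test)
    return correct / len(test)


# Toy data (hypothetical): gene 0 separates the two sample types, gene 1 is noise.
rng = random.Random(1)
y = ["tumor" if k % 2 else "normal" for k in range(40)]
X = [[(5.0 if k % 2 else 0.0) + rng.gauss(0, 0.5), rng.gauss(0, 1)] for k in range(40)]

print(classification_potential(X, y, genes=[0]))  # informative gene
print(classification_potential(X, y, genes=[1]))  # noise gene
```

Restricting the classifier to the columns in `genes` is what turns a generic accuracy measure into a score for one gene combination; the same function can be called on any subset of a gene set.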

Cited by 19 publications (14 citation statements)
References: 26 publications (26 reference statements)

“…Our work and that of others also support the possibility of extending prediction across species; for example, a recent study indicates that the mouse transcriptome reveals potential signatures of protection and pathogenesis in human tuberculosis (38). The approach also found significant information content in more general sets of signatures, such as those listed in the hallmark gene sets of the Molecular Signatures Database, a result consistent with recent observations (39) and the use of co-expression patterns to define transcriptome modules (40). We interpret this as indication that even in the absence of pre-existing information in the literature, well-defined sets of hallmark genes will enable the extraction and creation of transfer signatures.…”
Section: Discussion (supporting)
confidence: 84%
“…Despite the attractiveness of single-gene biomarkers, there is growing evidence (mainly from cancer research) to suggest that their performance is limited (Targonski et al., 2019). Of course, genes do not work in isolation, and there is overwhelming evidence for reproducible transcriptional covariation in blood and other tissues, suggesting a modular organization to genomic function.…”
Section: Results (mentioning)
confidence: 99%
“…We used a two phase, bottom-up classification approach of a feedforward neural network, known as Gene Oracle [15] (https://github.com/SystemsGenetics/gene-oracle), to classify brain regions, and thus, identify the region-specific gene biomarkers. Gene Oracle uses a multilayer perceptron (MLP) feedforward neural network [35] to identify biomarker gene sets with a significant classification accuracy when comparing to sets with equal number of random genes.…”
Section: Methods (mentioning)
confidence: 99%
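
The comparison against “sets with equal number of random genes” described in the Methods statement can be sketched as an empirical p-value: score the candidate set, score many random sets of the same size, and count how often a random set does at least as well. This is a minimal sketch under stated assumptions: `empirical_p_value`, `toy_score`, the number of random sets, and the add-one correction are hypothetical choices, not taken from Gene Oracle. In Gene Oracle the score would be the classification accuracy of the trained network; here a toy scoring function stands in.

```python
import random


def empirical_p_value(score_fn, candidate, all_genes, n_random=100, seed=0):
    """Compare a candidate gene set with random gene sets of the same size.

    score_fn: maps a list of gene IDs to a score (e.g. classification accuracy).
    Returns (observed_score, p_value), where p_value is the add-one estimate
    of how often a random same-size set scores at least as well.
    """
    rng = random.Random(seed)
    observed = score_fn(list(candidate))
    k = len(candidate)
    pool = list(all_genes)
    # Null distribution: scores of random gene sets with the same number of genes.
    null_scores = [score_fn(rng.sample(pool, k)) for _ in range(n_random)]
    # The +1 terms count the observed set itself, so p is never exactly 0.
    p_value = (1 + sum(s >= observed for s in null_scores)) / (n_random + 1)
    return observed, p_value


# Toy usage (hypothetical): pretend only G1-G3 carry class information.
INFORMATIVE = {"G1", "G2", "G3"}
GENES = [f"G{i}" for i in range(1, 51)]


def toy_score(gene_set):
    # Fraction of the set that is "informative"; stands in for accuracy.
    return len(INFORMATIVE & set(gene_set)) / len(gene_set)


observed, p = empirical_p_value(toy_score, ["G1", "G2", "G3"], GENES, n_random=200)
print(observed, p)
```

Passing the scorer in as a function keeps the significance test independent of the classifier, so a held-out accuracy function could be plugged in without changing the sampling logic.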
“…The effectiveness of the biomarkers to discriminate conditions can be formally tested using machine learning and other classification techniques. For example, Gene Oracle is a software package that implements a deep learning model to classify biological samples using gene expression features as input [15]. In the Gene Oracle algorithm, expression profiles of candidate gene sets are tested for significant non-random classification potential of sample types (i.e.…”
mentioning
confidence: 99%