2007
DOI: 10.1186/1471-2105-8-415
Prediction potential of candidate biomarker sets identified and validated on gene expression data from multiple datasets

Abstract: Background: Independently derived expression profiles of the same biological condition often have few genes in common. In this study, we created populations of expression profiles from publicly available microarray datasets of cancer (breast, lymphoma and renal) samples linked to clinical information with an iterative machine learning algorithm. ROC curves were used to assess the prediction error of each profile for classification. We compared the prediction error of profiles correlated with molecular phenotype …
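As a rough illustration of the ROC-based assessment the abstract describes, the Python sketch below scores a hypothetical candidate gene set by cross-validated ROC AUC. The simulated expression matrix, labels, gene signature, and the logistic-regression classifier are assumptions for illustration, not the study's actual iterative algorithm.

```python
# Hypothetical sketch: assess a candidate biomarker set with ROC analysis.
# The expression matrix, labels, and gene signature are simulated stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_samples, n_genes = 120, 5000
X = rng.normal(size=(n_samples, n_genes))                 # expression profiles
y = rng.integers(0, 2, size=n_samples)                    # clinical class labels
signature = rng.choice(n_genes, size=30, replace=False)   # candidate biomarker set

# Cross-validated class probabilities using only the signature genes
clf = LogisticRegression(max_iter=1000)
probs = cross_val_predict(clf, X[:, signature], y, cv=5, method="predict_proba")[:, 1]
print("ROC AUC of the candidate set:", round(roc_auc_score(y, probs), 3))
```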

Cited by 24 publications (24 citation statements). References 73 publications (73 reference statements).
“…For example, a recent study on microarray data related to breast cancer, renal tumors and lymphoma and including clinical information compared the prediction errors using different training sets. The results suggested that expression profiles established in this way showed little overlap [21]. We achieved significant prediction success using different gene sets established on different microarray platforms, and therefore we provide additional support to this finding.…”
supporting
confidence: 72%
“…The use of a limited number of sensitive and specific biomarkers for diagnosing a given disease (such as glucose in diabetes) has been established for many years. Advances in biomarker discovery methods have provided new opportunities to introduce efficient biomarkers, or sets of biomarkers (panels), associated with the studied diseases [21]. Proteomics, genomics, metabolomics and other high-throughput methods are widely used to identify new biomarkers.…”
Section: Discussion
mentioning
confidence: 99%
“…Class prediction problems for microarray data are similar to class prediction problems in other areas, so most of the classic class prediction methods have been applied to microarray data analysis: LDA projects the samples into a one-dimensional space that maximizes the distance between classes while minimizing the distance within each class [46]; k-NN predicts the test sample from the class labels of the k samples nearest to it [37,45,46,58,60]; a decision tree is a classifier in the form of a tree structure in which each node either indicates the label of the test sample or specifies a test that selects which sub-tree to follow [67]; SVM seeks an optimal hyper-plane that separates the two classes and maximizes the distance between the hyper-plane and the data points closest to it [24,47,50-52,56,57]; and artificial neural networks take genes as input nodes and the class label as the output node, learn the parameters connecting nodes of different layers, and predict the unknown samples [68-70]. Modifications of the classic methods have also been applied to this problem; for example, Linder et al. proposed the subsequent artificial neural network (ANN) method, which uses two levels of ANNs.…”
Section: Class Prediction
mentioning
confidence: 99%
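The classifiers named in the statement above (LDA, k-NN, decision trees, SVM, and artificial neural networks) all have standard implementations; a minimal, hedged comparison on simulated expression data might look like the sketch below. The data, hyper-parameters, and scikit-learn models are illustrative assumptions, not the methods of any cited study.

```python
# Compare classic class-prediction methods on a simulated samples-by-genes matrix.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 200))          # 100 samples x 200 genes (simulated)
y = rng.integers(0, 2, size=100)         # binary class labels

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "SVM (linear)": SVC(kernel="linear"),
    "ANN (one hidden layer)": MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:24s} mean CV accuracy = {acc:.2f}")
```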
“…where μ_ki is the mean value of gene i in class k and σ_ki is the standard deviation of gene i in class k [1,45,46]; the ratio of the between-class sum of squares to the within-class sum of squares [46], the correlation between gene expression G_i and class label Y [42,47], and entropy-based methods [48] are also applied in the feature selection process. Because most of the filter methods consider only one gene at a time, they fail to identify the combined effect of several genes.…”
Section: Dimension Reduction
mentioning
confidence: 99%
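The statement above describes filter-style feature selection built from per-class means and standard deviations. The exact score is not shown in the excerpt, so the signal-to-noise-style ratio below is an assumption chosen only to illustrate the idea of scoring each gene independently.

```python
# Per-gene filter score from class means and standard deviations (illustrative).
import numpy as np

def filter_scores(X, y):
    """Score gene i by |mu_1i - mu_2i| / (sigma_1i + sigma_2i) for two classes."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    sd0, sd1 = X0.std(axis=0), X1.std(axis=0)
    return np.abs(mu0 - mu1) / (sd0 + sd1 + 1e-12)   # epsilon avoids division by zero

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 1000))            # 80 samples x 1000 genes (simulated)
y = rng.integers(0, 2, size=80)
scores = filter_scores(X, y)
top_genes = np.argsort(scores)[::-1][:20]  # keep the 20 highest-scoring genes
print("Top-ranked genes:", top_genes[:5])
```

Because each gene is scored in isolation, combinations of genes that are informative only jointly go undetected, which is exactly the limitation the quoted statement points out.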