2019
DOI: 10.1186/s12864-019-6412-8

Putative biomarkers for predicting tumor sample purity based on gene expression data

Abstract: Background: Tumor purity is the percent of cancer cells present in a sample of tumor tissue. The non-cancerous cells (immune cells, fibroblasts, etc.) have an important role in tumor biology. The ability to determine tumor purity is important to understand the roles of cancerous and non-cancerous cells in a tumor. Methods: We applied a supervised machine learning method, XGBoost, to data from 33 TCGA tumor types to predict tumor purity using RNA-seq gene expression data. Results: Across the 33 tumor types, the median …

Citations: cited by 28 publications (23 citation statements)
References: 40 publications (58 reference statements)
“…6, November-December 2020: 306-312 researchers are increasingly using XGBoost in biomarker discovery. 12,33,34 To improve the prediction performance, we tuned hyperparameters using random search, which is more efficient than either a traditional manual or grid search and evaluates more of the search space, especially when the search space has more than three dimensions. 35 As we did not have an external data set to evaluate our model, we performed a sevenfold cross-validation with accuracy as the overall metric and sensitivity and specificity as the class-specific metrics.…”
Section: Discussion (mentioning)
confidence: 99%
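The tuning procedure described in the statement above (random search over hyperparameters, scored by sevenfold cross-validation with accuracy, sensitivity and specificity) can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the citing authors' code: the toy threshold classifier, the `margin` hyperparameter, the search range and the synthetic data are all assumptions made for the example, standing in for an XGBoost model and real expression data.

```python
import random


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys, margin):
    """Toy classifier (assumption): midpoint of the class means, shifted by `margin`."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2 + margin


def metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return accuracy, sensitivity, specificity


def cross_validate(xs, ys, margin, k=7, seed=0):
    """Pool out-of-fold predictions over k folds, then score them once."""
    folds = k_fold_indices(len(xs), k, random.Random(seed))
    y_true, y_pred = [], []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        thr = fit_threshold([xs[i] for i in train], [ys[i] for i in train], margin)
        for i in test_idx:
            y_true.append(ys[i])
            y_pred.append(1 if xs[i] > thr else 0)
    return metrics(y_true, y_pred)


def random_search(xs, ys, n_iter=20, k=7, seed=0):
    """Sample `margin` at random and keep the setting with the best CV accuracy."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        margin = rng.uniform(-2.0, 2.0)  # hypothetical search range
        acc, sens, spec = cross_validate(xs, ys, margin, k=k, seed=seed)
        if best is None or acc > best["accuracy"]:
            best = {"margin": margin, "accuracy": acc,
                    "sensitivity": sens, "specificity": spec}
    return best


def make_toy_data(n_per_class=35, seed=1):
    """Synthetic one-feature data: class 0 centered at 0, class 1 centered at 3."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n_per_class)]
    xs += [rng.gauss(3, 1) for _ in range(n_per_class)]
    ys = [0] * n_per_class + [1] * n_per_class
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_toy_data()
    print(random_search(xs, ys))
```

Every candidate setting is scored on the same seven folds, so differences in accuracy reflect the hyperparameter rather than the split. Predictions are pooled across folds before scoring so that a fold with few positives or negatives does not distort sensitivity or specificity.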
“…Moreover, XGBoost is an optimized distributed gradient boosting that achieves state-of-the-art prediction performances. 12 Feature importance ranking using the common tree ensemble models such as XGBoost and gbm R packages may provide inconsistent results. These methods only consider the effect of splits along the decision path.…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, tumor purity is inferred from different types of genomic data, such as somatic copy number data [18][19][20][21][22], somatic mutations data [23][24][25][26][27], gene expression data [28,29], and DNA methylation data [30][31][32][33]. The tumor purity obtained from these methods will be referred to as genomic tumor purity in this study.…”
Section: Introduction (mentioning)
confidence: 99%
“…This study chose coiled-coil domain-containing protein 69 (CCDC69) as the target gene through bioinformatics analysis. Researchers used CCDC69 expression to predict tumor sample purity, 9 which is vital to immune infiltration. 10 Cui et al found that CCDC69 may reduce cisplatin resistance in ovarian cancer by activating P14ARF/MDM2/P53 signaling pathway.…”
Section: Introduction (mentioning)
confidence: 99%