An overview of techniques for linking high‐dimensional molecular data to time‐to‐event endpoints by risk prediction models

Binder, Harald; Porzelius, Christine; Schumacher, Martin

doi:10.1002/bimj.201000152

Cited by 19 publications

(14 citation statements)

References 109 publications

(103 reference statements)


“…This paper illustrates that binary classifiers highly depended on how the risk groups were defined. Binder et al [30] investigated the effects of the choice of threshold on the predictions and showed that there is little overlap of selected genes between an early and median threshold cutoffs, which might be due to short-term and long-term effects of genes or the censoring pattern.…”

Section: Discussion (mentioning)

confidence: 99%

“…A slight change of the threshold can lead to very different prediction accuracy and interpretation. Binder et al [30] applied three different survival thresholds to evaluate a binary classifier based on gene expression, and showed how the choice of threshold affected the predictions. They concluded that using the binary modeling approach can result in loss of efficiency and potential bias in high dimensional settings.…”

Section: Introduction (mentioning)

confidence: 99%


Assessment of performance of survival prediction models for cancer prognosis

Chen, Kodell, Cheng, et al. 2012

BMC Med Res Methodol


Background: Cancer survival studies are commonly analyzed using survival-time prediction models for cancer prognosis. A number of different performance metrics are used to ascertain the concordance between the predicted risk score of each patient and the actual survival time, but these metrics can sometimes conflict. Alternatively, patients are sometimes divided into two classes according to a survival-time threshold, and binary classifiers are applied to predict each patient’s class. Although this approach has several drawbacks, it does provide natural performance metrics such as positive and negative predictive values to enable unambiguous assessments.

Methods: We compare the survival-time prediction and survival-time threshold approaches to analyzing cancer survival studies. We review and compare common performance metrics for the two approaches. We present new randomization tests and cross-validation methods to enable unambiguous statistical inferences for several performance metrics used with the survival-time prediction approach. We consider five survival prediction models consisting of one clinical model, two gene expression models, and two models from combinations of clinical and gene expression models.

Results: A public breast cancer dataset was used to compare several performance metrics using five prediction models. 1) For some prediction models, the hazard ratio from fitting a Cox proportional hazards model was significant, but the two-group comparison was insignificant, and vice versa. 2) The randomization test and cross-validation were generally consistent with the p-values obtained from the standard performance metrics. 3) Binary classifiers highly depended on how the risk groups were defined; a slight change of the survival threshold for assignment of classes led to very different prediction results.

Conclusions: 1) Different performance metrics for evaluation of a survival prediction model may give different conclusions in its discriminatory ability. 2) Evaluation using a high-risk versus low-risk group comparison depends on the selected risk-score threshold; a plot of p-values from all possible thresholds can show the sensitivity of the threshold selection. 3) A randomization test of the significance of Somers’ rank correlation can be used for further evaluation of performance of a prediction model. 4) The cross-validated power of survival prediction models decreases as the training and test sets become less balanced.
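The survival-time threshold approach described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `dichotomize`, the toy data, and the rule of leaving patients censored before the threshold unlabeled are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def dichotomize(times: Sequence[float],
                events: Sequence[bool],
                threshold: float) -> List[Optional[int]]:
    """Turn censored survival data into binary class labels.

    1    = event observed at or before the threshold (short survival)
    0    = still under observation after the threshold (long survival)
    None = censored at or before the threshold, so the class is unknown
    (Hypothetical labeling rule, assumed for this sketch.)
    """
    labels: List[Optional[int]] = []
    for t, e in zip(times, events):
        if t > threshold:
            labels.append(0)
        elif e:
            labels.append(1)
        else:
            labels.append(None)
    return labels


# Toy data (invented): follow-up time in years and event indicator.
times = [2.0, 5.0, 3.0, 8.0, 4.0]
events = [True, False, True, True, False]

# Moving the threshold changes which patients can be labeled at all.
for thr in (3.0, 4.5):
    print(thr, dichotomize(times, events, thr))
```

In this toy example the patient censored at 4.0 years is labeled as a long-term survivor at a threshold of 3.0 but becomes unclassifiable at 4.5, which is one way a small threshold change can alter the training set of a binary classifier.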


“…For a more general overview of such approaches, see e.g. Binder et al [7] and for a comparison of the most common methods see Bøvelstad et al [8] or van Wieringen et al [9]. We will specifically consider the lasso [10] and componentwise likelihood-based boosting [11], [12] as representative approaches for regularized regression with variable selection.…”

Section: Introduction (mentioning)

confidence: 99%

Transforming RNA-Seq Data to Improve the Performance of Prognostic Gene Signatures

Zwiener, Frisch, Binder 2014

PLoS ONE


Gene expression measurements have successfully been used for building prognostic signatures, i.e., for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance dependency or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.
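As a rough illustration of what a rank-based transformation of a skewed count covariate can look like, the sketch below maps counts to normal scores. The abstract does not specify which rank-based variants were used, so the function names, the tie handling by average ranks, the `(rank - 0.5) / n` offset, and the toy counts are all assumptions.

```python
from statistics import NormalDist
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def normal_scores(values: Sequence[float]) -> List[float]:
    """Map values to standard-normal quantiles of their scaled ranks."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


# Invented read counts for one gene, with one extreme value.
counts = [0, 3, 3, 10, 5000]
scores = normal_scores(counts)
print(average_ranks(counts))
print([round(s, 3) for s in scores])
```

Because only the ordering of the counts enters the result, the extreme value of 5000 ends up at a moderate normal score instead of dominating the variance of the covariate.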


“…Especially when probability estimation is derived via machine learning methods the noninformation error provides valuable information on the potential amount of overfitting and resulting overoptimism that can be inherited when these techniques are not properly tuned. So we find ourselves in a similar situation as with, for example, regularized regression models such as the Lasso (Tibshirani, ) or boosting (Binder et al., ). This makes a unifying view of all these approaches as flexible statistical models useful.…”

mentioning

confidence: 85%