Although prognostic gene expression signatures for survival in early stage lung cancer have been proposed, for clinical application it is critical to establish their performance across different subject populations and in different laboratories. Here we report a large, training-testing, multi-site blinded validation study to characterize the performance of several prognostic models based on gene expression for 442 lung adenocarcinomas. The hypotheses proposed examined whether
To resolve the genetic heterogeneity within pediatric high-risk B-precursor acute lymphoblastic leukemia (ALL), a clinically defined poor-risk group with few known recurring cytogenetic abnormalities, we performed gene expression profiling in a cohort of 207 uniformly treated children with high-risk ALL. Expression profiles were correlated with genome-wide DNA copy number abnormalities and clinical and outcome features. Unsupervised clustering of gene expression profiling data revealed 8 unique cluster groups within these high-risk ALL patients, 2 of which were associated with known chromosomal translocations (t(1;19)(TCF3-PBX1) or MLL), and 6 of which lacked any previously known cytogenetic lesion. One unique cluster was characterized by high expression of distinct outlier genes AGAP1, CCNJ, CHST2/7, CLEC12A/B, and PTPRM; ERG DNA deletions; and 4-year relapse-free survival of 94.7% ± 5.1%, compared with 63.5% ± 3.7% for the cohort (P = .01). A second cluster, characterized by high expression of BMPR1B, CRLF2, GPR110, and MUC4; frequent deletion of EBF1, IKZF1, RAG1-2, and IL3RA-CSF2RA; JAK mutations and CRLF2 rearrangements (P < .0001); and Hispanic ethnicity (P < .001) had a very poor 4-year relapse-free survival (21.0% ± 9.5%; P < .001). These studies reveal striking clinical and genetic heterogeneity in high-risk ALL and point to novel genes that may serve as new targets for diagnosis, risk classification, and therapy. (Blood. 2010; 116(23):4874-4884)
The National Cancer Institute (NCI) Investigational Drug Steering Committee (IDSC) charged the Biomarker Task Force to develop recommendations to improve the decisions about incorporation of biomarker studies in early investigational drug trials. The Task Force members reviewed biomarker trials, the peer-reviewed literature, NCI and U.S. Food and Drug Administration (FDA) guidance documents, and conducted a survey of investigators to determine practices and challenges to executing biomarker studies in clinical trials of new drugs in early development. This document provides standard definitions and categories of biomarkers, and lists recommendations to sponsors and investigators for biomarker incorporation into such trials. Our recommendations for sponsors focus on the identification and prioritization of biomarkers and assays, the coordination of activities for the development and use of assays, and for operational activities. We also provide recommendations for investigators developing clinical trials with biomarker studies for scientific rationale, assay criteria, trial design, and analysis. The incorporation of biomarker studies into early drug trials is complex. Thus the decision to proceed with studies of biomarkers should be based on balancing the strength of science, assay robustness, feasibility, and resources with the burden of proper sample collection on the patient and potential impact of the results on drug development. The Task Force provides these guidelines in the hopes that improvements in biomarker studies will enhance the efficiency of investigational drug development. Clin Cancer Res; 16(6); 1745-55. ©2010 AACR.
Background: We consider the problem of designing a study to develop a predictive classifier from high dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set. How does this proportion impact the mean squared error (MSE) of the prediction accuracy estimate? Results: We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study for the purpose of better understanding the factors that determine the best split proportions and to evaluate commonly used splitting strategies (1/2 training or 2/3 training) under a wide variety of conditions. These methods are based on a decomposition of the MSE into three intuitive component parts. Conclusions: By applying these approaches to a number of synthetic and real microarray datasets we show that for linear classifiers the optimal proportion depends on the overall number of samples available and the degree of differential expression between the classes. The optimal proportion was found to depend on the full dataset size (n) and classification accuracy, with higher accuracy and smaller n resulting in a larger proportion assigned to the training set. The commonly used strategy of allocating two-thirds of cases for training was close to optimal for reasonably sized datasets (n ≥ 100) with strong signals (i.e. 85% or greater full dataset accuracy). In general, we recommend use of our nonparametric resampling approach for determining the optimal split. This approach can be applied to any dataset, using any predictor development method, to determine the best split.
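The question this abstract poses, how the training proportion affects the MSE of the test-set accuracy estimate, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions and is not the authors' nonparametric algorithm: the Gaussian data model, the nearest-centroid classifier, the candidate proportions, and all function names (`simulate`, `split_mse`, and so on) are hypothetical choices made for illustration. Because the data are simulated, the "true" accuracy of the full-data classifier can be approximated on a large independent sample, which is not possible with a real dataset.

```python
import random
import statistics

def simulate(n, n_genes, delta, rng):
    """Draw n samples (alternating classes); class 1 is shifted by delta in every gene."""
    return [([rng.gauss(delta * (i % 2), 1.0) for _ in range(n_genes)], i % 2)
            for i in range(n)]

def fit_centroids(train):
    """Nearest-centroid classifier: one mean vector per class."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        cents[label] = [statistics.fmean(col) for col in zip(*rows)]
    return cents

def accuracy(cents, data):
    """Fraction of samples whose nearest centroid matches the true label."""
    def nearest(x):
        return min(cents, key=lambda lab: sum((a - b) ** 2
                                              for a, b in zip(x, cents[lab])))
    return sum(nearest(x) == y for x, y in data) / len(data)

def stratified_split(data, prop, rng):
    """Split each class separately so both sets always contain both classes."""
    train, test = [], []
    for label in (0, 1):
        rows = [d for d in data if d[1] == label]
        rng.shuffle(rows)
        k = max(1, min(len(rows) - 1, round(prop * len(rows))))
        train += rows[:k]
        test += rows[k:]
    return train, test

def split_mse(data, prop, truth, n_splits, rng):
    """Mean squared difference between split-based test accuracy and `truth`."""
    errs = []
    for _ in range(n_splits):
        train, test = stratified_split(data, prop, rng)
        errs.append((accuracy(fit_centroids(train), test) - truth) ** 2)
    return statistics.fmean(errs)

rng = random.Random(0)
data = simulate(60, 10, 0.5, rng)  # assumed: n = 60, 10 genes, modest signal
# Approximate the accuracy of the classifier built on all 60 samples.
truth = accuracy(fit_centroids(data), simulate(2000, 10, 0.5, rng))
mse_by_prop = {p: split_mse(data, p, truth, 200, rng) for p in (0.3, 0.5, 2 / 3, 0.8)}
best = min(mse_by_prop, key=mse_by_prop.get)
print(mse_by_prop, "lowest MSE at proportion", best)
```

The MSE computed here mixes two effects the abstract alludes to: a small training set yields a classifier that is worse than the full-data one (bias), while a small test set yields a noisy accuracy estimate (variance). The proportion that minimizes the MSE in this toy setting says nothing about any particular real dataset.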
Immunotherapies have emerged as one of the most promising approaches to treat patients with cancer. Recently, there have been many clinical successes using checkpoint receptor blockade, including T cell inhibitory receptors such as cytotoxic T-lymphocyte-associated antigen 4 (CTLA-4) and programmed cell death-1 (PD-1). Despite demonstrated successes in a variety of malignancies, responses typically occur in only a minority of patients in any given histology. Additionally, treatment is associated with inflammatory toxicity and high cost. Therefore, determining which patients would derive clinical benefit from immunotherapy is a compelling clinical question. Although numerous candidate biomarkers have been described, there are currently three FDA-approved assays based on PD-1 ligand expression (PD-L1) that have been clinically validated to identify patients who are more likely to benefit from a single-agent anti-PD-1/PD-L1 therapy. Because of the complexity of the immune response and tumor biology, it is unlikely that a single biomarker will be sufficient to predict clinical outcomes in response to immune-targeted therapy. Rather, the integration of multiple tumor and immune response parameters, such as protein expression, genomics, and transcriptomics, may be necessary for accurate prediction of clinical benefit. Before a candidate biomarker and/or new technology can be used in a clinical setting, several steps are necessary to demonstrate its clinical validity. Although regulatory guidelines provide general roadmaps for the validation process, their applicability to biomarkers in the cancer immunotherapy field is somewhat limited. Thus, Working Group 1 (WG1) of the Society for Immunotherapy of Cancer (SITC) Immune Biomarkers Task Force convened to address this need.
In this two-volume series, we discuss pre-analytical and analytical (Volume I) as well as clinical and regulatory (Volume II) aspects of the validation process as applied to predictive biomarkers for cancer immunotherapy. To illustrate the requirements for validation, we discuss examples of biomarker assays that have shown preliminary evidence of an association with clinical benefit from immunotherapeutic interventions. The scope includes only those assays and technologies that have established a certain level of validation for clinical use (fit-for-purpose). Recommendations to meet challenges and strategies to guide the choice of analytical and clinical validation design for specific assays are also provided. Electronic supplementary material: The online version of this article (doi:10.1186/s40425-016-0178-1) contains supplementary material, which is available to authorized users.
Many gene expression studies attempt to develop a predictor of pre-defined diagnostic or prognostic classes. If the classes are similar biologically, then the number of genes that are differentially expressed between the classes is likely to be small compared to the total number of genes measured. This motivates a two-step process for predictor development: a subset of differentially expressed genes is selected for use in the predictor, and then the predictor is constructed from these. Both of these steps introduce variability into the resulting classifier, so both must be incorporated in sample size estimation. We introduce a methodology for sample size determination for prediction in the context of high-dimensional data that captures variability in both steps of predictor development. The methodology is based on a parametric probability model, but permits sample size computations to be carried out in a practical manner without extensive requirements for preliminary data. We find that many prediction problems do not require a large training set of arrays for classifier development.
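The two-step process described in this abstract (select differentially expressed genes, then build the predictor from them) can be sketched as follows. This is an illustrative example only, not the sample size methodology the abstract introduces: the simulated data, the Welch-style ranking score, the nearest-centroid rule, and the names `select_genes` and `build_predictor` are all assumptions chosen for brevity.

```python
import random
import statistics

def welch_score(a, b):
    """Absolute mean difference between two groups, scaled by its standard error."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return abs(statistics.fmean(a) - statistics.fmean(b)) / se if se > 0 else 0.0

def select_genes(X, y, k):
    """Step 1: rank genes by score on the training data and keep the top k indices."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        scores.append((welch_score(a, b), g))
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def build_predictor(X, y, genes):
    """Step 2: nearest-centroid rule restricted to the selected genes."""
    cents = {}
    for lab in (0, 1):
        rows = [[row[g] for g in genes] for row, lab2 in zip(X, y) if lab2 == lab]
        cents[lab] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(row):
        x = [row[g] for g in genes]
        return min(cents, key=lambda lab: sum((u - v) ** 2
                                              for u, v in zip(x, cents[lab])))
    return predict

def simulate(n, n_genes, n_informative, delta, rng):
    """Assumed data model: only the first n_informative genes differ between classes."""
    X, y = [], []
    for i in range(n):
        lab = i % 2
        X.append([rng.gauss(delta * lab if g < n_informative else 0.0, 1.0)
                  for g in range(n_genes)])
        y.append(lab)
    return X, y

rng = random.Random(1)
X_train, y_train = simulate(40, 200, 5, 2.0, rng)
X_test, y_test = simulate(200, 200, 5, 2.0, rng)

genes = select_genes(X_train, y_train, k=5)        # step 1, training data only
predict = build_predictor(X_train, y_train, genes)  # step 2, same training data
test_acc = sum(predict(r) == lab for r, lab in zip(X_test, y_test)) / len(y_test)
print("selected genes:", sorted(genes), "test accuracy:", test_acc)
```

Rerunning both steps with a different seed or a smaller training set changes both the selected gene list and the fitted centroids, which is the sense in which each step contributes variability to the final classifier.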