In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers, of unknown reliability, over multiple questions or queries. This scenario differs from the standard supervised setting, where each classifier's accuracy can be assessed using available labeled data, and it raises two questions: given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to (i) reliably rank them and (ii) construct a metaclassifier more accurate than most classifiers in the ensemble? Here we present a spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, because its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), an unsupervised ensemble classifier whose weights are equal to these eigenvector entries. On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth.

spectral analysis | classifier balanced accuracy | unsupervised learning | cartels | crowdsourcing

Every day, multiple decisions are made based on input and suggestions from several sources, either algorithms or advisers, of unknown reliability. Investment companies handle their portfolios by combining reports from several analysts, each providing recommendations on buying, selling, or holding multiple stocks (1, 2).
Central banks combine surveys of several professional forecasters to monitor rates of inflation, real gross domestic product growth, and unemployment (3-6). Biologists study the genomic binding locations of proteins by combining or ranking the predictions of several peak detection algorithms applied to large-scale genomics data (7). Physician tumor boards convene a number of experts from different disciplines to discuss patients whose diseases pose diagnostic and therapeutic challenges (8). Peer-review panels discuss multiple grant applications and make recommendations to fund or reject them (9).

The examples above describe scenarios in which several human advisers or algorithms provide their predictions or answers to a list of queries or questions. A key challenge is to improve decision making by combining these multiple predictions of unknown reliability. Automating this process of combining multiple predictors is an active field of research in decision science (cci.mit.edu/research), medicine (10), business (refs. 11 and 12 and www.kaggle.com/competitions), and government...
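The spectral procedure outlined in the abstract can be illustrated with a short sketch. The code below is a minimal heuristic illustration, not the paper's full algorithm: it assumes binary ±1 predictions from conditionally independent classifiers, zeroes the diagonal of the sample covariance matrix as a crude stand-in for the rank-one recovery of its off-diagonal part, and uses the leading eigenvector entries as SML weights. The function name `sml_predict` and the sign-fixing convention (assuming most classifiers are better than random) are choices made here for illustration.

```python
import numpy as np

def sml_predict(Z):
    """Sketch of the Spectral Meta-Learner (SML).

    Z : (n_classifiers, n_instances) array of +/-1 predictions.
    Returns (weights, ensemble labels).
    """
    Q = np.cov(Z)                       # sample covariance of classifier outputs
    # Under conditional independence, the off-diagonal of Q is approximately
    # rank one. Zeroing the diagonal and taking the leading eigenvector is a
    # simple heuristic for recovering that rank-one direction.
    Q_off = Q - np.diag(np.diag(Q))
    eigvals, eigvecs = np.linalg.eigh(Q_off)
    v = eigvecs[:, np.argmax(eigvals)]
    if v.sum() < 0:                     # eigenvector sign is arbitrary; assume
        v = -v                          # most classifiers beat random guessing
    y_hat = np.sign(v @ Z)              # eigenvector-weighted vote
    return v, y_hat
```

On simulated data with classifiers of differing accuracies, the recovered weights track the classifiers' (balanced) accuracies, and the weighted vote typically outperforms most individual classifiers, consistent with the behavior described in the abstract.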