The AACR Project GENIE is an international data-sharing consortium focused on generating an evidence base for precision cancer medicine by integrating clinical-grade cancer genomic data with clinical outcome data for tens of thousands of cancer patients treated at multiple institutions worldwide. In conjunction with the first public data release from approximately 19,000 samples, we describe the goals, structure, and data standards of the consortium and report conclusions from high-level analysis of the initial phase of genomic data. We also provide examples of the clinical utility of GENIE data, such as an estimate of clinical actionability across multiple cancer types (>30%) and prediction of accrual rates to the NCI-MATCH trial that accurately reflect recently reported actual match rates. The GENIE database is expected to grow to >100,000 samples within 5 years and should serve as a powerful tool for precision cancer medicine. Significance: The AACR Project GENIE aims to catalyze sharing of integrated genomic and clinical datasets across multiple institutions worldwide, and thereby enable precision cancer medicine research, including the identification of novel therapeutic targets, design of biomarker-driven clinical trials, and identification of genomic determinants of response to therapy.
Results of medical research studies are often contradictory or cannot be reproduced. One reason is that there may not be enough patient subjects available for observation for a long enough time period. Another reason is that patient populations may vary considerably across geographic and demographic boundaries, thus limiting how broadly the results apply. Even when similar patient populations are pooled together from multiple locations, differences in medical treatment and record systems can limit which outcome measures can be commonly analyzed. Taken together, these differences in medical research settings can lead to differing conclusions or can even prevent some studies from starting. We thus sought to create a patient research system that could aggregate as many patient observations as possible from a large number of hospitals in a uniform way. We call this system the ‘Shared Health Research Information Network’ (SHRINE). It is designed to (1) reuse electronic health data from everyday clinical care for research purposes, (2) respect patient privacy and hospital autonomy, (3) aggregate patient populations across many hospitals to achieve statistically significant sample sizes that can be validated independently of a single research setting, (4) harmonize the observation facts recorded at each institution such that queries can be made across many hospitals in parallel, and (5) scale to regional and national collaborations. The purpose of this report is to provide open source software for multi-site clinical studies and to report on early uses of this application. To date, SHRINE implementations have been used for multi-site studies of autism co-morbidity, juvenile idiopathic arthritis, peripartum cardiomyopathy, colorectal cancer, diabetes, and others. The wide range of study objectives and growing adoption suggest that SHRINE may be applicable beyond the research uses and participating hospitals named in this report.
We live in the genomic era of medicine, in which a patient's genomic and molecular data are becoming increasingly important for disease diagnosis, identification of targeted therapy, and risk assessment for adverse reactions. However, decoding genomic test results and integrating them with clinical data, both for retrospective studies and for cohort identification in prospective clinical trials, is still a challenging task. To overcome these barriers, we developed an overarching enterprise informatics framework for translational research and personalized medicine called Synergistic Patient and Research Knowledge Systems (SPARKS) and a suite of tools called Oncology Data Retrieval Systems (OncDRS). OncDRS enables seamless data integration and secure, self-navigated querying and extraction of clinical and genomic data from heterogeneous sources. Within a year of its release, the system facilitated more than 1500 research queries and delivered data for more than 50 research studies.
Patients with non-small cell lung cancer (NSCLC) who have distant metastases have a poor prognosis. To determine which genomic factors of the primary tumor are associated with metastasis, we analyzed data from 759 patients originally diagnosed with stage I–III NSCLC as part of the AACR Project GENIE Biopharma Collaborative consortium. We found that TP53 mutations were significantly associated with the development of new distant metastases. TP53 mutations were also more prevalent in patients with a history of smoking, suggesting that these patients may be at increased risk for distant metastasis. Our results suggest that additional investigation of the optimal management of patients with early-stage NSCLC harboring TP53 mutations at diagnosis is warranted in light of their higher likelihood of developing new distant metastases.
PURPOSE Siloed electronic medical data have limited utility and accessibility. At the Dana-Farber/Boston Children's Cancer and Blood Disorders Center, cross-institutional data were inconsistent and difficult to access. To unify data for clinical operations, administration, and research, we developed the Pediatric Patient Informatics Platform (PPIP), an integrated datamart harmonizing multiple source systems across two institutions into a common technology. PATIENTS AND METHODS Starting in 2009, user requirements were gathered and data sources were prioritized. Project teams, including biostatisticians, database developers, and an external contractor, were formed. Read-access to source systems was established. The 3-layer PPIP architecture was developed: STAGING, a near-exact copy of source data; INTEGRATION, where data were reorganized into domains; and CONSUMPTION, where data were optimized for rapid retrieval. The diverse systems were integrated into a common IBM Netezza technology. Data filters were defined to accurately capture the Center's patients, and derived data items were created for harmonization across sources. An interactive online query tool, PPIP360, was developed using MicroStrategy Analytics. RESULTS Driven by scientific objectives, the PPIP datamart was created, including 33,674 patients, 2,983 protocols, and 3.6 million patient visits from 14 source databases, 164 source tables, and 2,622 source data items. The PPIP360 has 605 data items and 33 metrics across 11 reports and dashboards. Dana-Farber and Boston Children's established a legal data-sharing agreement. The PPIP has supported hundreds of faculty, staff, and projects, including planning clinical trials and informing strategic planning.
CONCLUSION The PPIP has successfully harmonized and integrated diagnostic, demographic, laboratory, treatment, clinical outcome, pathology, transplant, meta-protocol, and -omics data to support efficient daily operational and research activities at Dana-Farber/Boston Children's Cancer and Blood Disorders Center, as well as future external sharing.
<p>Supplemental Methods. Supplemental Table 1: Genomic Data Characterization by Center. Supplemental Table 2: Gene Panels Submitted by Each Center. Figure S1: Number of putative germline SNPs per sample, before and after uniform germline filtering. Figure S2: Distribution of total somatic mutation burden per sample stratified by sequencing panel. Figure S3: Log-scale comparison of mutation frequencies at hotspot sites between GENIE (data aggregated from all sequencing panels) and cancerhotspots.org (CHS) using a binomial test. Figure S4: Comparison of mutation frequencies at hotspot sites in each GENIE sequencing panel with cancerhotspots.org (CHS) using a binomial test.</p>
<p>AACR GENIE Data Guide</p>