Informatics for Integrating Biology and the Bedside (i2b2) is one of seven projects sponsored by the NIH Roadmap National Centers for Biomedical Computing (http://www.ncbcs.org). Its mission is to provide clinical investigators with the tools necessary to integrate medical record and clinical research data in the genomics age: a software suite to construct and integrate the modern clinical research chart. An enterprise's research community can use i2b2 software to find sets of interesting patients in electronic medical record data through a query tool interface that preserves patient privacy. Project-specific mini-databases ("data marts") can then be created from these sets, making highly detailed data on those specific patients available to investigators on the i2b2 platform, subject to review and restriction by the Institutional Review Board. The current version of this software has been released into the public domain and is available at http://www.i2b2.org/software.
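The query-then-extract workflow described above can be illustrated with a small Python sketch. It is not i2b2's implementation: the `Observation` record is a simplified, in-memory stand-in loosely modeled on i2b2's `observation_fact` table (only `patient_num` and `concept_cd` are kept), the concept codes are made up, and the small-count masking threshold `MIN_REPORTABLE_COUNT` is a hypothetical policy standing in for whatever privacy protection a real deployment applies. What it shows is the separation of steps: investigators first see only an aggregate count for a patient set, and detailed facts are copied into a data mart only for an approved set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Simplified fact row, loosely modeled on i2b2's observation_fact table (hypothetical)."""
    patient_num: int
    concept_cd: str


# Hypothetical minimum count: smaller patient sets are reported only as "<N".
MIN_REPORTABLE_COUNT = 10


def find_patient_set(facts, required_concepts):
    """Return patients having at least one fact for EVERY required concept (an AND query)."""
    patients_by_concept = {}
    for fact in facts:
        patients_by_concept.setdefault(fact.concept_cd, set()).add(fact.patient_num)
    candidate_sets = [patients_by_concept.get(c, set()) for c in required_concepts]
    return set.intersection(*candidate_sets) if candidate_sets else set()


def reportable_count(patient_set, threshold=MIN_REPORTABLE_COUNT):
    """Aggregate count shown to the investigator; small counts are masked."""
    n = len(patient_set)
    return n if n >= threshold else f"<{threshold}"


def build_data_mart(facts, patient_set):
    """Copy every fact belonging to an approved patient set into a project-specific subset."""
    return [fact for fact in facts if fact.patient_num in patient_set]


# Toy data with made-up concept codes: 12 patients with a diagnosis, 5 of them also on a drug.
facts = [Observation(p, "DX:RA") for p in range(1, 13)]
facts += [Observation(p, "RX:MTX") for p in range(1, 6)]

cohort = find_patient_set(facts, ["DX:RA", "RX:MTX"])
print(reportable_count(cohort))  # the investigator sees only "<10"
mart = build_data_mart(facts, cohort)  # detailed rows, extracted only after approval
```

Keeping the count step and the extraction step as separate functions mirrors the two-stage access pattern in the text: broad, privacy-preserving feasibility queries first, detailed data only for a reviewed project.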
Tens of thousands of subjects may be required to obtain reliable evidence relating disease characteristics to the weak effects typically reported for common genetic variants. The costs of assembling, phenotyping, and studying these large populations are substantial, recently estimated at three billion dollars for 500,000 individuals. They are also decade-long efforts. We hypothesized that automation and analytic tools can repurpose the informational byproducts of routine clinical care, bringing sample acquisition and phenotyping to the same high-throughput pace and commodity price point that genome-wide genotyping currently has. Described here is a demonstration of the capability to acquire samples and data from densely phenotyped and genotyped individuals in the tens of thousands for common diseases (e.g., in a 1-yr period: N = 15,798 for rheumatoid arthritis; N = 42,238 for asthma; N = 34,535 for major depressive disorder) in one academic health center at an order of magnitude lower cost. These capabilities are also of interest for rare diseases caused by rare, highly penetrant mutations, such as Huntington disease (N = 102) and autism (N = 756).
A common thread in the recent flurry of studies relating characteristics of complex diseases to the generally weak effects of individual genetic variants is that very large numbers of subjects are needed to obtain reproducible results: closer to 200,000 individuals (Manolio et al. 2006) than the few thousand typical of recent publications. The costs of assembling, phenotyping, and studying these huge populations are estimated at three billion dollars for 500,000 individuals (Spivey 2006). Conversely, studying rare diseases often requires searching through very large populations, and sufficient sample sizes are hard to achieve. Coincidentally, the United States spends over two trillion dollars on healthcare per year (Catlin et al. 2008), and of those costs, the total investment in information technology (IT) is at least seven billion dollars per year (Girosi et al. 2005). The stimulus package recently enacted by the U.S. Congress includes a substantial increase in spending on electronic health records, prompting interest in the secondary use of the data gathered in such records. Yet there is widespread, often justified skepticism about our ability to use routinely collected electronic health records (EHRs) for research-quality phenotype data, given the well-known biases and coarse-grained nature of billing/claims diagnoses and procedures (Safran 1991; Jollis et al. 1993). By the same measure, the consistency of phenotypic definitions in large genome-wide association studies (GWAS), especially when they consist of the aggregation of several existing studies, and the consequent effect upon these study results, have been questioned (Ioannidis 2007; Wojczynski and Tiwari 2008; Buyske et al. 2009). To meet these challenges, we have undertaken a series of institutional experiments that collectively demonstrate that automated systems for mining of EHRs are essentia...
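The cohort cost estimate quoted above (three billion dollars for 500,000 individuals) reduces to a per-subject figure with simple arithmetic. The short Python sketch below makes that explicit; the function name is illustrative, and reading "an order of magnitude lower cost" as exactly a factor of ten is an assumption, since the text does not give a precise figure.

```python
def cost_per_subject(total_cost_usd: float, n_subjects: int) -> float:
    """Average cost of assembling, phenotyping, and studying one subject."""
    if n_subjects <= 0:
        raise ValueError("n_subjects must be positive")
    return total_cost_usd / n_subjects


# Estimate quoted in the text: three billion dollars for 500,000 individuals.
conventional = cost_per_subject(3_000_000_000, 500_000)

# Assumption: "an order of magnitude lower" is read as a factor of 10.
ORDER_OF_MAGNITUDE = 10
ehr_based = conventional / ORDER_OF_MAGNITUDE

print(f"Conventional cohort: ${conventional:,.0f} per subject")
print(f"Order of magnitude lower: about ${ehr_based:,.0f} per subject")
```

This prints $6,000 per subject for the conventional estimate and about $600 per subject under the factor-of-ten reading.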
By implementing a holistic approach to patient privacy solutions, i2b2 is able to help close the gap between principle and practice.
Objective: Integrating and harmonizing disparate patient data sources into one consolidated data portal enables researchers to conduct analyses efficiently and effectively.
Materials and Methods: We describe an implementation of Informatics for Integrating Biology and the Bedside (i2b2) to create the Mass General Brigham (MGB) Biobank Portal data repository. The repository integrates data from primary and curated data sources and is updated weekly. The data are made readily available to investigators in a data portal where they can easily construct and export customized datasets for analysis.
Results: As of July 2021, 125,645 consented patients are enrolled in the MGB Biobank. Of these, 88,527 (70.5%) have a biospecimen, 55,121 (43.9%) have completed the health information survey, 43,552 (34.7%) have genomic data, and 124,760 (99.3%) have EHR data. Twenty phenotypes computed by machine learning are recalculated weekly. There are currently 1,220 active investigators, who have run 58,793 patient queries and exported 10,257 analysis files.
Discussion: The Biobank Portal allows researchers who are not informaticians to assess study feasibility by querying across many data sources and then to extract the data most useful to them for clinical studies. Although institutions need substantial informatics resources to establish and maintain integrated data repositories, these repositories yield significant research value to a wide range of investigators.
Conclusion: The Biobank Portal and other patient data portals that integrate complex and simple datasets enable diverse research use cases. The i2b2 tools used to implement these registries and make the data interoperable are open source and freely available.
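Each Biobank percentage in the Results is a fraction of the 125,645 consented patients; because one patient can contribute several data types, the categories overlap and the percentages need not sum to 100%. A minimal Python check using only the counts reported above (the variable and function names are illustrative):

```python
# Counts reported for the MGB Biobank as of July 2021.
ENROLLED = 125_645
DATA_TYPE_COUNTS = {
    "biospecimen": 88_527,
    "health information survey": 55_121,
    "genomic data": 43_552,
    "EHR data": 124_760,
}


def coverage(counts, denominator):
    """Percentage of consented patients with each data type, rounded to 0.1%."""
    return {name: round(100 * n / denominator, 1) for name, n in counts.items()}


for name, pct in coverage(DATA_TYPE_COUNTS, ENROLLED).items():
    print(f"{name}: {pct}%")
```

The recomputed values (70.5%, 43.9%, 34.7%, 99.3%) match those reported in the Results.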