Objective The aim of the study was to transform a resource of linked electronic health records (EHR) to the OMOP common data model (CDM) and evaluate the process in terms of syntactic and semantic consistency and quality when implementing disease and risk factor phenotyping algorithms. Materials and Methods Using heart failure (HF) as an exemplar, we represented three national EHR sources (Clinical Practice Research Datalink, Hospital Episode Statistics Admitted Patient Care, Office for National Statistics) into the OMOP CDM 5.2. We compared the original and CDM HF patient population by calculating and presenting descriptive statistics of demographics, related comorbidities, and relevant clinical biomarkers. Results We identified a cohort of 502 536 patients with the incident and prevalent HF and converted 1 099 195 384 rows of data from 216 581 914 encounters across three EHR sources to the OMOP CDM. The largest percentage (65%) of unmapped events was related to medication prescriptions in primary care. The average coverage of source vocabularies was >98% with the exception of laboratory tests recorded in primary care. The raw and transformed data were similar in terms of demographics and comorbidities with the largest difference observed being 3.78% in the prevalence of chronic obstructive pulmonary disease (COPD). Conclusion Our study demonstrated that the OMOP CDM can successfully be applied to convert EHR linked across multiple healthcare settings and represent phenotyping algorithms spanning multiple sources. Similar to previous research, challenges mapping primary care prescriptions and laboratory measurements still persist and require further work. The use of OMOP CDM in national UK EHR is a valuable research tool that can enable large-scale reproducible observational research.
Objective The COVID-19 pandemic has demonstrated the value of real-world data for public health research. International federated analyses are crucial for informing policy makers. Common data models (CDM) are critical for enabling these studies to be performed efficiently. Our objective was to convert the UK Biobank, a study of 500,000 participants with rich genetic and phenotypic data to the Observational Medical Outcomes Partnership (OMOP) CDM. Materials and methods We converted UK Biobank data to OMOP CDM v. 5.3. We transformedparticipant research data on diseases collected at recruitment and electronic health records (EHR) from primary care, hospitalizations, cancer registrations, and mortality from providers in England, Scotland, and Wales. We performed syntactic and semantic validations and compared comorbidities and risk factors between source and transformed data. Results We identified 502,505 participants (3,086 with COVID-19) and transformed 690 fields (1,373,239,555 rows) to the OMOP CDM using eight different controlled clinical terminologies and bespoke mappings. Specifically, we transformed self-reported non-cancer illnesses 946,053 (83.91% of all source entries), cancers 37,802 (70.81%), medications 1,218,935 (88.25%), and prescriptions 864,788 (86.96%). In EHR, we transformed 1,3028,182 (99.95%) hospital diagnoses, 6,465,399 (89.2%) procedures, 337,896,333 primary care diagnoses (CTV3, SNOMED-CT), 139,966,587 (98.74%) prescriptions (dm+d) and 77,127 (99.95%) deaths (ICD-10). We observed good concordance across demographic, risk factor, and comorbidity factors between source and transformed data. Discussion and conclusion Our study demonstrated that the OMOP CDM can be successfully leveraged to harmonize complex large-scale biobanked studies combining rich multimodal phenotypic data. Our study uncovered several challenges when transforming data from questionnaires to the OMOP CDM which require further research. The transformed UK Biobank resource is a valuable tool that can enable federated research, like COVID-19 studies.
The availability of high-throughput molecular profiling techniques has provided more accurate and informative data for regular clinical studies. Nevertheless, complex computational workflows are required to interpret these data. Over the past years, the data volume has been growing explosively, requiring robust human data management to organise and integrate the data efficiently. For this reason, we set up an ELIXIR implementation study, together with the Translational research IT (TraIT) programme, to design a data ecosystem that is able to link raw and interpreted data. In this project, the data from the TraIT Cell Line Use Case (TraIT-CLUC) are used as a test case for this system. Within this ecosystem, we use the European Genome-phenome Archive (EGA) to store raw molecular profiling data; tranSMART to collect interpreted molecular profiling data and clinical data for corresponding samples; and Galaxy to store, run and manage the computational workflows. We can integrate these data by linking their repositories systematically. To showcase our design, we have structured the TraIT-CLUC data, which contain a variety of molecular profiling data types, for storage in both tranSMART and EGA. The metadata provided allows referencing between tranSMART and EGA, fulfilling the cycle of data submission and discovery; we have also designed a data flow from EGA to Galaxy, enabling reanalysis of the raw data in Galaxy. In this way, users can select patient cohorts in tranSMART, trace them back to the raw data and perform (re)analysis in Galaxy. Our conclusion is that the majority of metadata does not necessarily need to be stored (redundantly) in both databases, but that instead FAIR persistent identifiers should be available for well-defined data ontology levels: study, data access committee, physical sample, data sample and raw data file. This approach will pave the way for the stable linkage and reuse of data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.