Objective The COVID-19 pandemic has demonstrated the value of real-world data for public health research. International federated analyses are crucial for informing policy makers, and common data models (CDMs) are critical for performing these studies efficiently. Our objective was to convert the UK Biobank, a study of 500,000 participants with rich genetic and phenotypic data, to the Observational Medical Outcomes Partnership (OMOP) CDM. Materials and methods We converted UK Biobank data to OMOP CDM v5.3. We transformed participant research data on diseases collected at recruitment, as well as electronic health records (EHR) from primary care, hospitalizations, cancer registrations, and mortality from providers in England, Scotland, and Wales. We performed syntactic and semantic validations and compared comorbidities and risk factors between the source and transformed data. Results We identified 502,505 participants (3,086 with COVID-19) and transformed 690 fields (1,373,239,555 rows) to the OMOP CDM using eight different controlled clinical terminologies and bespoke mappings. Specifically, we transformed 946,053 self-reported non-cancer illnesses (83.91% of all source entries), 37,802 cancers (70.81%), 1,218,935 medications (88.25%), and 864,788 prescriptions (86.96%). In the EHR, we transformed 13,028,182 (99.95%) hospital diagnoses, 6,465,399 (89.2%) procedures, 337,896,333 primary care diagnoses (CTV3, SNOMED-CT), 139,966,587 (98.74%) prescriptions (dm+d), and 77,127 (99.95%) deaths (ICD-10). We observed good concordance in demographics, risk factors, and comorbidities between the source and transformed data. Discussion and conclusion Our study demonstrated that the OMOP CDM can be successfully leveraged to harmonize complex, large-scale biobank studies that combine rich multimodal phenotypic data. It also uncovered several challenges in transforming questionnaire data to the OMOP CDM that require further research.
The transformed UK Biobank resource is a valuable tool that can enable federated research, such as COVID-19 studies.
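The percentages reported above express, per source field, the share of source entries that received a target concept. As a minimal sketch, assuming each transformed row is available as a hypothetical `(field, source_value, concept_id)` tuple and that unmapped entries carry `concept_id` 0 (the OMOP convention for "No matching concept"), such a coverage figure could be computed as follows. The field names, codes, and concept IDs are invented for illustration and do not come from the UK Biobank ETL.

```python
from collections import defaultdict

# OMOP CDM convention: a source value without a matching concept gets concept_id 0.
NO_MATCHING_CONCEPT = 0


def coverage_by_field(rows):
    """Return {field: percentage of rows mapped to a non-zero concept_id}.

    `rows` is an iterable of (field, source_value, concept_id) tuples;
    this row layout is a hypothetical simplification.
    """
    totals = defaultdict(int)
    mapped = defaultdict(int)
    for field, _source_value, concept_id in rows:
        totals[field] += 1
        if concept_id != NO_MATCHING_CONCEPT:
            mapped[field] += 1
    # Round to two decimals, matching the precision of the reported figures.
    return {field: round(100.0 * mapped[field] / totals[field], 2) for field in totals}


# Hypothetical example rows (field names, codes, and concept IDs are made up).
rows = [
    ("self_reported_illness", "code_a", 1001),
    ("self_reported_illness", "code_b", NO_MATCHING_CONCEPT),
    ("self_reported_illness", "code_c", 1003),
    ("medication", "code_d", 2001),
]
print(coverage_by_field(rows))
# {'self_reported_illness': 66.67, 'medication': 100.0}
```

Counting per field rather than over the whole table keeps a well-mapped, high-volume source (such as hospital diagnoses) from masking a poorly mapped one (such as self-reported cancers).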
Health data research based on real-world data (RWD) is becoming increasingly relevant. To prepare national RWD for international research, the data must be harmonized with standard terminologies. In this paper, we evaluate to what extent the German OPS vocabulary in OHDSI covers the codes present in RWD and their mappings to SNOMED-CT. The evaluation identified a mapping gap of 21.1% in the RWD set.
BACKGROUND National classifications and terminologies that are already routinely used for documentation in patient care settings enable the unambiguous representation of clinical information. However, the diversity of vocabularies across healthcare institutions and countries is a barrier to achieving semantic interoperability and to exchanging data across sites. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) enables the standardization of both data structure and medical terminology. It allows national vocabularies to be mapped to so-called standard-concepts, which represent normative expressions for international analyses and research. Within our project "Hybrid quality indicators using machine learning methods (Hybrid-QI)", we faced the challenge of harmonizing source codes from German claims data vocabularies that are currently not available in the OMOP CDM. OBJECTIVE The objective of this study is to increase the coverage of German vocabularies in the OMOP CDM. With our work, we aim to transform the source codes used in German claims data to the OMOP CDM completely and without data loss, and thereby to make German claims data usable for research based on the OMOP CDM. METHODS To prepare the missing German vocabularies for the OMOP CDM, we defined a vocabulary preparation approach consisting of identifying all codes of the corresponding vocabularies, assembling them in machine-readable tables, and translating the German designations into English. We then applied two proposed approaches for OMOP-compliant vocabulary preparation: mapping to standard-concepts using the OHDSI tool Usagi, and preparing new 2-billion-concepts (local concepts with concept_id values above 2,000,000,000). Finally, we evaluated the prepared vocabularies for completeness and correctness using synthetic German claims data and calculated the coverage of German claims data vocabularies in the OMOP CDM.
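The 2-billion-concept approach can be illustrated with a small sketch. It assumes, hypothetically, that a prepared vocabulary is a dictionary mapping each source code to its English name and validity period, and it emits rows shaped like a subset of the OMOP CONCEPT table, with consecutive IDs starting just above 2,000,000,000. The vocabulary ID, domain, concept class, and codes below are invented placeholders, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

# OHDSI reserves concept_id values above 2,000,000,000 for locally created concepts.
FIRST_LOCAL_CONCEPT_ID = 2_000_000_001


@dataclass(frozen=True)
class Concept:
    """Subset of the OMOP CONCEPT table columns."""
    concept_id: int
    concept_name: str
    domain_id: str
    vocabulary_id: str
    concept_class_id: str
    standard_concept: Optional[str]  # None = non-standard concept
    concept_code: str
    valid_start_date: date
    valid_end_date: date


def build_local_concepts(
    codes: Dict[str, Tuple[str, date, date]],
    vocabulary_id: str,
    domain_id: str,
    concept_class_id: str,
    start_id: int = FIRST_LOCAL_CONCEPT_ID,
) -> List[Concept]:
    """Assign consecutive 2-billion concept_ids to source codes.

    `codes` maps each source code to (English name, valid_start_date, valid_end_date).
    Codes are sorted so that repeated runs assign the same IDs.
    """
    concepts = []
    for offset, code in enumerate(sorted(codes)):
        name, valid_start, valid_end = codes[code]
        concepts.append(
            Concept(
                concept_id=start_id + offset,
                concept_name=name,
                domain_id=domain_id,
                vocabulary_id=vocabulary_id,
                concept_class_id=concept_class_id,
                standard_concept=None,  # stays non-standard until mapped to a standard-concept
                concept_code=code,
                valid_start_date=valid_start,
                valid_end_date=valid_end,
            )
        )
    return concepts


# Hypothetical vocabulary and codes, for illustration only.
local = build_local_concepts(
    {
        "B02": ("Example fee schedule item B", date(2020, 1, 1), date(2099, 12, 31)),
        "A01": ("Example fee schedule item A", date(2020, 1, 1), date(2099, 12, 31)),
    },
    vocabulary_id="EXAMPLE_DE_VOCAB",
    domain_id="Procedure",
    concept_class_id="Example Class",
)
print([(c.concept_id, c.concept_code) for c in local])
# [(2000000001, 'A01'), (2000000002, 'B02')]
```

Leaving `standard_concept` empty reflects the limitation discussed in the conclusions: such concepts are usable locally but do not participate in network studies until they are mapped to standard-concepts.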
RESULTS With our vocabulary preparation approach, we were able to map three missing German vocabularies to standard-concepts and to prepare eight vocabularies as new 2-billion-concepts. The completeness evaluation showed that the prepared vocabularies cover most of the source codes contained in German claims data. The correctness evaluation showed that, for the majority of source codes, the validity periods specified in the OMOP CDM are consistent with the associated dates in the German claims data. The vocabulary coverage calculation showed that our preparation approach reduced the share of missing vocabularies noticeably, from 55% to 10%. CONCLUSIONS By preparing a total of ten vocabularies, we showed that our approach is applicable to any type of vocabulary used in a source dataset. The vocabularies prepared in this work are currently limited to German vocabularies, which can only be used in national OMOP CDM research projects, because the new 2-billion-concepts are not yet mapped to standard-concepts. To participate in international OHDSI network studies with German claims data, future work is required to map the prepared 2-billion-concepts to standard-concepts.
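The completeness and correctness evaluations described above can be sketched as two ratios over claims records. The sketch assumes, hypothetically, that each record is a `(code, event_date)` pair and that the prepared vocabulary exposes each code's `valid_start_date` and `valid_end_date`. Completeness is the share of records whose code exists in the vocabulary. Correctness is the share of those found records whose date falls inside the code's validity period. The codes, dates, and record layout are invented, and the authors' actual evaluation may define these measures differently.

```python
from datetime import date
from typing import Dict, Iterable, Tuple


def evaluate_vocabulary(
    records: Iterable[Tuple[str, date]],
    validity: Dict[str, Tuple[date, date]],
) -> Tuple[float, float]:
    """Return (completeness %, correctness %) for claims records.

    completeness: share of records whose code exists in the prepared vocabulary.
    correctness: among those, share whose date lies inside the code's
    [valid_start_date, valid_end_date] period (inclusive).
    """
    total = found = valid = 0
    for code, event_date in records:
        total += 1
        if code not in validity:
            continue  # missing code: counts against completeness only
        found += 1
        start, end = validity[code]
        if start <= event_date <= end:
            valid += 1
    completeness = 100.0 * found / total if total else 0.0
    correctness = 100.0 * valid / found if found else 0.0
    return completeness, correctness


# Hypothetical validity periods and synthetic claims records.
validity = {
    "A01": (date(2020, 1, 1), date(2099, 12, 31)),
    "B02": (date(2015, 1, 1), date(2019, 12, 31)),
}
records = [
    ("A01", date(2021, 3, 15)),  # found, within validity period
    ("B02", date(2021, 6, 1)),   # found, but after valid_end_date
    ("B02", date(2018, 2, 2)),   # found, within validity period
    ("Z99", date(2021, 1, 1)),   # not in the vocabulary
]
completeness, correctness = evaluate_vocabulary(records, validity)
print(f"completeness={completeness:.1f}%, correctness={correctness:.1f}%")
```

Computing correctness only over codes that were found keeps the two measures independent: a missing code lowers completeness but says nothing about whether the validity periods of the codes that are present are right.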