Background
The advancement of information technology has greatly increased the quality and volume of health data. This has led to a growth in observational studies, but also to a greater threat of privacy invasion. Recently, distributed research networks based on the common data model (CDM) have emerged, enabling collaborative international medical research without sharing patient-level data. Although each institution's CDM database is built inside a firewall, the risk of re-identification still needs to be managed. Hence, this study aims to elucidate CDM users' perceptions of the CDM and of risk management for re-identification.

Methods
The survey, designed to answer specific in-depth questions on the CDM, was conducted from October to November 2020. We targeted experienced researchers who actively use the CDM. Basic statistics (total number and percentage) were computed for all covariates.

Results
There were 33 valid respondents. Of these, 43.8% considered additional anonymization unnecessary beyond the “minimum cell count” policy, which obscures any cell in shared results whose value is lower than a certain threshold (usually 5), to minimize the likelihood of re-identification due to rare conditions. During extract-transform-load (ETL) processes, 81.8% of respondents assumed that structured data were safe from the risk of re-identification. However, respondents noted that dates of birth and death were highly re-identifiable information. The majority of respondents (n = 22, 66.7%) acknowledged the possibility that the NOTE table contains unstructured data with identifiers.

Conclusion
Overall, CDM users generally attributed high reliability for privacy protection to the intrinsic nature of the CDM. There was little demand for additional de-identification methods. However, unstructured data in the CDM were suspected to pose risks. The need for a coordinating consortium to define and manage the re-identification risk of the CDM was emphasized.
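The “minimum cell count” policy described in the Results can be sketched as a simple masking step applied to aggregate counts before they leave a site. This is a minimal illustration, not the implementation used by any CDM tool: the function name `suppress_small_cells`, the `"<5"` placeholder string, and the choice to also mask zero counts (the abstract only says values “lower than a certain number”) are all assumptions.

```python
from typing import Dict, Union

# Hypothetical sketch: not part of any CDM or OHDSI tool.
MIN_CELL_COUNT = 5  # threshold; the abstract notes it is "usually 5"

Cell = Union[int, str]


def suppress_small_cells(counts: Dict[str, int],
                         threshold: int = MIN_CELL_COUNT) -> Dict[str, Cell]:
    """Return a copy of aggregate counts with small cells masked.

    Any count strictly below `threshold` (including zero, by assumption;
    some policies leave zeros visible) is replaced by the string
    f"<{threshold}", so rare conditions in shared results cannot point
    to individual patients.
    """
    masked: Dict[str, Cell] = {}
    for key, count in counts.items():
        masked[key] = f"<{threshold}" if count < threshold else count
    return masked


if __name__ == "__main__":
    site_counts = {"condition_A": 120, "condition_B": 3, "condition_C": 5}
    print(suppress_small_cells(site_counts))
    # {'condition_A': 120, 'condition_B': '<5', 'condition_C': 5}
```

Masking at the aggregate level, rather than altering patient records, matches the distributed-network setting: only summary results are shared, so the check needs to run only on what is exported.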
Background
Death is a crucial outcome in electronic medical record (EMR) studies, where it serves as the criterion for analyzing mortality in the database. This study aimed to assess the quality of extracted death data and to investigate the potential of a final administered medication variable as an indicator for quantifying the accuracy of a newly extracted control group’s death data.

Methods
Data were collected through the Asan Biomedical Research Environment, which comprised data from both the Asan Medical Center and the Korean Central Cancer Registry. The gold standard was established through chart review by examining discrepancies among death information sources. Cosine similarity, computed over Anatomical Therapeutic Chemical (ATC) classification system codes, was used to quantify the similarity of final administered medications between the gold standard and other cohorts.

Results
The gold standard was defined as patients who died in hospital after 2006 and whose final hospital visit/discharge date and death date differed by 0 or 1 day. For all three criteria, (a) SEER stage, (b) cancer type, and (c) type of final visit, the cosine similarity of final administered medications to the gold standard increased as the mortality rate increased.

Conclusion
This study introduced an indicator that can supplement death information and help differentiate its reliability. In the future, variables beyond the EMR, in addition to the final administered medication, could be used to further assess the quality of death information.
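The cosine similarity step in the Methods can be illustrated with a small sketch. The abstract does not say how the medication vectors were built, so this assumes each cohort is represented as a count vector over ATC codes of final administered medications, and computes $\cos(\mathbf{a},\mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}$. The helper names `atc_profile` and `cosine_similarity` and the ATC codes in the usage example are illustrative assumptions, not the study's data or code.

```python
import math
from collections import Counter
from typing import Iterable


def atc_profile(final_meds: Iterable[str]) -> Counter:
    """Count each ATC code among a cohort's final administered medications.

    Assumption: one entry per patient's final medication; the study's
    exact vector construction is not described in the abstract.
    """
    return Counter(final_meds)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse ATC-code count vectors."""
    # Only codes present in both profiles contribute to the dot product.
    dot = sum(a[code] * b[code] for code in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty profile is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Illustrative codes only: N02AA01 morphine, A02BC01 omeprazole,
    # J01DD04 ceftriaxone.
    gold = atc_profile(["N02AA01", "N02AA01", "A02BC01"])
    cohort = atc_profile(["N02AA01", "J01DD04"])
    print(round(cosine_similarity(gold, cohort), 3))  # 0.632
```

Using sparse `Counter` profiles avoids building a full vector over every ATC code; codes absent from either cohort simply contribute nothing to the dot product.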