Objective: To review the methods and dimensions of data quality assessment in the context of electronic health record (EHR) data reuse for research.
Materials and Methods: A review of the clinical research literature discussing data quality assessment methodology for EHR data was performed. Using an iterative process, the aspects of data quality being measured, as well as the methods of assessment used, were abstracted and categorized.
Results: Five dimensions of data quality were identified (completeness, correctness, concordance, plausibility, and currency), along with seven broad categories of data quality assessment methods: comparison with gold standards, data element agreement, data source agreement, distribution comparison, validity checks, log review, and element presence.
Discussion: Examination of the methods by which clinical researchers have investigated the quality and suitability of EHR data for research shows that there are fundamental features of data quality, which may be difficult to measure, as well as proxy dimensions. Researchers interested in the reuse of EHR data for clinical research are encouraged to adopt a consistent taxonomy of EHR data quality, to remain aware of the task-dependence of data quality, to integrate work on data quality assessment from other fields, and to adopt systematic, empirically driven, statistically based methods of data quality assessment.
Conclusion: There is currently little consistency or potential generalizability in the methods used to assess EHR data quality. If the reuse of EHR data for clinical research is to become accepted, researchers should adopt validated, systematic methods of EHR data quality assessment.
BACKGROUND Exome sequencing is emerging as a first-line diagnostic method in some clinical disciplines, but its usefulness has yet to be examined for most constitutional disorders in adults, including chronic kidney disease, which affects more than 1 in 10 persons globally. METHODS We conducted exome sequencing and diagnostic analysis in two cohorts totaling 3315 patients with chronic kidney disease. We assessed the diagnostic yield and, among the patients for whom detailed clinical data were available, the clinical implications of diagnostic and other medically relevant findings. RESULTS In all, 3037 patients (91.6%) were over 21 years of age, and 1179 (35.6%) were of self-identified non-European ancestry. We detected diagnostic variants in 307 of the 3315 patients (9.3%), encompassing 66 different monogenic disorders. Of the disorders detected, 39 (59%) were found in only a single patient. Diagnostic variants were detected across all clinically defined categories, including congenital or cystic renal disease (127 of 531 patients [23.9%]) and nephropathy of unknown origin (48 of 281 patients [17.1%]). Of the 2187 patients assessed, 34 (1.6%) had genetic findings for medically actionable disorders that, although unrelated to their nephropathy, would also lead to subspecialty referral and inform renal management. CONCLUSIONS Exome sequencing in a combined cohort of more than 3000 patients with chronic kidney disease yielded a genetic diagnosis in just under 10% of cases. (Funded by the National Institutes of Health and others.)
Objective: Harmonized data quality (DQ) assessment terms, methods, and reporting practices can establish a common understanding of the strengths and limitations of electronic health record (EHR) data for operational analytics, quality improvement, and research. Existing published DQ terms were harmonized into a comprehensive unified terminology with definitions and examples and organized into a conceptual framework to support a common approach to defining whether EHR data are 'fit' for specific uses.
Materials and Methods: DQ publications, informatics and analytics experts, managers of established DQ programs, and operational manuals from several mature EHR-based research networks were reviewed to identify potential DQ terms and categories. Two face-to-face stakeholder meetings were used to vet an initial set of DQ terms and definitions that were grouped into an overall conceptual framework. Feedback received from data producers and users was used to construct a draft set of harmonized DQ terms and categories. Multiple rounds of iterative refinement resulted in a set of terms and an organizing framework consisting of DQ categories, subcategories, terms, definitions, and examples. The inclusiveness of the harmonized terminology and logical framework was evaluated against ten published DQ terminologies.
Results: Existing DQ terms were harmonized and organized into a framework by defining three DQ categories, (1) Conformance, (2) Completeness, and (3) Plausibility, and two DQ assessment contexts, (1) Verification and (2) Validation. The Conformance and Plausibility categories were further divided into subcategories. Each category and subcategory was defined with respect to whether the data may be verified with organizational data or validated against an accepted gold standard, depending on the proposed context and uses.
The coverage of the harmonized DQ terminology was validated by successfully aligning it with multiple published DQ terminologies.
Discussion: Existing DQ concepts, community input, and expert review informed the development of a distinct set of terms, organized into categories and subcategories. The resulting DQ terms successfully encompassed a wide range of disparate DQ terminologies. Operational definitions were developed to provide guidance for implementing DQ assessment procedures. The resulting structure is an inclusive DQ framework for standardizing DQ assessment and reporting. While our analysis focused on the DQ issues often found in EHR data, the new terminology may be applicable to a wide range of electronic health data, such as administrative, research, and patient-reported data.
Conclusion: A consistent, common DQ terminology, organized into a logical framework, is an initial step in enabling data owners and users, patients, and policy makers to evaluate and communicate data quality findings in a well-defined manner with a shared vocabulary. Future work will leverage the framework and terminology to develop reusable data quality assessment and reporting methods.
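The harmonized framework above pairs three DQ categories with two assessment contexts. As a minimal sketch, the structure can be encoded directly; the category and context names come from the abstract, but the subcategory entries are placeholders, since the abstract does not enumerate them.

```python
# Sketch of the harmonized DQ framework: three categories, two contexts.
# Subcategory names are placeholders (the abstract only says Conformance
# and Plausibility are subdivided, not what the subdivisions are).
DQ_CATEGORIES = {
    "Conformance": ["<subcategory-1>", "<subcategory-2>"],
    "Completeness": [],  # no subcategories listed
    "Plausibility": ["<subcategory-1>", "<subcategory-2>"],
}
DQ_CONTEXTS = ("Verification", "Validation")


def assessment_cells():
    """Enumerate every (category, context) pair a DQ report could cover."""
    return [(cat, ctx) for cat in DQ_CATEGORIES for ctx in DQ_CONTEXTS]
```

Each cell then corresponds to one kind of check: verification against organizational data, or validation against an accepted gold standard.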
We demonstrate the importance of explicit definitions of electronic health record (EHR) data completeness and how different conceptualizations of completeness may affect findings from EHR-derived datasets. This study has important implications for researchers and clinicians engaged in the secondary use of EHR data. We describe four prototypical definitions of EHR completeness: documentation, breadth, density, and predictive completeness. Each definition dictates a different approach to the measurement of completeness. These measures were applied to representative data from NewYork-Presbyterian Hospital's clinical data warehouse. We found that, according to any definition, the number of complete records in our clinical database is far lower than the nominal total. The proportion that meets the criteria for completeness depends heavily on the definition of completeness used, and the different definitions generate different subsets of records. We conclude that the concept of completeness in EHR data is contextual. We urge data consumers to be explicit in how they define a complete record and transparent about the limitations of their data.
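The point that different completeness definitions select different record subsets can be illustrated with a toy sketch. The operationalizations below are simplified assumptions for illustration only, not the study's actual measures (predictive completeness is omitted because it depends on a downstream modeling task).

```python
# Illustrative sketch: three of the four completeness definitions applied
# to one toy patient record. Thresholds and required categories are
# arbitrary assumptions, not the study's.
def documentation_complete(rec):
    # documentation: at least one clinical note of any kind exists
    return len(rec.get("notes", [])) > 0

def breadth_complete(rec, required=("diagnoses", "labs", "medications")):
    # breadth: every required data category is represented
    return all(rec.get(k) for k in required)

def density_complete(rec, min_per_year=2):
    # density: enough encounters per year of follow-up
    years = max(rec.get("followup_years", 0), 1)
    return len(rec.get("encounters", [])) / years >= min_per_year

record = {"notes": ["admission note"], "diagnoses": ["I10"], "labs": [],
          "medications": ["lisinopril"], "encounters": [1, 2, 3],
          "followup_years": 2}
```

The same record is "complete" under the documentation definition but not under the breadth or density definitions, which is exactly why the choice of definition changes which subset of a warehouse counts as usable.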
Introduction: We describe the formulation, development, and initial expert review of 3x3 Data Quality Assessment (DQA), a dynamic, evidence-based guideline to enable electronic health record (EHR) data quality assessment and reporting for clinical research.
Methods: 3x3 DQA was developed by triangulating the results of three studies: a review of the literature on EHR data quality assessment, a quantitative study of EHR data completeness, and a set of interviews with clinical researchers. Following initial development, the guideline was reviewed by a panel of EHR data quality experts.
Results: The guideline embraces the task-dependent nature of data quality and data quality assessment. The core framework includes three constructs of data quality: complete, correct, and current data. These constructs are operationalized according to the three primary dimensions of EHR data: patients, variables, and time. Each of the nine operationalized constructs maps to a methodological recommendation for EHR data quality assessment. The initial expert response to the framework was positive, but improvements are required.
Discussion: The initial version of 3x3 DQA promises to enable explicit, guideline-based best practices for EHR data quality assessment and reporting. Future work will focus on clarifying how and when 3x3 DQA should be used during the research process, improving the feasibility and ease of use of the recommendations, and clarifying the process by which users determine which operationalized constructs and recommendations are relevant for a given dataset and study.
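The "3x3" in the guideline's name refers to crossing the three constructs with the three dimensions. A minimal sketch of that grid follows; the construct and dimension names come from the abstract, while the per-cell recommendation text is not given there, so cells hold only their labels.

```python
from itertools import product

# The 3x3 grid: three DQ constructs crossed with three dimensions of EHR
# data, yielding the nine operationalized constructs. Each cell would map
# to one methodological recommendation (recommendation text not shown,
# since the abstract does not provide it).
CONSTRUCTS = ("complete", "correct", "current")
DIMENSIONS = ("patients", "variables", "time")

GRID = {(c, d): f"{c} {d}" for c, d in product(CONSTRUCTS, DIMENSIONS)}
```

For example, the ("complete", "patients") cell would cover whether the dataset contains all the patients it should, while ("current", "variables") would cover whether variable values are up to date.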
Standards-based, computable knowledge representations for eligibility criteria are increasingly needed to provide computer-based decision support for automated research participant screening, clinical evidence application, and clinical research knowledge management. We surveyed the literature and identified five aspects of eligibility criteria knowledge representations that contribute to the various research and clinical applications: the intended use of computable eligibility criteria, the classification of eligibility criteria, the expression language for representing eligibility rules, the encoding of eligibility concepts, and the modeling of patient data. We consider three of these (the expression language, the encoding of eligibility concepts, and the modeling of patient data) to be essential constructs of a formal knowledge representation for eligibility criteria. The requirements for each of the three knowledge constructs vary across use cases, which should therefore inform the development and choice of the constructs toward cost-effective knowledge representation efforts. We discuss the implications of our findings for standardization efforts toward sharable knowledge representations of eligibility criteria.
This study presents a semi-automated, data-driven approach to developing a semantic network that aligns well with the top-level information structure of clinical research eligibility criteria text, and demonstrates the feasibility of using the resulting semantic role labels to generate semi-structured eligibility criteria with nearly perfect interrater reliability.
This study evaluated the performance of an electronic screening (E-screening) method and used it to recruit patients for the NIH-sponsored ACCORD trial. Of the 193 E-screened patients, 125 met the age criterion (age ≥ 40). For all 125 of these patients, the performance of E-screening was compared with investigator review. E-screening achieved a negative predictive accuracy of 100% (95% CI: 98-100%), a positive predictive accuracy of 13% (95% CI: 6-13%), a sensitivity of 100% (95% CI: 45-100%), and a specificity of 84% (95% CI: 82-84%). The method maximized the use of a patient database query (i.e., it excluded ineligible patients with 100% accuracy and automatically assembled patient information so that manual review was limited to patients classified as "potentially eligible" by E-screening) and significantly reduced the screening burden associated with the ACCORD trial.
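The four reported rates are standard confusion-matrix metrics. As a sketch, the cell counts below (TP=3, FP=20, FN=0, TN=102 over the 125 patients) are an illustrative matrix chosen to be consistent with the reported percentages; the study reports only the rates, not the counts.

```python
# Standard screening metrics from a 2x2 confusion matrix. "Positive"
# means E-screening flagged the patient as potentially eligible.
def screening_metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # eligible patients correctly flagged
        "specificity": tn / (tn + fp),  # ineligible patients correctly excluded
        "ppv": tp / (tp + fp),          # "positive predictive accuracy"
        "npv": tn / (tn + fn),          # "negative predictive accuracy"
    }

# Illustrative counts consistent with the reported 100%/84%/13%/100%.
m = screening_metrics(tp=3, fp=20, fn=0, tn=102)
```

With zero false negatives, sensitivity and NPV are both 100%: no eligible patient is ever excluded, which is what makes it safe to prune the manual review queue to the "potentially eligible" group despite the low PPV.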