Clinical Data Warehouses (DWHs) are used to provide researchers with simplified access to pseudonymized and homogenized clinical routine data from multiple primary systems. Experience with the integration of imaging and metadata from picture archiving and communication systems (PACS), however, is rare. Our goal was therefore to analyze the viability of integrating a production PACS with a research DWH to enable DWH queries combining clinical and medical imaging metadata and to enable the DWH to display and download images ad hoc. We developed an application interface that makes it possible to query the production PACS of a large hospital from a clinical research DWH containing pseudonymized data. We evaluated the performance of bulk extracting metadata from the PACS to the DWH and the performance of retrieving images ad hoc from the PACS for display and download within the DWH. We integrated the system into the query interface of our DWH and used it successfully in four use cases. The bulk extraction of imaging metadata required a median (quartiles) time of 0.09 (0.03-2.25) to 12.52 (4.11-37.30) seconds for a median (quartiles) number of 10 (3-29) to 103 (8-693) images per patient, depending on the extraction approach. The ad hoc image retrieval from the PACS required a median (quartiles) of 2.57 (2.57-2.79) seconds per image for the download, but 5.55 (4.91-6.06) seconds to display the first and 40.77 (38.60-41.63) seconds to display all images using the pure web-based viewer. A full integration of a production PACS with a research DWH is viable and enables various use cases in research. While the extraction of basic metadata from all images can be done with reasonable effort, the extraction of all metadata seems to be more appropriate for subgroups.
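The timings above are reported as "median (quartiles)". As a minimal sketch of how such a summary could be computed from raw measurements, the following uses Python's standard library. The timing values and the helper names `median_quartiles` and `summarize` are hypothetical, and the abstract does not state which quartile interpolation method the authors used; the "inclusive" method here is an assumption.

```python
import statistics


def median_quartiles(values):
    """Return (median, lower quartile, upper quartile) of a list of numbers.

    Assumption: quartiles are interpolated with the "inclusive" method;
    the source does not say which method was used.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, q1, q3


def summarize(values):
    """Format measurements in the 'median (q1-q3)' style used in the abstract."""
    med, q1, q3 = median_quartiles(values)
    return f"{med:.2f} ({q1:.2f}-{q3:.2f})"


if __name__ == "__main__":
    # Hypothetical per-image download times in seconds (not data from the study).
    download_seconds = [2.4, 2.5, 2.6, 2.9, 3.4, 2.7, 2.6]
    print(summarize(download_seconds))
```

Reporting quartiles rather than a mean and standard deviation fits timings like these, where a few slow requests would otherwise dominate the summary.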
Background Medication trend studies show the changes of medication over the years and may be replicated using a clinical Data Warehouse (CDW). Even today, much of the patient information in the EHR, such as medication data, is stored as free text. As the conventional approach to information extraction (IE) demands a high development effort, we used ad hoc IE instead. This technique queries information and extracts it on the fly from texts contained in the CDW. Methods We present a generalizable approach of ad hoc IE for pharmacotherapy (medications and their daily dosage) documented in hospital discharge letters. We added import and query features to the CDW system, such as error-tolerant queries to deal with misspellings and proximity search for the extraction of the daily dosage. During the data integration process in the CDW, negated, historical and non-patient context data are filtered. For the replication studies, we used a drug list grouped by ATC (Anatomical Therapeutic Chemical Classification System) codes as input for queries to the CDW. Results We achieve an F1 score of 0.983 (precision 0.997, recall 0.970) for extracting medication from discharge letters and an F1 score of 0.974 (precision 0.977, recall 0.972) for extracting the dosage. We replicated three published medical trend studies for hypertension, atrial fibrillation and chronic kidney disease. Overall, 93% of the main findings could be replicated, 68% of sub-findings, and 75% of all findings. One study could be completely replicated with all main and sub-findings. Conclusion A novel approach for ad hoc IE is presented. It is very suitable for basic medical texts like discharge letters and finding reports. Ad hoc IE is by definition more limited than conventional IE and does not claim to replace it, but it substantially exceeds the search capabilities of many CDWs and makes it convenient to conduct replication studies quickly and with high quality.
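The two query features named in the Methods, error-tolerant matching and proximity search, can be illustrated with a small sketch. This is not the CDW's implementation: the function names, the similarity threshold of 0.85, the three-token window, and the sample sentence are all assumptions made for illustration. It only picks up the nearest number after a drug name; turning that into a daily dosage, and filtering negated or historical mentions, is left out.

```python
import re
from difflib import SequenceMatcher

# Words (including German umlauts) or numbers with an optional decimal part.
TOKEN = re.compile(r"[A-Za-zÄÖÜäöüß]+|\d+(?:[.,]\d+)?")


def similar(a, b, threshold=0.85):
    """Error-tolerant comparison: True if two words are nearly identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def find_medications(text, drug_names, window=3):
    """Return (drug, number) pairs found in free text.

    A drug is matched even when misspelled; the number is the first numeric
    token within `window` tokens after the drug name (proximity search),
    or None if there is none.
    """
    tokens = TOKEN.findall(text)
    hits = []
    for i, token in enumerate(tokens):
        for drug in drug_names:
            if similar(token, drug):
                dose = None
                for following in tokens[i + 1 : i + 1 + window]:
                    if following[0].isdigit():
                        # Accept a decimal comma as well as a decimal point.
                        dose = float(following.replace(",", "."))
                        break
                hits.append((drug, dose))
                break
    return hits


if __name__ == "__main__":
    # Hypothetical discharge-letter fragment with one misspelled drug name.
    letter = "Discharge medication: Ramiprli 5 mg 1-0-0, Metoprolol 47,5 mg."
    print(find_medications(letter, ["Ramipril", "Metoprolol"]))
```

The reported F1 scores are consistent with the usual definition as the harmonic mean of precision and recall: 2 × 0.997 × 0.970 / (0.997 + 0.970) ≈ 0.983 for medications and 2 × 0.977 × 0.972 / (0.977 + 0.972) ≈ 0.974 for dosages.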
Background Interest in information extraction from clinical reports for secondary data use is increasing, but experience with the productive use of information extraction processes over time is scarce. A clinical data warehouse has been in use at our university hospital for several years, and it also provides an information extraction of echocardiography reports developed for general use. Objectives This study aims to illustrate the difficulties encountered while using data from a preexisting information extraction process for a large clinical study, and to compare the data from the preexisting process with the data obtained from a specially developed process designed to improve the quality and completeness of the study data. Methods We extracted the echocardiography variables for 440 patients from the general-use information extraction of the data warehouse (678 reports). Then we developed an information extraction process for the same variables but specifically for this study, with the aim of extracting as much information as possible from the text. The extracted data of both processes were compared with a newly created gold standard defined by a cardiologist with long-standing experience in heart failure. Results Among 57 echocardiography variables considered relevant for the study, 50 were documented in the routine text reports and could be extracted. Twenty of the required variables were not provided by the general-use extraction process, and some others were not provided correctly. The median macro F1-score (precision, recall) across the 30 variables for which values were extracted was 0.81 (0.94, 0.77). Across all 50 variables relevant for the study, the median macro F1-score was only 0.49 (0.56, 0.46). Employing the study-specific approach considerably improved the quality and completeness of the variables, resulting in F1-scores of 0.97 (0.98, 0.96) across all variables. Conclusion Data from information extractions can be used for large clinical studies. However, preexisting information extraction processes should be treated with caution, as the time and effort spent defining each variable in the information extraction process may not be clear.
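The drop from 0.81 to 0.49 when moving from 30 to 50 variables can be made concrete with a small sketch. It assumes that each variable gets its own F1 score, that the reported figure is the median across variables, and that a variable the process does not provide at all contributes a score of zero. The abstract does not spell out these details, so they are assumptions, and the precision/recall pairs below are invented for illustration.

```python
import statistics


def f1(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def median_f1(per_variable, n_missing=0):
    """Median F1 across variables.

    per_variable: list of (precision, recall) pairs, one per extracted variable.
    n_missing:    number of required variables the process did not provide;
                  each is counted with F1 = 0 (an assumption, see lead-in).
    """
    scores = [f1(p, r) for p, r in per_variable]
    scores += [0.0] * n_missing
    return statistics.median(scores)


if __name__ == "__main__":
    # Hypothetical (precision, recall) pairs for three extracted variables.
    extracted = [(1.0, 1.0), (0.5, 0.5), (1.0, 0.5)]
    print(median_f1(extracted))               # only the extracted variables
    print(median_f1(extracted, n_missing=2))  # including two missing variables
```

Under these assumptions, the precision and recall shown in parentheses are medians in their own right, which would explain why the reported F1 of 0.81 is not the harmonic mean of 0.94 and 0.77.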
Optimizing the utilization of radiology departments is one of the primary objectives for many hospitals. To support this, a solution has been developed that first transforms the export of different Radiological Information Systems (RIS) into the data format of a clinical data warehouse (CDW). Additional features, such as the time between the creation of a radiologic request and the finalization of the diagnosis for the created images, can then be defined using a simple interface and are calculated and saved in the CDW as well. Finally, the query language of the CDW can be used to create custom reports with all the RIS data, including the calculated features, and to export them to the standard formats Excel and CSV. The solution has been successfully tested with data from two German hospitals.
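A derived feature such as the request-to-diagnosis time could be computed along the following lines. This is a sketch only: the field names `exam_id`, `requested_at` and `finalized_at`, the timestamp format, and the helper names are hypothetical and do not reflect the actual RIS export or CDW schema.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp format of the (hypothetical) RIS export.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def turnaround_hours(requested_at, finalized_at):
    """Hours between request creation and finalization of the diagnosis."""
    start = datetime.strptime(requested_at, TIMESTAMP_FORMAT)
    end = datetime.strptime(finalized_at, TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / 3600


def add_turnaround(rows):
    """Return copies of the rows with a calculated 'turnaround_h' column."""
    return [
        dict(row, turnaround_h=turnaround_hours(row["requested_at"], row["finalized_at"]))
        for row in rows
    ]


def to_csv(rows):
    """Serialize the rows, including calculated columns, as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical RIS export with a single examination.
    export = [
        {"exam_id": "E1", "requested_at": "2020-01-01 08:00", "finalized_at": "2020-01-01 14:30"},
    ]
    print(to_csv(add_turnaround(export)))
```

Storing the calculated value next to the raw timestamps, as the abstract describes, keeps reports simple: a query can filter or aggregate on the feature without recomputing it.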