PROTEMPA supports the identification and categorization of patients with complex disease based on the characteristics of and relationships between time sequences in multiple data types. Identifying patient populations who share these types of patterns may be useful when patient features of interest do not have standard codes, are poorly expressed in coding schemes, may be inaccurately or incompletely coded, or are not represented explicitly as data values.
SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses.
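To make the idea of a differentially private aggregate statistic concrete, the following minimal sketch applies the standard Laplace mechanism to a counting query. It is an illustration only and does not describe how SHARE is implemented; the function names `laplace_noise` and `private_count`, the record layout, and the choice of mechanism are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Hypothetical helper for illustration. A counting query changes by at
    most 1 when one record is added or removed (sensitivity 1), so the
    Laplace mechanism adds noise with scale 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Example: noisy number of patients aged 65 or older (made-up records).
patients = [{"age": 70}, {"age": 42}, {"age": 81}]
noisy = private_count(patients, lambda p: p["age"] >= 65, epsilon=0.5)
```

Smaller values of `epsilon` give stronger privacy and noisier answers. A production system would also need to track the cumulative privacy budget across queries and guard against floating-point side channels, neither of which this sketch attempts.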
Objective
To create an analytics platform for specifying and detecting clinical phenotypes and other derived variables in electronic health record (EHR) data for quality improvement investigations.
Materials and Methods
We have developed an architecture for an Analytic Information Warehouse (AIW). It supports transforming data represented in different physical schemas into a common data model, specifying derived variables in terms of the common model to enable their reuse, computing derived variables while enforcing invariants and ensuring correctness and consistency of data transformations, long-term curation of derived data, and export of derived data into standard analysis tools. It includes software that implements these features and a computing environment that enables secure high-performance access to and processing of large datasets extracted from EHRs.
Results
We have implemented and deployed the architecture in production locally. The software is available as open source. We have used it as part of hospital operations in a project to reduce rates of hospital readmission within 30 days. The project examined the association of over 100 derived variables representing disease and co-morbidity phenotypes with readmissions in five years of data from our institution’s clinical data warehouse and the UHC Clinical Database (CDB). The CDB contains administrative data from over 200 hospitals that are in academic medical centers or affiliated with such centers.
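As an illustration of what a derived variable for this kind of project might look like, the sketch below flags hospital stays that are followed by a readmission within 30 days. It is not the AIW software's own definition: the function name `readmitted_within`, the tuple layout of an encounter, and the rule that the gap is measured from discharge to the next admission are assumptions made for this example.

```python
from collections import defaultdict
from datetime import date


def readmitted_within(encounters, window_days=30):
    """Return the set of (patient_id, admit_date) index stays that are
    followed by another admission within `window_days` of discharge.

    `encounters` is an iterable of (patient_id, admit_date, discharge_date)
    tuples; this layout is hypothetical.
    """
    stays_by_patient = defaultdict(list)
    for patient_id, admit, discharge in encounters:
        stays_by_patient[patient_id].append((admit, discharge))

    flagged = set()
    for patient_id, stays in stays_by_patient.items():
        stays.sort()  # chronological order by admission date
        # Compare each stay with the next one for the same patient.
        for (admit, discharge), (next_admit, _) in zip(stays, stays[1:]):
            gap_days = (next_admit - discharge).days
            if 0 <= gap_days <= window_days:
                flagged.add((patient_id, admit))
    return flagged


# Made-up encounters: p1 is readmitted 15 days after the first discharge.
example = [
    ("p1", date(2020, 1, 1), date(2020, 1, 5)),
    ("p1", date(2020, 1, 20), date(2020, 1, 22)),
    ("p2", date(2020, 3, 1), date(2020, 3, 2)),
]
index_stays = readmitted_within(example)
```

Expressing the variable as a function over a fixed encounter layout mirrors the idea of specifying derived variables against a common data model so that they can be reused across source databases.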
Discussion and Conclusion
A widely available platform for managing and detecting phenotypes in EHR data could accelerate the use of such data in quality improvement and comparative effectiveness studies.
To manage and integrate information gathered from heterogeneous databases, an ontology is often used. Like all systems, ontology-driven systems evolve over time and must be regression tested to gain confidence in the behavior of the modified system. Because rerunning all existing tests can be extremely expensive, researchers have developed regression-test-selection (RTS) techniques that select a subset of the available tests that are affected by the changes, and use this subset to test the modified system. Existing RTS techniques have been shown to be effective, but they operate on the code and are unable to handle changes that involve ontologies. To address this limitation, we developed and present in this paper a novel RTS technique that targets ontology-driven systems. Our technique creates representations of the old and new ontologies, compares them to identify entities affected by the changes, and uses this information to select the subset of tests to rerun. We also describe in this paper OntoReTest, a tool that implements our technique and that we used to empirically evaluate our approach on two biomedical ontology-driven database systems. The results of our evaluation show that our technique is both efficient and effective in selecting tests to rerun and in reducing the overall time required to perform regression testing.
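The three steps described above (represent both ontology versions, compare them to find affected entities, select the tests that touch those entities) can be sketched as follows. This is a simplified illustration, not OntoReTest's actual algorithm: the ontology is assumed to be a dictionary mapping each entity to its set of parent entities, changes are assumed to propagate to descendants, and test coverage is assumed to be known per test. All function names are hypothetical.

```python
def changed_entities(old, new):
    """Entities that were added, removed, or given different parents."""
    return {name for name in set(old) | set(new)
            if old.get(name) != new.get(name)}


def with_descendants(versions, seeds):
    """Extend `seeds` with every entity that inherits, in either ontology
    version, from an affected entity (an assumed propagation rule)."""
    affected = set(seeds)
    frontier = list(seeds)
    while frontier:
        current = frontier.pop()
        for ontology in versions:
            for entity, parents in ontology.items():
                if current in parents and entity not in affected:
                    affected.add(entity)
                    frontier.append(entity)
    return affected


def select_tests(old, new, test_coverage):
    """Return the names of tests that exercise at least one affected entity."""
    affected = with_descendants((old, new), changed_entities(old, new))
    return sorted(test for test, entities in test_coverage.items()
                  if affected & set(entities))


# Made-up ontologies: the new version inserts MetabolicDisorder above Diabetes.
old_onto = {"Disease": set(), "Diabetes": {"Disease"},
            "Type2Diabetes": {"Diabetes"}, "Drug": set()}
new_onto = {"Disease": set(), "MetabolicDisorder": {"Disease"},
            "Diabetes": {"Disease", "MetabolicDisorder"},
            "Type2Diabetes": {"Diabetes"}, "Drug": set()}
coverage = {"test_drug_lookup": {"Drug"},
            "test_t2d_query": {"Type2Diabetes"},
            "test_disease_root": {"Disease"}}
selected = select_tests(old_onto, new_onto, coverage)
```

In this example only `test_t2d_query` is selected, because `Type2Diabetes` inherits from the modified `Diabetes` entity, while the tests that touch only `Drug` or `Disease` are skipped.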