This study provides an experimental performance evaluation on population-based queries of NoSQL databases storing archetype-based Electronic Health Record (EHR) data. There are few published studies regarding the performance of persistence mechanisms for systems that use multilevel modelling approaches, especially when the focus is on population-based queries. A healthcare dataset with 4.2 million records stored in a relational database (MySQL) was used to generate XML and JSON documents based on the openEHR reference model. Six datasets with different sizes were created from these documents and imported into three single machine XML databases (BaseX, eXistdb and Berkeley DB XML) and into a distributed NoSQL database system based on the MapReduce approach, Couchbase, deployed in different cluster configurations of 1, 2, 4, 8 and 12 machines. Population-based queries were submitted to those databases and to the original relational database. Database size and query response times are presented. The XML databases were considerably slower and required much more space than Couchbase. Overall, Couchbase had better response times than MySQL, especially for larger datasets. However, Couchbase requires indexing for each differently formulated query and the indexing time increases with the size of the datasets. The performances of the clusters with 2, 4, 8 and 12 nodes were not better than the single node cluster in relation to the query response time, but the indexing time was reduced proportionally to the number of nodes. The tested XML databases had acceptable performance for openEHR-based data in some querying use cases and small datasets, but were generally much slower than Couchbase. Couchbase also outperformed the response times of the relational database, but required more disk space and had a much longer indexing time. Systems like Couchbase are thus interesting research targets for scalable storage and querying of archetype-based EHR data when population-based use cases are of interest.
BackgroundWith the increasing presence of biomedical data sources on the Internet more and more research effort is put into finding possible ways for integrating and searching such often heterogeneous sources. Ontologies are a key technology in this effort. However, developing ontologies is not an easy task and often the resulting ontologies are not complete. In addition to being problematic for the correct modelling of a domain, such incomplete ontologies, when used in semantically-enabled applications, can lead to valid conclusions being missed.ResultsWe consider the problem of repairing missing is-a relations in ontologies. We formalize the problem as a generalized TBox abduction problem. Based on this abduction framework, we present complexity results for the existence, relevance and necessity decision problems for the generalized TBox abduction problem with and without some specific preference relations for ontologies that can be represented using a member of the family of description logics. Further, we present algorithms for finding solutions, a system as well as experiments.ConclusionsSemantically-enabled applications need high quality ontologies and one key aspect is their completeness. We have introduced a framework and system that provides an environment for supporting domain experts to complete the is-a structure of ontologies. We have shown the usefulness of the approach in different experiments. For the two Anatomy ontologies from the Ontology Alignment Evaluation Initiative, we repaired 94 and 58 initial given missing is-a relations, respectively, and detected and repaired additionally, 47 and 10 missing is-a relations. In an experiment with BioTop without given missing is-a relations, we detected and repaired 40 new missing is-a relations.
In this paper we consider the problem of repairing missing is-a relations in ontologies. We formalize the problem as a generalized TBox abduction problem (GTAP). Based on this abduction framework, we present complexity results for the existence, relevance and necessity decision problems for the GTAP with and without some specific preference relations for ontologies that can be represented using a member of the EL family of description logics. Further, we present algorithms for finding solutions, a system as well as experiments.
Ontologies in the biomedical domain are becoming a key element for data integration and search. The usefulness of the applications which use ontologies is often directly influenced by the quality of ontologies, as incorrect or incomplete ontologies might lead to wrong or incomplete results for the applications. Therefore, there is an increasing need for repairing defects in ontologies.In this paper we focus on completing ontologies. We provide an algorithm for completing the is-a structure in EL ontologies which covers many biomedical ontologies. Further, we present an implemented system based on the algorithm as well as an evaluation using three biomedical ontologies.
Please cite this article in press as: F. Wei-Kleiner, Tree decomposition-based indexing for efficient shortest path and nearest neighbors query answering on graphs, J. Comput. Syst. Sci. (2015), http://dx. Highlights• We propose an indexing scheme for efficient graph query processing.• We introduce a decomposition algorithm which works for all graphs.• We design query answering algorithms for shortest path and kNN. AbstractWe propose TEDI, an indexing for solving shortest path, and k Nearest Neighbors (kNN) problems. TEDI is based on the tree decomposition methodology. The graph is first decomposed into a tree in which the node contains vertices. The shortest paths are stored in such nodes. These local shortest paths together with the tree structure constitute the index of the graph.Based on this index, algorithms can be executed to solve the shortest path, as well as the kNN problem more efficiently. Our experimental results show that TEDI offers orders-of-magnitude performance improvement over existing approaches on the index construction time, the index size and the query answering.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.