Data quality assessment and data cleaning are context-dependent activities. Motivated by this observation, we propose the Ontological Multidimensional Data Model (OMD model), which can be used to model and represent contexts as logic-based ontologies. e data under assessment is mapped into the context, for additional analysis, processing, and quality data extraction. e resulting contexts allow for the representation of dimensions, and multidimensional data quality assessment becomes possible. At the core of a multidimensional context we include a generalized multidimensional data model and a Datalog ± ontology with provably good properties in terms of query answering.ese main components are used to represent dimension hierarchies, dimensional constraints, dimensional rules, and de ne predicates for quality data speci cation. ery answering relies upon and triggers navigation through dimension hierarchies, and becomes the basic tool for the extraction of quality data. e OMD model is interesting per se, beyond applications to data quality. It allows for a logic-based, and computationally tractable representation of multidimensional data, extending previous multidimensional data models with additional expressive power and functionalities.
Weakly-sticky (WS ) Datalog ± is an expressive member of the family of Datalog ± programs that is based on the syntactic notions of stickiness and weak-acyclicity. Query answering over the WS programs has been investigated, but there is still much work to do on the design and implementation of practical query answering (QA) algorithms and their optimizations. Here, we study sticky and WS programs from the point of view of the behavior of the chase procedure, extending the stickiness property of the chase to that of generalized stickiness of the chase (gschproperty). With this property we specify the semantic class of GSCh programs, which includes sticky and WS programs, and other syntactic subclasses that we identify. In particular, we introduce joint-weakly-sticky (JWS ) programs, that include WS programs. We also propose a bottomup QA algorithm for a range of subclasses of GSCh. The algorithm runs in polynomial time (in data) for JWS programs. Unlike the WS class, JWS is closed under a general magic-sets rewriting procedure for the optimization of programs with existential rules. We apply the magicsets rewriting in combination with the proposed QA algorithm for the optimization of QA over JWS programs.
Data quality and data cleaning are context dependent activities. Starting from this observation, in previous work a context model for the assessment of the quality of a database instance was proposed. In that framework, the context takes the form of a possibly virtual database or data integration system into which a database instance under quality assessment is mapped, for additional analysis and processing, enabling quality assessment. In this work we extend contexts with dimensions, and by doing so, we make possible a multidimensional assessment of data quality assessment. Multidimensional contexts are represented as ontologies written in Datalog±. We use this language for representing dimensional constraints, and dimensional rules, and also for doing query answering based on dimensional navigation, which becomes an important auxiliary activity in the assessment of data. We show ideas and mechanisms by means of examples. TABLE I Measurements Time Patient Value
Datalog ± is a family of ontology languages that combine good computational properties with high expressive power. Datalog ± languages are provably able to capture many relevant Semantic Web languages. In this paper we consider the class of weakly-sticky (WS) Datalog ± programs, which allow for certain useful forms of joins in rule bodies as well as extending the well-known class of weakly-acyclic TGDs. So far, only nondeterministic algorithms were known for answering queries on WS Datalog ± programs. We present novel deterministic query answering algorithms under WS Datalog ± . In particular, we propose: (1) a bottom-up grounding algorithm based on a query-driven chase, and (2) a hybrid approach based on transforming a WS program into a so-called sticky one, for which query rewriting techniques are known. We discuss how our algorithms can be optimized and effectively applied for query answering in real-world scenarios.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.