The Rule Query Language (RQL) is an SQL-like pattern mining language that extends and generalizes functional dependencies to new, unexpected kinds of rules. It gives data analysts a convenient desktop tool for discovering logical implications between attributes of a database. Such implications may reveal data quality problems or surprising correlations between attributes over some part of the database. RQL queries are evaluated through a query rewriting technique that pushes as much processing as possible to the underlying DBMS. This contribution is an attempt to bridge the gap between pattern mining and databases, and to facilitate the use of data mining techniques by SQL-aware analysts and students.
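To illustrate the rewriting idea behind this abstract: checking whether a functional dependency holds can be pushed down to the DBMS as a plain SQL aggregation. The sketch below is not RQL's actual syntax; the table, columns, and data are invented for illustration.

```python
import sqlite3

# Hypothetical table emp(dept, manager). We check whether the
# functional dependency dept -> manager holds by asking the DBMS,
# in plain SQL, for the violating dept values.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, manager TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("sales", "ann"), ("sales", "ann"),
                  ("it", "bob"), ("it", "carl")])

# A dept value violates dept -> manager if it co-occurs with more
# than one distinct manager.
violations = conn.execute(
    "SELECT dept FROM emp GROUP BY dept "
    "HAVING COUNT(DISTINCT manager) > 1").fetchall()
print(violations)  # [('it',)]
```

The dependency holds exactly when the query returns no rows, so all the heavy lifting (grouping, counting) stays inside the DBMS.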
Recently, several large Knowledge Bases (KBs) have been constructed by mining the Web for information. As an increasing amount of inconsistent and unreliable data becomes available, KB facts may be uncertain and are therefore associated with an explicit certainty degree. When querying these uncertain KBs, users seek high-quality results, i.e., results whose certainty degree exceeds a given threshold α. However, since users usually have only partial knowledge of a KB's contents, their queries may fail, i.e., return no result at the desired certainty. To prevent this frustrating situation, instead of returning an empty set of answers, our approach explains the reasons for the failure with a set of αMinimal Failing Subqueries (αMFSs), and computes alternative relaxed queries, called αMaXimal Succeeding Subqueries (αXSSs), that are as close as possible to the initial failing query. Moreover, as the user may not always be able to provide an appropriate threshold α, we propose two algorithms that compute the αMFSs and αXSSs for other thresholds. Our experiments on the WatDiv benchmark show the relevance of our algorithms compared to a baseline method.
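The notions of minimal failing and maximal succeeding subqueries can be sketched generically: given a conjunctive query (a set of atoms) and an oracle telling whether a subquery succeeds at the desired certainty, enumerate the minimal failing and maximal succeeding subsets. This brute-force sketch is only meant to clarify the definitions; the atoms and oracle below are invented, and the paper's algorithms are far more efficient than exhaustive enumeration.

```python
from itertools import combinations

def mfs_xss(atoms, succeeds):
    """Enumerate minimal failing subqueries (MFSs) and maximal
    succeeding subqueries (XSSs) of a conjunctive query, given an
    oracle succeeds(subquery) -- e.g. 'does this subquery return
    results above the certainty threshold alpha?'. Brute force."""
    atoms = tuple(atoms)
    subs = [frozenset(c) for r in range(1, len(atoms) + 1)
            for c in combinations(atoms, r)]
    failing = [s for s in subs if not succeeds(s)]
    succeeding = [s for s in subs if succeeds(s)]
    # An MFS is a failing subquery with no failing proper subset.
    mfss = [s for s in failing if not any(f < s for f in failing)]
    # An XSS is a succeeding subquery with no succeeding proper superset.
    xsss = [s for s in succeeding if not any(s < t for t in succeeding)]
    return mfss, xsss

# Toy query with three triple patterns, where t2 alone causes failure.
mfss, xsss = mfs_xss(["t1", "t2", "t3"], lambda s: "t2" not in s)
print(mfss)  # [frozenset({'t2'})]
print(xsss)  # [frozenset({'t1', 't3'})]
```

Here the single αMFS pinpoints the culprit pattern, and the αXSS is the closest relaxed query the user could run instead.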
At EDF, a leading energy company, process data produced in power stations are archived both to comply with legal archiving requirements and to support various analysis applications. Such data consist of timestamped measurements, retrieved for the most part from process data acquisition systems. After archival, past and current values are used for applications including device monitoring, maintenance assistance, decision support, and statistics publication. Large amounts of data are generated in these power stations and aggregated in soft real time (without operational deadlines) at the plant level by local servers. For this long-term data archiving, EDF has relied on data historians, such as InfoPlus.21, PI, or Wonderware Historian, for years. The same holds for other energy companies worldwide and, more generally, for industries based on automated processes. In this paper, we aim at answering a simple, yet not so easy, question: where do data historians fit in the data management landscape, from classical RDBMSs to NoSQL systems? To answer it, we first give an overview of data historians, then discuss how to benchmark these particular systems. Although many benchmarks exist for conventional database management systems, none of them is appropriate for data historians. To establish a first objective basis for comparison, we therefore propose a simple benchmark inspired by EDF use cases, and give experimental results for both data historians and DBMSs.
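To make the workload concrete: the timestamped measurements described above lend themselves to time-windowed aggregation queries, which are the kind of operation a historian benchmark would exercise. The schema, sensor names, and values below are invented for illustration and are not taken from the EDF benchmark.

```python
import sqlite3

# Toy table of timestamped sensor measurements (ts in seconds).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measure (sensor TEXT, ts INTEGER, value REAL)")
conn.executemany("INSERT INTO measure VALUES (?, ?, ?)",
                 [("temp1", 0, 20.0), ("temp1", 60, 22.0),
                  ("temp1", 120, 21.0), ("temp2", 0, 15.0)])

# Average value per sensor over the time window [0, 100] -- a
# typical historian-style query shape.
rows = conn.execute(
    "SELECT sensor, AVG(value) FROM measure "
    "WHERE ts BETWEEN 0 AND 100 GROUP BY sensor "
    "ORDER BY sensor").fetchall()
print(rows)  # [('temp1', 21.0), ('temp2', 15.0)]
```

Data historians optimize precisely this pattern (append-heavy writes, range scans over time), which is why general-purpose DBMS benchmarks fit them poorly.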