Traditionally, skyline and ranking queries have been treated separately as alternative ways of discovering interesting data in potentially large datasets. While ranking queries adopt a specific scoring function to rank tuples, skyline queries return the set of non-dominated tuples and are independent of attribute scales and scoring functions. Ranking queries are thus less general, but usually cheaper to compute and widely used in data management systems. We propose a framework to seamlessly integrate these two approaches by introducing the notion of restricted skyline queries (R-skylines). We propose R-skyline operators that generalize both skyline and ranking queries by applying the notion of dominance to a set of scoring functions of interest. Such sets can be characterized, e.g., by imposing constraints on the function's parameters, such as the weights in a linear scoring function. We discuss the formal properties of these new operators, show how to implement them efficiently, and evaluate them on both synthetic and real datasets.
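To make the idea of applying dominance to a set of scoring functions concrete, here is a minimal sketch, not the operators or algorithms of the paper. It assumes that lower attribute values are preferred, that the scoring functions are linear, and that the weight constraints describe a convex polytope given by its vertices. Under those assumptions the score difference between two tuples is linear in the weights, so it suffices to compare the tuples at the vertices. The names `f_dominates` and `restricted_skyline` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(t: Point, s: Point) -> bool:
    """Classic dominance: t is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(t, s)) and any(a < b for a, b in zip(t, s))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Quadratic-time skyline: keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def score(w: Sequence[float], t: Point) -> float:
    """Linear scoring function with weight vector w (lower is better here)."""
    return sum(wi * ti for wi, ti in zip(w, t))


def f_dominates(t: Point, s: Point, vertices: Sequence[Sequence[float]]) -> bool:
    """Assumed reading of dominance w.r.t. a set of linear scoring functions:
    t scores no worse than s for every admissible weight vector and strictly
    better for at least one. Checking the polytope's vertices is enough."""
    diffs = [score(w, t) - score(w, s) for w in vertices]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)


def restricted_skyline(points: Sequence[Point],
                       vertices: Sequence[Sequence[float]]) -> List[Point]:
    """Points that no other point dominates w.r.t. the restricted weight set."""
    return [p for p in points
            if not any(f_dominates(q, p, vertices) for q in points)]


points = [(1, 9), (3, 3), (9, 1), (5, 5), (4, 8)]

# All non-negative weights summing to 1: same result as the plain skyline.
print(restricted_skyline(points, [(1.0, 0.0), (0.0, 1.0)]))  # [(1, 9), (3, 3), (9, 1)]

# Constraint w1 >= w2: the admissible weights shrink and so does the result.
print(restricted_skyline(points, [(1.0, 0.0), (0.5, 0.5)]))  # [(1, 9), (3, 3)]

# A single weight vector: the result collapses to the top-ranked tuple.
print(restricted_skyline(points, [(0.5, 0.5)]))  # [(3, 3)]
```

The three calls illustrate the two extremes mentioned in the abstract: an unconstrained weight set behaves like a skyline query, while a single scoring function behaves like a ranking query.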
This paper investigates diversity queries over objects embedded in a low-dimensional vector space. An interesting case is provided by spatial Web objects, which are produced in great quantity by location-based services that let users attach content to places, and arise also in trip planning, news analysis, and real estate scenarios. The targeted queries aim at retrieving the best set of objects relevant to given user criteria and well distributed over a region of interest. Such queries are a particular case of diversified top-k queries, for which existing methods are too costly, as they evaluate diversity by accessing and scanning all relevant objects, even if only a small subset is needed. We therefore introduce Space Partitioning and Probing (SPP), an algorithm that minimizes the number of accessed objects while finding exactly the same result as MMR, the most popular diversification algorithm. SPP belongs to a family of algorithms that rely only on score-based and distance-based access methods, which are available in most geo-referenced Web data sources, and do not require retrieving all the relevant objects. Experiments show that SPP significantly reduces the number of accessed objects while incurring a very low computational overhead.
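For orientation, the following is a minimal sketch of a greedy MMR-style selection over scored spatial objects. It is the baseline that scans every relevant object, not SPP itself. The objective used here (a weighted sum of relevance and the distance to the closest already selected object) is one common distance-based formulation and is an assumption, as are the parameter name `lam` and the toy data.

```python
import math
from typing import List, Tuple

# (identifier, relevance score, (x, y) position); the layout is hypothetical.
SpatialObject = Tuple[str, float, Tuple[float, float]]


def mmr(objects: List[SpatialObject], k: int, lam: float = 0.5) -> List[SpatialObject]:
    """Greedy MMR-style selection: at each step pick the object that maximizes
    lam * relevance + (1 - lam) * distance to the closest selected object.
    Every remaining object is examined at every step."""
    selected: List[SpatialObject] = []
    remaining = list(objects)

    def gain(o: SpatialObject) -> float:
        if not selected:
            # No object chosen yet, so only relevance matters.
            return lam * o[1]
        closest = min(math.dist(o[2], s[2]) for s in selected)
        return lam * o[1] + (1 - lam) * closest

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


objects = [
    ("a", 0.90, (0.0, 0.0)),
    ("b", 0.85, (0.1, 0.0)),  # almost as relevant as "a", but right next to it
    ("c", 0.50, (1.0, 1.0)),
    ("d", 0.40, (0.0, 1.0)),
]

print([o[0] for o in mmr(objects, k=3)])           # ['a', 'c', 'd']
print([o[0] for o in mmr(objects, k=3, lam=1.0)])  # ['a', 'b', 'c']
```

With `lam=1.0` the selection degenerates to plain top-k by score. With the default trade-off, the near-duplicate location "b" is skipped in favour of objects that are spread over the region.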
Abstract. Access limitations may occur when querying data sources over the web or heterogeneous data sources presented as relational tables: this happens, for instance, in Data Exchange and Integration, Data Warehousing, and Web Information Systems. Access limitations force certain attributes to be selected in order to access the tables. It is known that evaluating a conjunctive query under such access restrictions amounts to evaluating a possibly recursive Datalog program. We address the problem of checking containment of conjunctive queries under access limitations, which is highly relevant in query optimization. Checking containment in such a setting would amount to checking containment of recursive Datalog programs of a certain class, while, for general Datalog programs, this problem is undecidable. We propose a decision procedure for query containment based on the novel notion of crayfish-chase, showing that containment can be decided in co-NEXPTIME, which improves upon the known bound of 2EXPTIME. Moreover, by means of a direct proof, our technique provides a new insight into the structure of the problem.
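The reason why query evaluation under access limitations becomes recursive can be illustrated with a small fixpoint computation. This is a naive sketch under assumed conventions, not the crayfish-chase or the containment procedure. Every name in it (`Source`, `extract_all`, the `employee` and `manager` relations and their rows) is hypothetical. Each attribute has an abstract domain, and a value obtained from one source may be used to fill an input attribute of the same domain in another source.

```python
from itertools import product
from typing import Dict, List, Set, Tuple


class Source:
    """A relation that can only be queried by binding its input positions."""

    def __init__(self, name: str, domains: List[str], inputs: List[int],
                 rows: Set[Tuple[str, ...]]) -> None:
        self.name = name
        self.domains = domains  # abstract domain of each attribute
        self.inputs = inputs    # positions that must be bound to access the source
        self.rows = rows
        self.accesses = 0       # number of accesses made so far

    def access(self, binding: Tuple[str, ...]) -> List[Tuple[str, ...]]:
        self.accesses += 1
        return [r for r in self.rows
                if all(r[i] == v for i, v in zip(self.inputs, binding))]


def extract_all(sources: List[Source],
                seeds: Dict[str, Set[str]]) -> Dict[str, Set[Tuple[str, ...]]]:
    """Naive fixpoint: keep accessing sources with every known combination of
    input values until no new binding can be tried."""
    known = {d: set(vs) for d, vs in seeds.items()}
    facts: Dict[str, Set[Tuple[str, ...]]] = {s.name: set() for s in sources}
    tried: Dict[str, Set[Tuple[str, ...]]] = {s.name: set() for s in sources}
    progress = True
    while progress:
        progress = False
        for s in sources:
            pools = [sorted(known.get(s.domains[i], set())) for i in s.inputs]
            for binding in product(*pools):
                if binding in tried[s.name]:
                    continue
                tried[s.name].add(binding)
                progress = True
                for row in s.access(binding):
                    facts[s.name].add(row)
                    # Every retrieved value may enable further accesses.
                    for d, v in zip(s.domains, row):
                        known.setdefault(d, set()).add(v)
    return facts


employee = Source("employee", ["Name", "Dept"], [0],
                  {("ann", "sales"), ("bob", "hr"), ("carl", "hr"), ("dave", "legal")})
manager = Source("manager", ["Dept", "Name"], [0],
                 {("sales", "bob"), ("hr", "carl"), ("legal", "dave")})

facts = extract_all([employee, manager], {"Name": {"ann"}})
print(sorted(facts["employee"]))  # [('ann', 'sales'), ('bob', 'hr'), ('carl', 'hr')]
print(employee.accesses, manager.accesses)  # 3 2
```

Starting from the single constant `ann`, each access yields values that unlock further accesses, which is exactly the recursive behaviour mentioned above. The row for `dave` is never obtained because no chain of accesses reaches it.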
Abstract. All methods for efficient integrity checking require all integrity constraints to be totally satisfied before any update is executed. However, a certain amount of inconsistency is the rule rather than the exception in databases. In this paper, we close the gap between theory and practice of integrity checking, i.e., between the unrealistic theoretical requirement of total integrity and the practical need for inconsistency tolerance, which we define for integrity checking methods. We show that most of them can still be used to check whether updates preserve integrity, even if the current state is inconsistent. Inconsistency-tolerant integrity checking proves beneficial both for integrity preservation and query answering. Also, we show that it is useful for view updating, repairs, schema evolution and other applications.
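One way to picture inconsistency tolerance is a check that accepts an update as long as it introduces no new violation, even when the current state already contains some. The sketch below is a brute-force illustration under that assumed reading, not one of the efficient methods the abstract refers to. Constraints are modelled as per-row predicates, and the names `violations` and `preserves_integrity` are hypothetical.

```python
from typing import Callable, Iterable, Set, Tuple

Row = Tuple[str, int]
Constraint = Callable[[Row], bool]  # returns True when the row satisfies the constraint


def violations(db: Set[Row], constraint: Constraint) -> Set[Row]:
    """Rows of the database that violate the given constraint."""
    return {row for row in db if not constraint(row)}


def preserves_integrity(db: Set[Row], inserted: Set[Row], deleted: Set[Row],
                        constraints: Iterable[Constraint]) -> bool:
    """Accept the update if every violation in the new state was already
    present in the old state. Total consistency of the old state is not required."""
    new_db = (db - deleted) | inserted
    return all(violations(new_db, c) <= violations(db, c) for c in constraints)


def non_negative_salary(row: Row) -> bool:
    return row[1] >= 0


# The current state is already inconsistent: bob has a negative salary.
db = {("ann", 100), ("bob", -5)}

print(preserves_integrity(db, {("carl", 50)}, set(), [non_negative_salary]))  # True
print(preserves_integrity(db, {("dan", -1)}, set(), [non_negative_salary]))   # False
```

A check that insisted on total integrity could not be applied to `db` at all, whereas this one still separates the harmless insertion from the harmful one.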
Abstract. Data sources on the web are often accessible through web interfaces that present them as relational tables but require certain attributes to be selected, e.g., via a web form. When we integrate a set of such sources and pose queries over them, the values needed to access a source may have to be retrieved from other sources that are possibly not even mentioned in the query: answering queries at best can then be done only with a potentially recursive query plan that retrieves all obtainable answers to the query. Since data sources are typically distributed over a network, a major cost indicator for the execution of a query plan is the number of accesses to remote sources. In this paper we present an optimization technique for conjunctive queries that produces a query plan that: (1) minimizes the number of accesses according to a strong notion of minimality; (2) excludes all sources that are not relevant for the query. We introduce Toorjah, a prototype system that answers queries posed on sources with limitations by means of optimized query plans. Toorjah adopts a strategy that aims to retrieve answers as early as possible during query processing and to present them to the user as they are computed. We provide experimental evidence of the effectiveness of our optimization by showing the reduction of the number of accesses in a large number of cases.
I. INTRODUCTION
In the context of integration of web data [1], or in a data exchange setting where source data are retrieved on the web, information is often accessible only via forms; it is easy to see that accessing data through a web form amounts to querying a relational table, where a selection is specified by the fields that are filled in. Typically, certain fields are required to be filled in by the user in order to obtain a result; for example, all online shops forbid a request posed by a user who leaves all fields of the search form empty. Analogously, in legacy systems where data are scattered over several files and wrapped as relational tables, similar limitations are enforced.
Limitations on how sources can be accessed significantly complicate query processing: as shown, e.g., in [2], [3], query answering in the presence of access limitations in general requires the evaluation of a recursive query plan. This is shown in the following example.
Example 1: Suppose we have three relations: r1(Artist, Nation, YOB), which stores artists with their nationality and year of birth, and requires the first attribute to be selected; r2(Title, Year, Artist), which stores data