Abstract. Though skyline queries already have claimed their place in retrieval over central databases, their application in Web information systems up to now was impossible due to the distributed aspect of retrieval over Web sources. But due to the amount, variety and volatile nature of information accessible over the Internet extended query capabilities are crucial. We show how to efficiently perform distributed skyline queries and thus essentially extend the expressiveness of querying today's Web information systems. Together with our innovative retrieval algorithm we also present useful heuristics to further speed up the retrieval in most practical cases paving the road towards meeting even the realtime challenges of on-line information services. We discuss performance evaluations and point to open problems in the concept and application of skylining in modern information systems. For the curse of dimensionality, an intrinsic problem in skyline queries, we propose a novel sampling scheme that allows to get an early impression of the skyline for subsequent query refinement.
Query processing in database systems has developed beyond mere exact matching of attribute values. Scoring database objects and retrieving only the top k matches or Pareto-optimal result sets (skyline queries) are already common for a variety of applications. Specialized algorithms using either paradigm can avoid naïve linear database scans and thus improve scalability. However, these paradigms are only two extreme cases of exploring viable compromises for each user's objectives. To find the correct result set for arbitrary cases of multi-objective query processing in databases we will present a novel algorithm for computing sets of objects that are non-dominated with respect to a set of monotonic objective functions. Naturally containing top k and skyline retrieval paradigms as special cases, this algorithm maintains scalability also for all cases in between. Moreover, we will show the algorithm's correctness and instance-optimality in terms of necessary object accesses and how the response behavior can be improved by progressively producing result objects as quickly as possible, while the algorithm is still running.
Abstract. Generalized association rules are a very important extension of boolean association rules, but with current approaches mining generalized rules is computationally very expensive. Especially when considering the rule generation as being part of an interactive KDD-process this becomes annoying. In this paper we discuss strengths and weaknesses of known approaches to generate frequent itemsets. Based on the insights we derive a new algorithm, called Prutax, to mine generalized frequent itemsets. The basic ideas of the algorithm and further optimisation are described. Experiments with both synthetic and real-life data show that Prutax is an order of magnitude faster than previous approaches.
Query processing in database systems has developed beyond mere exact matching of attribute values. Scoring database objects and retrieving only the top k matches or Pareto-optimal result sets (skyline queries) are already common for a variety of applications. Specialized algorithms using either paradigm can avoid naïve linear database scans and thus improve scalability. However, these paradigms are only two extreme cases of exploring viable compromises for each user's objectives. To find the correct result set for arbitrary cases of multi-objective query processing in databases we will present a novel algorithm for computing sets of objects that are non-dominated with respect to a set of monotonic objective functions. Naturally containing top k and skyline retrieval paradigms as special cases, this algorithm maintains scalability also for all cases in between. Moreover, we will show the algorithm's correctness and instance-optimality in terms of necessary object accesses and how the response behavior can be improved by progressively producing result objects as quickly as possible, while the algorithm is still running.
Unlike numerical preferences, preferences on attribute values do not show an inherent total order, but skyline computation has to rely on partial orderings explicitly stated by the user. In such orders many object values are incomparable, hence skylines sizes become unpractical. However, the Pareto semantics can be modified to benefit from indifferences: skyline result sizes can be essentially reduced by allowing the user to declare some incomparable values as equally desirable. A major problem of adding such equivalences is that they may result in intransitivity of the aggregated Pareto order and thus efficient query processing is hampered. In this paper we analyze how far the strict Pareto semantics can be relaxed while always retaining transitivity of the induced Pareto aggregation. Extensive practical tests show that skyline sizes can indeed be reduced about two orders of magnitude when using the maximum possible relaxation still guaranteeing the consistency with all user preferences.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.