Current approaches for answering queries with imprecise constraints require users to provide distance metrics and importance measures for attributes of interest. In this paper we focus on providing a domain and end-user independent solution for supporting imprecise queries over Web databases without affecting the underlying database. We propose a query processing framework that integrates techniques from IR and database research to efficiently determine answers for imprecise queries. We mine and use approximate functional dependencies between attributes to create precise queries having tuples relevant to the given imprecise query. An approach to automatically estimate the semantic distances between values of categorical attributes is also proposed. We provide preliminary results showing the utility of our approach.
Extracting information and insights from large databases is a time-consuming activity and has received considerable research attention recently. In this demo, we present DynaCet-a domain independent system that provides effective minimum-effort based dynamic faceted search solutions over enterprise databases. At every step, Dynacet suggests facets depending on the user response in the previous step. Facets are selected based on their ability to rapidly drill down to the most promising tuples, as well as on the ability of the user to provide desired values for them. The benefits provided include faster access to information stored in databases while taking into consideration the variance in user knowledge and preferences.
It is possible to obtain fine grained location information fairly easily using Global Positioning System (GPS) enabled devices. It becomes easy to track an individual's location and trace her trajectory using such devices. By aggregating this data and analyzing multiple users' trajectory a lot of useful information can be extracted. In this paper, we aim to analyze aggregate GPS information of multiple users to mine a list of interesting locations and rank them. By interesting locations we mean the geographical locations visited by several users. It can be an office, university, historical place, a good restaurant, a shopping complex, a stadium, etc. To achieve this various relational algebra operations and statistical operations are applied on the GPS trajectory data of multiple users. The end result is a ranked list of interesting locations. We show the results of applying our methods on a large real life GPS dataset of sixty two users collected over a period of two years.
Abstract. In this paper we describe two optimization techniques that are specially tailored for information gathering. The first is a greedy minimization algorithm that minimizes an information gathering plan by removing redundant and overlapping information sources without loss of completeness. We then discuss a set of heuristics that guide the greedy minimization algorithm so as to remove costlier information sources first. In contrast to previous work, our approach can handle recursive query plans that arise commonly in the presence of constrained sources. Second, we present a method for ordering the access to sources to reduce the execution cost. This problem differs significantly from the traditional database query optimization problem as sources on the Internet have a variety of access limitations and the execution cost in information gathering is affected both by network traffic and by the connection setup costs. Furthermore, because of the autonomous and decentralized nature of the Web, very little cost statistics about the sources may be available. In this paper, we propose a heuristic algorithm for ordering source calls that takes these constraints into account. Specifically, our algorithm takes both access costs and traffic costs into account, and is able to operate with very coarse statistics about sources (i.e., without depending on full source statistics). Finally, we will discuss implementation and empirical evaluation of these methods in Emerac, our prototype information gathering system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.