Abstract: To help users reach the information they want, much research has been dedicated to Deep Web (i.e. Web database) integration. We focus on query translation, which is an important part of Deep Web integration. Our aim is to automatically construct a set of constraints mapping rules so that the system can use them to translate a query from the integrated interface to the Web database interfaces. In this paper, we construct a concept hierarchy for the attributes of the query interfaces and, in particular, store the synonyms and the types (e.g.

Introduction: In recent years, the Deep Web (i.e. Web databases) has developed rapidly, and users can access information of interest from it. As reported in [1], the Deep Web contains 96,000 web sites and 550 billion hidden pages, which is 500 times more than the Surface Web. To help users find the desired information in the Deep Web, many researchers have worked on Deep Web integration. This work, however, focuses mainly on interface integration and result page extraction, with some of it addressing Web database crawling, discovery and classification. Only a few efforts are dedicated to query translation, which is responsible for translating a user's query from the integrated interface to the Web database interfaces.

Research related to query translation falls mainly into two categories: attribute mapping and constraints mapping.

Attribute mapping [2,3,4,5,6,7] in Deep Web integration has been extensively researched. These works can be classified into two kinds of methods: schema-based [2,3] and instance-based [4,5]. The paper [2] takes a conceptually novel approach by viewing schema matching as correlation mining and proposes a new correlation measure, the H-measure, to find the mapping attributes. The paper [3] uses statistical techniques rather than data mining.
The paper [4] proposes an interactive, clustering-based approach, and the paper [5] proposes a data-ensemble framework with sampling and voting techniques. Instance-based methods have been employed in many schema matching tasks [6,7]. The paper [6] addresses two significant schema matching problems: intra-site and inter-site. WebIQ [7] proposes a solution that learns from both the Surface Web and the Deep Web to automatically discover instances for interface attributes.

Representative research works on constraints mapping are [8] and [9]. The paper [8] applies user-provided mapping rules to translate queries. The paper [9] proposes an approach that dynamically maps predicates across unseen sources.

Due to the autonomous, heterogeneous, dynamic and scalable nature of the Deep Web, query translation will be a
Due to the autonomy of web databases, a major challenge for query translation in a Deep Web Data Integration System is the lack of cost models at the global level. In this paper, we propose a Multiple-regression Cost Model (MrCoM), based on statistical analysis, for global range queries that involve numeric range attributes. Using the MrCoM, the query translation strategy for new global range queries can be inferred. We also propose a Preprocessing-based Stepwise Algorithm (PSA) for selecting the significant independent variables to include in the MrCoM. Experimental results demonstrate that the MrCoM fits well and that the accuracy of query strategy selection is high.
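To make the idea of a regression-based cost model concrete, the following is a minimal sketch in Python. It is not the MrCoM or the PSA themselves, whose details are not given here. It assumes a plain least-squares multiple regression and swaps in a generic forward stepwise selection that adds a variable only if it explains at least a fixed share of the total variation in cost. The three explanatory variables, the sample data, the threshold `min_gain`, and all function names are hypothetical.

```python
# Hypothetical sample: each row holds three candidate explanatory variables
# for one observed range query; COSTS holds the observed query costs.
# The data is synthetic: cost = 2 + 3*x0 + 0.5*x1, and x2 is irrelevant.
ROWS = [(1.0, 10.0, 3.0), (2.0, 20.0, 1.0), (3.0, 15.0, 4.0), (4.0, 30.0, 1.0),
        (5.0, 25.0, 5.0), (6.0, 5.0, 9.0), (7.0, 35.0, 2.0), (8.0, 40.0, 6.0)]
COSTS = [2.0 + 3.0 * r[0] + 0.5 * r[1] for r in ROWS]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(rows, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns.

    Returns (coefficients, residual sum of squares)."""
    design = [[1.0] + [row[c] for c in cols] for row in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, t in zip(design, y))
    return beta, rss


def forward_stepwise(rows, y, min_gain=0.05):
    """Greedily add the variable that lowers the residual sum of squares most.

    Stops when the best candidate explains less than min_gain of the total
    sum of squares. Returns (chosen column indices, fitted coefficients)."""
    chosen = []
    beta, tss = fit(rows, y, chosen)  # intercept-only fit gives the total SS
    best_rss = tss
    remaining = list(range(len(rows[0])))
    while remaining and tss > 0:
        cand = min(remaining, key=lambda c: fit(rows, y, chosen + [c])[1])
        cand_beta, rss = fit(rows, y, chosen + [cand])
        if (best_rss - rss) / tss < min_gain:
            break
        chosen.append(cand)
        remaining.remove(cand)
        beta, best_rss = cand_beta, rss
    return chosen, beta


def predict(beta, chosen, row):
    """Estimate the cost of a new query from the fitted model."""
    return beta[0] + sum(b * row[c] for b, c in zip(beta[1:], chosen))


if __name__ == "__main__":
    chosen, beta = forward_stepwise(ROWS, COSTS)
    print("selected variables:", chosen)
    print("estimated cost:", predict(beta, chosen, (2.0, 20.0, 1.0)))
```

In a setting like the one described above, the estimated costs of candidate translation strategies for a new global range query could be compared and the cheapest one chosen. The normal-equation solver keeps the sketch free of dependencies; a numerical library would be the usual choice for larger or poorly conditioned samples.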