Query translation on the fly in Deep Web integration

Jiang, Fangjiao; Jia, Linlin; Meng, Xiaofeng

doi:10.1007/s11859-007-0003-2

Cited by 4 publications

(2 citation statements)

References 7 publications


“…However, this sampling method requires improvement in many respects, such as the sampling independence, the flexibility of the parameters configuration and the integrity of the results that reflect the database content [13][14][15]. In addition, these random sampling methods cannot ensure consistency between the sampling data and the WDB data, such as the consistency of keywords in text attributes, the consistency of numeric distribution in numerical attributes and the integrity of text attributes. Moreover, these methods do not consider the dependencies among attributes in the WDB.…”

Section: Related Work (mentioning)

confidence: 99%

Web Database Sampling Based on Dependency of Keywords

Zhang, Wei, Lin

2015

TOCSJ


Abstract: The Information Era has witnessed a huge number of sources from websites. The abundance of useful data surrounding us has made it possible for integration systems to improve the quality of the integrated data. However, how to choose proper data sources efficiently to extract data with high coverage and low redundancy is still a hot topic in the area. Sampling the databases hiding behind the websites makes it possible to obtain the characteristics of the web databases, and further to choose appropriate sources when collecting data for integration and query optimization. In this paper we construct a sampling model to represent data characteristics of web databases based on posing keyword queries on the deep web query interface. The dependency of text attribute keywords within the data source is used to construct the dependent-relational probability matrix, which indicates the sample distribution and is used for keyword extension to fetch more sampling data and get new characteristics of the actual data. Further, we provide an efficient method to evaluate the similarity between the sample databases and the real web databases. We evaluate the proposed method on a real-world dataset, and the results show that our method can sample the web data sources with high similarity.



“…It was shown that the selected species of monocrystalline synthetic diamonds could be well dispersed in a pure aluminum-matrix. However, some ‘microngrade’ diamond powders were particularly susceptible to the degradation during composite processing [6][7][8][9][10]. Natishan et al. suggested that interfacial reaction in diamond/aluminum composites could be successfully suppressed by using a solid-state route, but the thermal conductivity was not addressed in this study [11].…”

Section: Introduction (mentioning)

confidence: 99%