Proceedings of the First International Conference on Web Information Systems Engineering
DOI: 10.1109/wise.2000.882403

Concept hierarchy based text database categorization in a metasearch engine environment

Abstract: Document categorization as a technique to improve the retrieval of useful documents has been extensively investigated. One important issue in a large-scale metasearch engine is to select text databases that are likely to contain useful documents for a given query. We believe that database categorization can be a potentially effective technique for good database selection, especially in the Internet environment where short queries are usually submitted. In this paper, we propose and evaluate several databas…

Cited by 27 publications (24 citation statements)
References 12 publications
“…QProber classified Web-accessible databases by issuing a set of queries (query probes) to each database and analyzing the counts of the number of results to classify each database into a set of thematic categories [11]. Wang, Meng and Yu [27] used a similar approach, starting with the top 2 levels of the Yahoo! hierarchy.…”
Section: Offline Fast-feature Techniques (citation type: mentioning)
confidence: 99%
“…One of the first deep web crawlers for discovering and interacting with deep web databases was proposed in [32], where the complexity of interacting with web search interfaces was noted. More recently, there have been efforts to extract data from deep web databases [7] and to match deep web query interfaces [39,40,43], among others.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…An alternative approach is taken by [2,4,11,21]. These algorithms obtain information by issuing queries to the information source and receiving document counts and/or document samples.…”
Section: Comparison Of Information Source Selection Approaches (citation type: mentioning)
confidence: 99%
“…A similar approach to [11] is taken by Wang et al [21] that does not rely on accurate document counts. However, it still requires a manually constructed topic tree, as well as a description for each topic, rather than document samples for each topic.…”
Section: Persistent Query Sampling (citation type: mentioning)
confidence: 99%