Concept hierarchy based text database categorization in a metasearch engine environment

Wang, W.; Meng, Weiyi; Yu, Clement

doi:10.1109/wise.2000.882403

Cited by 27 publications

(24 citation statements)

References 12 publications

“…QProber classified Web-accessible databases by issuing a set of queries (query probes) to each database and analyzing the counts of the number of results to classify each database into a set of thematic categories [11]. Wang, Meng and Yu [27] used a similar approach, starting with the top 2 levels of the Yahoo! hierarchy.…”

Section: Offline Fast-feature Techniques (mentioning)

confidence: 99%

Categorizing web search results into meaningful and stable categories using fast-feature techniques

Kules; Kustanowitz; Shneiderman

2006

Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries

When search results against digital libraries and web resources have limited metadata, augmenting them with meaningful and stable category information can enable better overviews and support user exploration. This paper proposes six "fast-feature" techniques that use only features available in the search result list, such as title, snippet, and URL, to categorize results into meaningful categories. They use credible knowledge resources, including a US government organizational hierarchy, a thematic hierarchy from the Open Directory Project (ODP) web directory, and personal browse histories, to add valuable metadata to search results. In three tests the percent of results categorized for five representative queries was high enough to suggest practical benefits: general web search (76-90%), government web search (39-100%), and the Bureau of Labor Statistics website (48-94%). An additional test submitted 250 TREC queries to a search engine and successfully categorized 66% of the top 100 using the ODP and 61% of the top 350. Fast-feature techniques have been implemented in a prototype search engine. We propose research directions to improve categorization rates and make suggestions about how web site designers could re-organize their sites to support fast categorization of search results.
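As a rough illustration of the URL-based idea in this abstract, the following Python sketch assigns a category to a search result using only its URL, then reports the fraction of results that received a category. The lookup table `HOST_CATEGORIES`, the function names, and the suffix-matching rule are all assumptions made for illustration; they are not the paper's six techniques, and a real table would be derived from a resource such as the ODP.

```python
from urllib.parse import urlparse

# Hypothetical lookup table (hostname -> category path). These entries are
# invented for illustration and do not come from the paper or the ODP.
HOST_CATEGORIES = {
    "bls.gov": "Government/Labor",
    "nasa.gov": "Government/Science",
    "example.edu": "Education",
}


def categorize_by_url(url):
    """Return a category for the URL's host, or None if no suffix matches."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host first, then progressively shorter suffixes.
    for i in range(len(parts)):
        suffix = ".".join(parts[i:])
        if suffix in HOST_CATEGORIES:
            return HOST_CATEGORIES[suffix]
    return None


def categorization_rate(urls):
    """Fraction of results that received a category (0.0 for an empty list)."""
    if not urls:
        return 0.0
    hits = sum(1 for u in urls if categorize_by_url(u) is not None)
    return hits / len(urls)
```

A categorization rate computed this way corresponds to the "percent of results categorized" figures quoted in the abstract, though the numbers there come from the authors' own techniques and data.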

“…One of the first deep web crawlers for discovering and interacting with deep web databases was proposed in [32], where the complexity of interacting with web search interfaces was noted. More recently, there have been efforts to extract data from deep web databases [7] and to match deep web query interfaces [39,40,43], among others.…”

Section: Related Work (mentioning)

confidence: 99%

Discovering Interesting Relationships among Deep Web Databases: A Source-Biased Approach

2006

The escalation of deep web databases has been phenomenal over the last decade, spawning a growing interest in automated discovery of interesting relationships among available deep web databases. Unlike the "surface" web of static pages, these deep web databases provide data through a web-based query interface and account for a huge portion of all web content. This paper presents a novel source-biased approach to efficiently discover interesting relationships among web-enabled databases on the deep web. Our approach supports a relationship-centric view over a collection of deep web databases through source-biased database analysis and exploration. Our source-biased approach has three unique features: First, we develop source-biased probing techniques, which allow us to determine in very few interactions whether a target database is relevant to the source database by probing the target with very precise probes. Second, we introduce source-biased relevance metrics to evaluate the relevance of deep web databases discovered, to identify interesting types of source-biased relationships for a collection of deep web databases, and to rank them accordingly. The source-biased relationships discovered not only present value-added metadata for each deep web database but can also provide direct support for personalized relationship-centric queries. Third, but not least, we also develop a performance optimization using source-biased probing with focal terms to further improve the effectiveness of the basic source-biased model. A prototype system is designed for crawling, probing, and supporting relationship-centric queries over deep web databases using the source-biased approach. Our experiments evaluate the effectiveness of the proposed source-biased analysis and discovery model, showing that the source-biased approach outperforms query-biased probing and unbiased probing.

“…An alternative approach is taken by [2,4,11,21]. These algorithms obtain information by issuing queries to the information source and receiving document counts and/or document samples.…”

Section: Comparison Of Information Source Selection Approaches (mentioning)

confidence: 99%

“…A similar approach to [11] is taken by Wang et al [21] that does not rely on accurate document counts. However, it still requires a manually constructed topic tree, as well as a description for each topic, rather than document samples for each topic.…”

Section: Persistent Query Sampling (mentioning)

confidence: 99%

Information source selection for resource constrained environments

Aksoy

2005

SIGMOD Rec.

Distributed information retrieval has pressing scalability concerns due to the growing number of independent sources of on-line data and the emerging applications. A promising solution to distributed retrieval is metasearching, which dispatches a user's query to multiple sources and gathers the results into a single result set. An important component of metasearching is selecting the set of information sources most likely to provide relevant documents. Recent research has focused on how to obtain statistics for the selection task. In this paper we discuss different information source selection approaches and their applicability for resource-constrained sensor network applications.
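The selection step described in this abstract can be sketched in a few lines of Python, under loud assumptions: each source is a toy in-memory collection, the statistic gathered is the match count each source returns for a handful of probe queries, and the user's query is then dispatched only to the top-scoring sources. `ToySource`, `probe_scores`, `select_sources`, and `metasearch` are hypothetical names, and summing raw counts is a stand-in for whatever statistics a real selection method would use.

```python
class ToySource:
    """Hypothetical in-memory stand-in for a remote text database."""

    def __init__(self, docs):
        self.docs = docs

    def search(self, query):
        """Return documents containing the query as a case-insensitive substring."""
        q = query.lower()
        return [d for d in self.docs if q in d.lower()]

    def count(self, query):
        """Return only the number of matches, as a probe would observe it."""
        return len(self.search(query))


def probe_scores(sources, probes):
    """Sum the match counts each source reports over all probe queries."""
    return {name: sum(src.count(q) for q in probes) for name, src in sources.items()}


def select_sources(scores, k):
    """Keep the k highest-scoring sources; ties are broken by name."""
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]


def metasearch(query, sources, selected):
    """Dispatch the query to the selected sources and merge without duplicates."""
    merged = []
    for name in selected:
        for doc in sources[name].search(query):
            if doc not in merged:
                merged.append(doc)
    return merged
```

Probing with counts keeps the interaction with each source small, which is the property that matters in the resource-constrained setting the abstract discusses; the merge step here simply concatenates results and does no re-ranking.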
