David Hawking scite author profile

Effective site finding using link anchor information

Link-based ranking methods have been described in the literature and applied in commercial Web search engines. However, according to recent TREC experiments, they are no better than traditional content-based methods. We conduct a different type of experiment, in which the task is to find the main entry point of a specific Web site. In our experiments, ranking based on link anchor text is twice as effective as ranking based on document content, even though both methods used the same BM25 formula. We obtained these results using two sets of 100 queries on an 18.5 million document set and another set of 100 on a 0.4 million document set. This site finding effectiveness begins to explain why many search engines have adopted link methods. It also opens a rich new area for effectiveness improvement, where traditional methods fail.
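
As a rough illustration of the comparison described in this abstract, the sketch below applies one BM25 function to two representations of the same pages: their own content, and the anchor text of links pointing at them. It is a minimal sketch and not the authors' experimental setup. The toy pages, the query, the parameter values `k1=1.2` and `b=0.75`, and the particular idf variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenised document in `docs` against `query` with BM25.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = (sum(len(d) for d in docs) / n_docs) or 1.0
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative idf variant (an assumption, one of several in use).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def best(scores):
    """Index of the highest-scoring document."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical toy collection: page 0 is a site's entry page, page 1 is a
# page that merely talks about the site, page 2 is unrelated.
content = [
    "welcome to our home page news events contact".split(),
    "review of the acme corporation products acme acme prices".split(),
    "weather forecast today".split(),
]
# Anchor text of incoming links, concatenated per target page (hypothetical).
anchors = [
    "acme acme corporation acme home".split(),
    "product review".split(),
    "weather".split(),
]

query = ["acme"]
content_scores = bm25_scores(query, content)
anchor_scores = bm25_scores(query, anchors)
print("content ranking picks page", best(content_scores))
print("anchor ranking picks page", best(anchor_scores))
```

In this toy case the entry page never names the site in its own text, so only the anchor-text representation can rank it first. The scoring formula is identical in both runs; only the text being scored differs.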

Fast generation of result snippets in web search

Turpin et al. 2007

The presentation of query-biased document snippets as part of results pages presented by search engines has become an expectation of search engine users. In this paper we explore the algorithms and data structures required as part of a search engine to allow efficient generation of query-biased snippets. We begin by proposing and analysing a document compression method that reduces snippet generation time by 58% over a baseline using the zlib compression library. These experiments reveal that finding documents on secondary storage dominates the total cost of generating snippets, and so caching documents in RAM is essential for a fast snippet generation process. Using simulation, we examine snippet generation performance for different size RAM caches. Finally, we propose and analyse document reordering and compaction, revealing a scheme that increases the number of document cache hits with only a marginal effect on snippet quality. This scheme effectively doubles the number of documents that can fit in a fixed size cache.
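
The effect of cache size on document cache hits, as described in this abstract, can be sketched with a small simulation. This is only an illustration under stated assumptions: the abstract does not name a replacement policy, so least-recently-used (LRU) eviction is assumed here, capacity is counted in whole documents, and the request stream is a made-up toy sequence.

```python
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Fraction of document requests served from an LRU cache.

    `capacity` is the number of documents the cache can hold (an assumed
    simplification; a real cache would be sized in bytes).
    """
    if not requests:
        return 0.0
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for doc_id in requests:
        if doc_id in cache:
            hits += 1
            cache.move_to_end(doc_id)      # mark as most recently used
        else:
            cache[doc_id] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests)


# Hypothetical request stream in which document 0 is far more popular
# than documents 1, 2 and 3.
requests = [0, 1, 0, 2, 0, 1, 0, 3] * 4

small = lru_hit_rate(requests, capacity=2)
# If compaction lets twice as many documents fit in the same RAM,
# the effective capacity doubles.
large = lru_hit_rate(requests, capacity=4)
print(f"hit rate with 2 documents cached: {small:.3f}")
print(f"hit rate with 4 documents cached: {large:.3f}")
```

Doubling the number of documents that fit in the cache raises the hit rate on this stream, which is the kind of gain the abstract attributes to compaction. The size of the gain on real query logs depends on how skewed the document requests are.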

Plans for the TREC-9 web track

Hawking 1999, SIGIR Forum

102

In recent years, TREC has broadened its scope to include many more facets of the Web searching process. In TREC-8 (1999), the Web special interest track evaluated link-based retrieval methods, investigated differences between Web and traditional TREC ad hoc documents, and studied efficiency and effectiveness tradeoffs on large data sets. In addition, although neither used Web data, both the Cross-Lingual track and the Question & Answer track studied issues of considerable importance to everyday Web search. In TREC-9, the main Web track task will use a larger set of Web documents than last year and will use search topics derived from search engine logs. This task will be the closest approximation in TREC-9 to the traditional TREC Ad Hoc retrieval task.

Engineering a multi-purpose test collection for Web retrieval experiments

Bailey, Craswell, Hawking 2003, Information Processing & Management

119

Results and challenges in Web search evaluation

et al. 1999
