Proceedings of the 10th Annual Joint Conference on Digital Libraries 2010
DOI: 10.1145/1816123.1816133

Evaluating methods to rediscover missing web pages from the web infrastructure

Abstract: Missing web pages (pages that return the 404 "Page Not Found" error) are part of the browsing experience. The manual use of search engines to rediscover missing pages can be frustrating and unsuccessful. We compare four automated methods for rediscovering web pages. We extract the page's title, generate the page's lexical signature (LS), obtain the page's tags from the bookmarking website delicious.com and generate a LS from the page's link neighborhood. We use the output of all methods to query Internet searc…
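As a rough illustration of the lexical signature idea mentioned in the abstract, the sketch below picks the few terms of a page that score highest under TF-IDF and treats them as a search query. This is only a minimal sketch under stated assumptions: the function names `tokenize` and `lexical_signature`, the tiny background corpus, the default of `k=5`, and the exact TF-IDF weighting are all hypothetical choices made for illustration, not the procedure used in the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page_text, corpus, k=5):
    """Return the k terms of page_text with the highest TF-IDF score.

    `corpus` is a list of other documents used only to estimate document
    frequency. Assumption: a real system would estimate document frequency
    from a far larger background collection.
    """
    tf = Counter(tokenize(page_text))
    docs = [set(tokenize(d)) for d in corpus]
    n = len(docs) + 1  # the page itself counts as one document
    scores = {}
    for term, freq in tf.items():
        df = 1 + sum(term in d for d in docs)  # the page always contains the term
        scores[term] = freq * math.log(n / df)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


page = "digital library preservation preservation archive"
background = ["digital library search", "web archive crawl"]
query = " ".join(lexical_signature(page, background, k=3))
```

The resulting `query` string is what would be submitted to a search engine in the hope that the missing page, or a copy of it, appears among the top results.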

citations
Cited by 14 publications
(13 citation statements)
references
References 34 publications
“…Even though they are expensive to compute, similar to tags, they may provide an alternative if no copy of a missing page can be found in the web infrastructure. Further research in [9,10] has shown that titles of web pages are a very strong alternative to lexical signatures. The results also prove that we can increase the retrieval performance by applying both methods combined.…”
Section: Content and Link Based Methods To Rediscover Web Pages (mentioning)
confidence: 99%
“…This establishes a binary relevance case. More precisely, similar to our evaluation in [9] the first performance measure distinguishes between four retrieval cases where the returned URI is:…”
Section: Performance Measure (mentioning)
confidence: 99%