Proceedings of the 9th Annual ACM International Workshop on Web Information and Data Management 2007
DOI: 10.1145/1316902.1316924

Using neighbors to date web documents

Abstract: Time has been successfully used as a feature in web information retrieval tasks. In this context, estimating a document's inception date or last update date is a necessary task. Classic approaches have used HTTP header fields to estimate a document's last update time. The main problem with this approach is that it applies to only a small fraction of web documents. In this work, we evaluate an alternative strategy based on a document's neighborhood. Using a random sample containing 10,000 URLs from the Yahoo! Dire…
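The classic header-based approach that the abstract contrasts with can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `last_modified_from_headers` is hypothetical, and it assumes that a missing or unparsable `Last-Modified` header simply means "no date", which is the case the neighborhood strategy is meant to cover.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def last_modified_from_headers(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified date from HTTP response headers, or None.

    Header names are matched case-insensitively. A missing or unparsable
    value yields None (hypothetical helper, for illustration only).
    """
    for name, value in headers.items():
        if name.lower() == "last-modified":
            try:
                return parsedate_to_datetime(value)
            except (TypeError, ValueError):
                # Older Pythons raise TypeError, newer ones ValueError.
                return None
    return None
```

Any object with an `items()` method works here, for example a plain dict or the `headers` attribute of a `urllib.request.urlopen` response.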

Cited by 19 publications (12 citation statements)
References 21 publications (14 reference statements)

“…Studies estimate that from 35% to 64% of web documents have valid last-modified dates [14], but these percentages can be significantly improved by using the dates of the web document's neighbors, especially of web resources embedded in the selected document (e.g. images, CSS, JavaScript) [25]. Nevertheless, for simplification, in this work we adopted the crawling date.…”
Section: Temporal Intervals (mentioning)
confidence: 99%
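The embedded resources named in this statement (images, CSS, JavaScript), together with a page's outgoing links, can be read directly from its HTML. The sketch below uses Python's standard `html.parser`; the class name `NeighborExtractor` is hypothetical, and the choice of which tags count as embedded resources is an assumption of this example, not the procedure used in [25]. Incoming links cannot be found this way, since they require an index of other pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NeighborExtractor(HTMLParser):
    """Collect outgoing links and embedded resources from one HTML page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.outgoing: list[str] = []   # targets of <a href=...>
        self.embedded: list[str] = []   # images, scripts, stylesheets

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "a" and a.get("href"):
            self.outgoing.append(urljoin(self.base_url, a["href"]))
        elif tag in ("img", "script") and a.get("src"):
            self.embedded.append(urljoin(self.base_url, a["src"]))
        elif tag == "link" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.embedded.append(urljoin(self.base_url, a["href"]))
```

Each collected URL could then be requested and its `Last-Modified` header read, giving one candidate date per neighbor.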
“…We will discuss in details the temporal language model in the next section. Non-learning methods are presented in [78,82,94]. They require an explicit timetagged document.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the end, the event-time period of the document is generated by assembling all nearly dates to the reference date where their relevancy must be greater than a threshold. Nunes et al [94] propose an alternative approach to dating a non-timestamped document using its neighbors, such as 1) documents containing links to the non-timestamped document (incoming links), 2) documents pointed to [by] the non-timestamped document (outgoing links) and 3) the media assets (e.g., images) associated with the non-timestamped document. They compute the average of last-modified dates extracted from neighbor documents and use it as the time for the non-timestamped document.…”
Section: Related Work (mentioning)
confidence: 99%
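The averaging step summarized in this statement can be sketched as below. It follows the citing authors' description (the mean of the neighbors' last-modified dates), not code from the paper; the function name `estimate_date_from_neighbors` is hypothetical, and skipping neighbors that have no date, as well as requiring timezone-aware datetimes, are assumptions of this example. Because datetimes cannot be summed directly, the mean is taken over offsets from a fixed reference point (the Unix epoch).

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional


def estimate_date_from_neighbors(
    neighbor_dates: Iterable[Optional[datetime]],
) -> Optional[datetime]:
    """Average the known last-modified dates of a document's neighbors.

    Neighbors without a usable date (None) are skipped. Returns None when
    no neighbor has a date. All datetimes must be timezone-aware.
    """
    known = [d for d in neighbor_dates if d is not None]
    if not known:
        return None
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Sum the offsets from the epoch, divide by the count, convert back.
    total = sum((d - epoch for d in known), timedelta())
    return epoch + total / len(known)
```

For example, neighbors dated 1 January and 3 January of the same year yield an estimate of 2 January; a document with no dated neighbors remains undated.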
“…[5] proposed a novel technique for the estimation of last modification date of a Web page, which complements the traditional approaches based on HTTP headers, by also looking at a document's neighborhood. The neighborhood of a document, as mentioned in the paper, is its incoming links, outgoing links and media files, e.g., pdf files, image files, sound files etc.…”
Section: Neighborhood Technique (mentioning)
confidence: 99%