Proceedings of the 8th ACM Conference on Web Science 2016
DOI: 10.1145/2908131.2908175

Analyzing web archives through topic and event focused sub-collections

Abstract: Web archives capture the history of the Web and are therefore an important source to study how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose the research methodology of extracting and studying sub-collections of the archive focused on specific topics and events. We discuss the…

Citations: cited by 5 publications (6 citation statements)
References: 13 publications
“…The goal of the event-centric extraction process is, given an event input and a Web archive, generate an interlinked collection of documents relevant to the input event that meet the collection specification. The differences of our research with Gossen's previous work [12] transfer to this work. However, we adapted Gossen's categorization of events as either planned or unexpected, and we renamed planned to expected (Table 2).…”
Section: Acronym Post Count Author Count (mentioning)
confidence: 48%
“…Not all collection building uses focused crawling. Gossen et al [12] proposed a methodology for extracting sub-collections from Web archive collections focused on specific topics and events (called the topic and event focused sub-collection). The topic and event focused sub-collection is defined as a collection of documents in a Web archive collected using a sub-collection specification.…”
Section: Acronym Post Count Author Count (mentioning)
confidence: 99%
“…Thus, we reviewed the collection structures of Archive-It, the National Library of Australia's (NLA) PANDORA and Trove archives, the Croatian Web Archive (HAW), the Library of Congress Web Archive (LC), the United Kingdom Web Archive (UKWA), Conifer (formerly Webrecorder [27]). Finally, we include the Internet Archive's (IA) user account web archives because IA's Wayback Machine is synonymous with web archiving, even though its collections are tied to a specific user rather than a theme. While there are many web archiving initiatives [12,42], we focused on these eight platforms because they provide collections as defined above.…”
Section: Introduction (mentioning)
confidence: 99%
“…Instead, automatic methods are needed that can extract collections of documents related to a particular event of user interest. These collections need to preserve the original link structure to achieve a high degree of authenticity and enable the application of analytical methods on the relevant parts of the Web archive [14].…”
Section: Introduction (mentioning)
confidence: 99%
“…In previous work [26,14] we have argued that these users are typically interested in studying smaller and more focused event-centric collections of documents contained in a Web archive. Such collections can reflect specific events such as elections, sports tournaments or natural disasters, for example the Fukushima nuclear disaster in 2011, the German federal election in 2009 or the FIFA World Cup 2006, especially in regard to their media coverage and public reactions.…”
Section: Introduction (mentioning)
confidence: 99%