2017
DOI: 10.1007/978-3-319-67008-9_10
Extracting Event-Centric Document Collections from Large-Scale Web Archives

Abstract: Web archives are typically very broad in scope and extremely large in scale. This makes data analysis appear daunting, especially for non-computer scientists. These collections constitute an increasingly important source for researchers in the social sciences, the historical sciences and journalists interested in studying past events. However, there are currently no access methods that help users to efficiently access information, in particular about specific events, beyond the retrieval of individua…

Cited by 13 publications (34 citation statements)
References 22 publications (27 reference statements)
“…In our previous work we developed methods to infer missing categorical information in noisy and sparse Web markup data [31], increasing usefulness of this data for event-centric applications. Furthermore, we consider event-centric focused crawls from the Web [7] and Web archives [8], [9], as well as Twitter data regarding events and traffic. Another source of event-centric information is the recently proposed EventKG knowledge graph [10,11].…”
Section: Integration Of Web-based Mobility Data (mentioning)
confidence: 99%
“…However, in this work, we leverage the judgment of humans on social media. In a similar work, Gossen et al [13] adapted some portions of the topic and event focused sub-collection in a method to extract event-centric documents from Web archives based on a specialized focused extraction algorithm. They defined two broad kinds of events based on time: planned and unexpected.…”
Section: Acronym Post Count Author Count (mentioning)
confidence: 99%
“…Therefore, the choice of queries was not arbitrary. Instead, we developed a temporal classification system (partly informed by Gossen et al [13]) of real world stories and events based on three temporal ( For the expectation attribute, an event may be expected or unexpected. For example, the Ebola outbreak event was unexpected.…”
Section: Topic Selection (mentioning)
confidence: 99%
“…Previous work by Gossen et al [9] inspired this work. They developed a focused extraction (not web crawling) system to create event-centric collections from a large static archival collection stored on a server under their control.…”
Section: Related Work (mentioning)
confidence: 99%
“…Moreover, to the best of our knowledge, focused crawling across web archives has never been attempted. Inspired by previous work by Gossen et al [9], in this paper, we present a framework to build event-specific collections by focused crawling of web archives. We utilize the Memento protocol [20] and the associated cross-web-archive infrastructure [3] to crawl Mementos in 22 web archives.…”
Section: Introduction (mentioning)
confidence: 99%