Abstract. Web archives are typically very broad in scope and extremely large in scale. This makes data analysis appear daunting, especially for non-computer scientists. These collections constitute an increasingly important source for researchers in the social sciences, the historical sciences and journalists interested in studying past events. However, there are currently no access methods that help users to efficiently access information, in particular about specific events, beyond the retrieval of individual disconnected documents. Therefore we propose a novel method to extract event-centric document collections from large scale Web archives. This method relies on a specialized focused extraction algorithm. Our experiments on the German Web archive (covering a time period of 19 years) demonstrate that our method enables the extraction of event-centric collections for different event types.
In this paper, we describe a set of reusable text processing components for extracting opinionated information from social media, rating it for interestingness, and for detecting opinion events. We have developed applications in GATE to extract named entities, terms and events and to detect opinions about them, which are then used as the starting point for opinion event detection. The opinions are then aggregated over larger sections of text, to give some overall sentiment about topics and documents, and also some degree of information about interestingness based on opinion diversity. We go beyond traditional opinion mining techniques in a number of ways: by focusing on specific opinion-target extraction related to key terms and events, by examining and dealing with a number of specific linguistic phenomena, by analysing and visualising opinion dynamics over time, and by aggregating the opinions in different ways for a more flexible view of the information contained in the documents.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.