In this paper, we discuss how different types of automatic annotation of digitised newspaper articles can be integrated into the iterative questioning of the source material and the creation of research corpora out of a collection of unstructured texts (kept in a structured collection). We annotate a sizeable collection of Swiss press articles (183,270), extracted via the impresso interface, using topic modelling (MALLET) as well as a naïve Bayes classifier (script by Milan van Lange). Our methodological discussion explores how text mining can help identify historical discourses that are difficult to query with keywords because of their inherent ambiguity, and how to grasp them in a large corpus. We argue that automated annotations can provide a body of corroborating evidence of the searched discourse, to be used as an intermediate, heuristic step of analysis.
The automated enrichment of mass-digitised document collections using techniques such as text mining is becoming increasingly popular. Enriched collections offer new opportunities for interface design to allow data-driven and visualisation-based search, exploration and interpretation. Most such interfaces integrate close and distant reading and represent semantic, spatial, social or temporal relations, but often lack contrastive views. Inspect and Compare (I&C) contributes to the current state of the art in interface design for historical newspapers with highly versatile side-by-side comparisons of query results and curated article sets based on metadata and semantic enrichments. I&C takes search queries and pre-curated article sets as inputs and allows comparisons based on the distributions of newspaper titles, publication dates and automatically generated enrichments, such as language, article types, topics and named entities. Contrastive views of such data reveal patterns, help humanities scholars improve their search strategies and facilitate a critical assessment of the overall data quality. I&C is part of the impresso interface for the exploration of digitised and semantically enriched historical newspapers.
Olgierd Górka was a historian specialising in Eastern and South-Eastern Europe who took an active part in the political debate concerning the place of minorities in Poland. He occupied different roles in the public sphere and appears to have persistently tried to embody the voice of politically marginalised citizens of Poland. Górka argued for a strong link between the Polish state and its citizens as a precondition for their mutual survival. His life exemplifies the discussion around the definition of the people, which lay at the heart of the legitimation of modern nation-states in Central Europe during the 20th century. The debate initiated by Górka helps us better understand how the modern Polish state, born from the ashes of three empires, defined Polish citizenship and how that definition evolved during the upheavals of the interwar and post-war periods.