“…Only recently, scholars, often in interdisciplinary teams of historians, programmers and data scientists, have started to tap into the vast collections of digitized historical news texts. We agree with the growing body of literature on computational methods in journalism research that forms of automated content analysis, specifically machine learning approaches, offer promising venues to analyse big data sets of news content and introduce new questions and approaches to journalism studies (Boumans and Trilling 2016; Flaounas et al. 2013; Günther and Quandt 2016; Jacobi, van Atteveldt, and Welbers 2016; Burscher, Vliegenthart, and de Vreese 2015). It allows for grounding analyses in big data and mapping the structural transformation of journalistic discourse on a large scale.…”
supporting
confidence: 78%
“…The key question is, therefore, whether and how digitization will actually change research practices in journalism history (cf. Boumans and Trilling 2016; Flaounas et al. 2013). Despite the development of (computer-assisted) social scientific ways of research such as (automatic) quantitative content analysis that offer the opportunity to explore news content beyond idiographic and myopic studies, journalism historians have been reluctant to adopt quantitative and computational methods (Wijfjes 2017; Nicholson 2013; Broersma 2011a, 2011b).…”
mentioning
confidence: 99%
“…We join in calls for journalism scholars to move beyond keyword search and manual content analysis and take full advantage of the available digitized newspaper material (Boumans and Trilling 2016; Flaounas et al. 2013; Günther and Quandt 2016;…”
mentioning
confidence: 99%
“…EXPLORING MACHINE LEARNING Jacobi, Van Atteveldt, and Welbers 2016; Burscher, Vliegenthart, and de Vreese 2015). Computational methods based on machine learning enable us to root analyses in large data sets instead of necessarily modest samples (Boumans and Trilling 2016; Broersma 2011a, 2011b; Wijfjes 2017). This implies that "we no longer have to choose between data size and data depth" (Manovich 2012, 466).…”
mentioning
confidence: 99%
“…text statistics, sentiment analysis, topic modelling or frame analysis, facilitates detailed analyses of newspapers as a serial source on an unprecedented scale in a much more cost-efficient way (cf. Boumans and Trilling 2016).…”
The labour-intensive nature of manual content analysis and the problematic accessibility of source material make quantitative analyses of news content still scarce in journalism history. However, the digitization of newspaper archives now allows for innovative digital methods for systematic longitudinal research beyond the scope of incidental case studies. We argue that supervised machine learning offers promising approaches to analyse abundant source material, ground analyses in big data, and map the structural transformation of journalistic discourse longitudinally. By automatically analysing form and style conventions that reflect underlying professional norms and practices, the structure of news coverage can be studied more closely. However, automatically classifying latent and period-specific coding categories is highly complex. The structure of digital newspaper archives (e.g. segmentation, OCR) complicates this even more, while machine learning algorithms are often a black box. This paper shows how making classification processes transparent enables journalism scholars to employ these computational methods in a reliable and valid way. We illustrate this by focusing on the issues we encountered with automatically classifying news genres, an illuminating but particularly complex coding category. Ultimately, such an approach could foster a revision of journalism history, particularly the often hypothesized but understudied shift from opinion-based to fact-centred reporting.
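The supervised genre classification the abstract describes can be sketched as a standard text-classification pipeline. This is a minimal illustration only: the example articles, genre labels, and model choice (TF-IDF features with logistic regression) are assumptions for demonstration, not the study's actual setup.

```python
# Minimal sketch of supervised news-genre classification.
# Training texts and the "opinion" / "news report" labels are invented
# placeholders; a real study would use manually coded historical articles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "The editor argues that the government must reconsider its policy.",
    "In our opinion, this bill is a grave mistake for the nation.",
    "The minister announced the budget figures at a press conference.",
    "Police reported that the fire broke out at 3 a.m. on Main Street.",
]
train_labels = ["opinion", "opinion", "news report", "news report"]

# TF-IDF features feeding a linear classifier; in practice one would add
# cross-validation and report precision/recall per genre and per period.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)

predicted = clf.predict(["The mayor announced new figures at a press conference."])
```

Making such a pipeline transparent, in the paper's sense, would mean inspecting which features drive each genre decision rather than treating the fitted model as a black box.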
Computer-aided text analysis (CATA) offers exciting new possibilities for conflict research that this contribution describes using a range of exemplary studies from a variety of disciplines including sociology, political science, communication studies, and computer science. The chapter synthesizes empirical research that investigates conflict in relation to text across different formats and genres. This includes both conflict as it is verbalized in the news media, in political speeches, and other public documents and conflict as it occurs in online spaces (social media platforms, forums) and that is largely confined to such spaces (e.g., flaming and trolling). Particular emphasis is placed on research that aims to find commonalities between online and offline conflict, and that systematically investigates the dynamics of group behavior. Both work using inductive computational procedures, such as topic modeling, and supervised machine learning approaches are assessed, as are more traditional forms of content analysis, such as dictionaries. Finally, cross-validation is highlighted as a crucial step in CATA, in order to make the method as useful as possible to scholars interested in enlisting text mining for conflict research.
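The dictionary-based form of content analysis the chapter reviews amounts to counting matches against a curated term list. The toy function below illustrates the idea; the conflict word list is invented for the example, and real dictionaries would be far larger and typically combined with stemming and validation against hand-coded data.

```python
# Toy dictionary-based CATA pass: score a text by the share of its tokens
# that match a (here, invented) conflict dictionary.
import re

CONFLICT_TERMS = {"attack", "dispute", "clash", "protest", "war"}

def conflict_score(text: str) -> float:
    """Return the fraction of tokens that appear in the conflict dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in CONFLICT_TERMS)
    return hits / len(tokens)

score = conflict_score("Protesters clash with police during the dispute.")
```

Note that without stemming, "Protesters" does not match "protest"; this kind of gap is exactly why the chapter stresses cross-validation of dictionary output against manual coding.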
Automated sentiment analysis of textual data is one of the central and most challenging tasks in political communication studies. However, the toolkits available are primarily for English texts and require contextual adaptation to produce valid results—especially concerning morphologically rich languages such as Hungarian. This study introduces (1) a new sentiment and emotion annotation framework that uses inductive approaches to identify emotions in the corpus and aggregate these emotions into positive, negative, and mixed sentiment categories, (2) a manually annotated sentiment data set with 5700 political news sentences, and (3) a new Hungarian sentiment dictionary for political text analysis created via word embeddings, whose performance was compared with other available sentiment dictionaries. (4) Because of the limitations of dictionary-based sentiment analysis, we also applied various machine learning algorithms to the dataset, and (5) to move towards state-of-the-art approaches, we fine-tuned the Hungarian BERT-base model for sentiment analysis. Meanwhile, we also tested how different pre-processing steps affect the performance of machine-learning algorithms in the case of Hungarian texts.
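The dictionary baseline this study compares against its machine-learning and BERT models can be sketched as simple word-list lookups that also yield the "mixed" category the annotation framework defines. The word lists below are invented English stand-ins, not the Hungarian dictionary the study built.

```python
# Toy dictionary-based sentiment scoring with positive / negative / mixed /
# neutral outcomes; the word lists are invented placeholders.
POSITIVE = {"success", "win", "support", "growth"}
NEGATIVE = {"crisis", "scandal", "failure", "loss"}

def sentiment(text: str) -> str:
    """Classify a sentence by counting dictionary hits in both lists."""
    tokens = [token.strip(".,") for token in text.lower().split()]
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos and neg:
        return "mixed"      # both polarities present
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

The study's point is that such lookups ignore morphology and context; for a morphologically rich language like Hungarian, lemmatization and ultimately fine-tuned transformer models outperform this baseline.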