This study aims to provide insights into COVID-19-related communication on Twitter in the Republic of Croatia. For that purpose, we developed an NLP-based framework that enables automatic analysis of a large dataset of tweets in the Croatian language. We collected and analysed 206,196 tweets related to COVID-19 and constructed a dataset of 10,000 tweets, which we manually annotated with sentiment labels. We trained the Cro-CoV-cseBERT language model for the representation and clustering of tweets. We also compared the performance of four machine learning algorithms on the task of sentiment classification. After identifying the best-performing combination of NLP methods, we applied the proposed framework to characterise COVID-19 tweets in Croatia. More precisely, we performed sentiment analysis and tracked sentiment over time. Furthermore, we identified how tweets group into clusters with similar themes across three pandemic waves. Finally, we characterised the tweets by analysing the distribution of sentiment polarity (in each thematic cluster and over time) and the number of retweets (in each thematic cluster and sentiment class). These results could be useful for further research and interpretation in sociology, psychology and other sciences, as well as for the authorities, who could use them to address problems in crisis communication.
Background: Online media play an important role in public health emergencies and serve as essential communication platforms. Infoveillance of online media during the COVID-19 pandemic is an important step toward a better understanding of crisis communication. Objective: The goal of this study was to perform a longitudinal analysis of COVID-19–related content in online media using natural language processing. Methods: We collected a data set of news articles published by Croatian online media during the first 13 months of the pandemic. First, we tested for correlation between the number of articles and the number of new daily COVID-19 cases. Second, we analyzed the content by extracting the most frequent terms and measuring their overlap with the Jaccard similarity coefficient. Third, we compared the occurrence of pandemic-related terms during the first and second waves of the pandemic. Finally, we applied named entity recognition to extract the most frequent entities and tracked how they changed over the observation period. Results: The results showed no significant correlation between the number of articles and the number of new daily COVID-19 cases. Furthermore, the terminology used across all articles published during the pandemic overlapped strongly, with a slight shift in pandemic-related terms between the first and second waves. Finally, the most influential entities overlapped less for people and more for locations and institutions. Conclusions: Our study shows that online media responded promptly to the pandemic with a large number of COVID-19–related articles. The high overlap in frequently used terms across the first 13 months may indicate a narrow focus of reporting in certain periods. However, pandemic-related terminology is well covered.
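The Jaccard similarity coefficient mentioned in the Methods compares two sets as the size of their intersection divided by the size of their union, so 1 means identical term sets and 0 means no shared terms. The following is a minimal Python sketch, assuming each period's most frequent terms are represented as a set of strings; the term lists are hypothetical and are not the study's data, and treating two empty sets as identical (1.0) is an assumed convention.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Return |A ∩ B| / |A ∪ B|; assumed to be 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical most-frequent terms for two pandemic waves (illustrative only).
first_wave = {"virus", "measures", "lockdown", "infection"}
second_wave = {"virus", "measures", "vaccine", "testing"}

# Two shared terms out of six distinct terms overall.
print(round(jaccard(first_wave, second_wave), 3))  # → 0.333
```

A higher score between periods would correspond to the "high overlap" in terminology reported in the Results.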
We investigate a particular subclass of semantic features associated with demonstratives of quantity and quality and the corresponding interrogatives. We examine how these lexical elements behave in simile constructions in Croatian. We show that the semantic features that can be focused on in similes are the same as those referred to by the full system of Croatian demonstratives. However, in actual usage, the possibility of focusing on a particular semantic feature through a specific choice of demonstrative or interrogative in simile constructions is being supplanted by the generic simile word ‘kao’ (‘like, as’).
Language and Orthography: A Theoretical and Methodological Approach to Orthographic Standardisation

The relationship between the linguistic levels, with their norms, and the orthographic plane has been laid down in the tradition of compiling Croatian orthography manuals ever since M. Kušar's orthographic study (Nauka o pravopisu jezika hrvackoga ili srpskoga (fonetičkom i etimologijskom), Dubrovnik, 1889). By his own admission, Ivan Broz drew heavily on this treatise when writing Hrvatski pravopis (1st ed., 1892), and thanks to the half-century dominance of the Broz–Boranić orthography, the inherited approach became deeply rooted in the methodology of compiling Croatian orthography manuals.

It is telling that the twentieth-century attempts to change the orthographic norm (chiefly those motivated by socio-political reasons) altered only individual orthographic solutions and/or orthographic conceptions, but not (to any substantial degree) the methodology of compiling orthography books. Valued on the one hand as Croatian and traditional, yet insufficiently examined on the other, this methodology was also inherited in the orthography editions by S. Babić, B. Finka and M. Moguš. Nevertheless, in the mid-1980s the Anić–Silić orthography manual brought a noticeable methodological shift: above all, the orthographic norm was stripped of what is not orthographic. However, the new conception of the orthography book also raised new questions: the question of how to regard the tradition of compiling orthography books, and the previously unrecognised question of the communicativeness of the orthography book.

As a normative domain in which the profession (represented by the orthographer) meets the user, who is not necessarily a philologist, orthography will in the future have to answer the question of how to reconcile the two, that is, how to improve orthographic prescription and develop a popular-science discourse in order to raise general orthographic literacy in standard Croatian.

KEYWORDS: standard Croatian, standardisation, orthography.
Starting premises

The orthographic plane is the meeting point of all linguistic levels.1 The orthographic norm must necessarily take into account the norms of each individual linguistic level. This does not, of course, mean that the orthographic norm is merely a collection, a kind of composite made up of selected,

1 We note at the outset that we will distinguish, conceptually and terminologically, between the linguistic plane and the linguistic level. Unlike the phonological, morphological, lexical, syntactic and semantic levels, whose identification rests on the structuralist conception of language (that is, on the segmentability and hierarchical organisation of the elements of its structures), in the case of speaking and writing we will speak of planes of linguistic realisation (of linguistic