2021
DOI: 10.3390/app112110442

Characterisation of COVID-19-Related Tweets in the Croatian Language: Framework Based on the Cro-CoV-cseBERT Model

Abstract: This study aims to provide insights into the COVID-19-related communication on Twitter in the Republic of Croatia. For that purpose, we developed an NL-based framework that enables automatic analysis of a large dataset of tweets in the Croatian language. We collected and analysed 206,196 tweets related to COVID-19 and constructed a dataset of 10,000 tweets which we manually annotated with a sentiment label. We trained the Cro-CoV-cseBERT language model for the representation and clustering of tweets. Additiona…

Citations: cited by 19 publications (30 citation statements)
References: 47 publications
“…The seminal work of [41] contributed to the emergence of numerous variants of text representation models in terms of low-dimensional vectors in continuous space-embeddings, where embeddings allow semantically related linguistic units to be represented with similar vector representations. As described in [8] the first generation was characterised by shallow language models, such as Word2Vec [41], Doc2Vec [42], GloVe [43] and fastText [44]. They have some shortcomings, such as static embeddings in which multiple concepts (i.e., different meanings of the same entity, polysemy) are not represented by different embedding vectors, or poor performance in new domains.…”
Section: Text Features; citation type: mentioning
confidence: 99%
“…For example, the outbreak of the COVID-19 disease caused a significant increase in social media usage among the public and it seriously affected the public's understanding of the COVID-19 risk [7]. In some countries there were many negative attitudes toward vaccines and anti-pandemic measures promoted on social networks [8]. Therefore, information spreading analysis during the global crisis is of great importance as one step of social media monitoring (infoveillance).…”
Section: Introduction; citation type: mentioning
confidence: 99%
“…Most studies employed different NLP techniques for capturing specific aspects of the COVID-19 content published online. For discovering public perceptions, opinions, and attitudes toward specific COVID-19-related topics, researchers commonly combine topic modeling and sentiment analysis [21,26-28], which are also occasionally combined with named entity recognition (NER) [29].…”
Section: Prior Work; citation type: mentioning
confidence: 99%