2020
DOI: 10.1007/s10489-020-02029-z

Design and analysis of a large-scale COVID-19 tweets dataset

Abstract: As of July 17, 2020, more than thirteen million people have been diagnosed with the Novel Coronavirus (COVID-19), and half a million people have already lost their lives to this infectious disease. The World Health Organization declared the COVID-19 outbreak a pandemic on March 11, 2020. Since then, social media platforms have experienced an exponential rise in content related to the pandemic. In the past, Twitter data have been observed to be indispensable in the extraction of situational awareness…

Cited by 152 publications (99 citation statements); References 45 publications

Citation statements, ordered by relevance:
“…The freely available dataset contained global tweets, which were mostly geotagged and, until April 17, 2020, filtered using COVID-19-related keywords such as “corona”, “coronavirus”, and “#coronavirus”. After April 18, 2020, additional filtering keywords were added to the tweet dataset, including “covid”, “#covid”, “covid19”, “#covid19”, “covid-19”, “#covid-19”, “sarscov2”, “#sarscov2”, “sars cov2”, “sars cov 2”, “covid_19”, “#covid_19”, “ncov”, “ncov2019”, “#ncov2019”, “2019-ncov”, “#2019-ncov”, “#2019ncov”, “2019ncov” [29]. This freely available tweet data contained only tweet IDs, since Twitter’s policy does not allow streaming complete tweets and publishing them to third parties.…”
Section: Methods | Citation type: mentioning | Confidence: 99%
“…This work can lead the way for data scientists and front-end engineers to develop up-to-date data modeling and prediction software using Python web frameworks such as Django 15 and Flask, 16 and GUI libraries such as Tkinter. 17 Also, before our work, some of the largest textual Covid-19 datasets had been developed individually by Google 18 and by the Johns Hopkins University Center for Systems Science and Engineering. 19 Currently, we can also contribute our Twitter dataset of 600k tweets with indexed parameters.…”
Section: Discussion | Citation type: mentioning | Confidence: 99%
“…They filtered Covid-19 tweets reflecting public sentiment and visualized them for a better understanding of the subject matter. Lamsal reported the design of a Covid-19 tweet dataset [ 17 ]. In the detailed design, the authors showed how people perceived the crisis of the present corona pandemic through the temporal and spatial dimensions of the dataset.…”
Section: Introduction | Citation type: mentioning | Confidence: 99%
“…COVID-19 Twitter datasets were collected from IEEE DataPort; they originate from the LSTM-based system developed by Rabindra Lamsal, which monitors the real-time Twitter feed for COVID-19-related tweets [11]. The system generates over 0.3 million requests every 24 hours, and its time-series graph is updated every 30 seconds.…”
Section: Methods | Citation type: mentioning | Confidence: 99%
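
A real-time monitor of this kind can be approximated with Twitter's filtered-stream endpoint. The sketch below uses Tweepy's StreamingClient with a small subset of the keywords listed in the first citation statement; the rule text, the bearer token, and the print-based handling are illustrative assumptions, not the authors' actual pipeline.

# Illustrative sketch (not the authors' pipeline): stream COVID-19-related tweets
# in near real time via Twitter's filtered-stream endpoint.
import tweepy

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # assumption: supplied by the user

class CovidStream(tweepy.StreamingClient):
    def on_tweet(self, tweet):
        # A real monitor would update the counts behind a time-series graph;
        # here we only print the tweet ID and a snippet of its text.
        print(tweet.id, tweet.text[:80])

    def on_errors(self, errors):
        print("stream error:", errors)

stream = CovidStream(BEARER_TOKEN, wait_on_rate_limit=True)
# A small subset of the filtering keywords mentioned above (hypothetical rule text).
stream.add_rules(tweepy.StreamRule("covid OR covid19 OR coronavirus OR #covid19"))
stream.filter(tweet_fields=["created_at", "lang"])

The filtered-stream rules are evaluated on Twitter's side, so the client only receives matching tweets, which keeps a continuously running monitor lightweight.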