2019
DOI: 10.1177/2053951719843310

Big Data and quality data for fake news and misinformation detection

Abstract: Fake news has become an important topic of research in a variety of disciplines including linguistics and computer science. In this paper, we explain how the problem is approached from the perspective of natural language processing, with the goal of building a system to automatically detect misinformation in news. The main challenge in this line of research is collecting quality data, i.e., instances of fake and real news articles on a balanced distribution of topics. We review available datasets and introduce…

Cited by 83 publications (59 citation statements)
References 37 publications
“…Algorithms could be analysed through laboratory testing and reverse engineering (Diakopoulos, 2015), which means reconstructing the algorithm to identify its functional principles. Additionally, more cooperation with independent researchers would be helpful in order to research the dissemination of misinformation (Lazer et al., 2018; Torabi Asr & Taboada, 2019).…”
Section: Concepts of Digital Media Ethics and Responsibility (mentioning)
confidence: 99%
“…The size of the dataset plays an important role in ensuring high accuracy in the fake news detection process. In particular, if the dataset is used to train a fake news detection method based on machine learning, it is fundamental to have a large dataset, because the performance of this kind of method improves as the training dataset size increases (Torabi & Taboada, 2019). The drawback is that manual annotation of very large datasets is less reliable, owing to the time it consumes and to misclassification (Ghiassi & Lee, 2018).…”
Section: Survey Methodology (mentioning)
confidence: 99%
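The scaling claim in the statement above lends itself to a quick learning-curve check. The sketch below is not from the cited papers; the file news.csv and its text/label columns are hypothetical stand-ins for any labelled corpus of fake and real articles.

```python
# Minimal learning-curve sketch: does validation accuracy grow with the
# number of training articles? Assumes a hypothetical "news.csv" with
# columns "text" (article body) and "label" (0 = real, 1 = fake).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline

df = pd.read_csv("news.csv")  # hypothetical labelled dataset

model = make_pipeline(
    TfidfVectorizer(max_features=50_000, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)

# Cross-validated accuracy at increasing fractions of the training data.
sizes, _, val_scores = learning_curve(
    model, df["text"], df["label"],
    train_sizes=[0.1, 0.25, 0.5, 0.75, 1.0],
    cv=5, scoring="accuracy",
)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    print(f"{n:>6} training articles -> accuracy {score:.3f}")
```

If the citing survey's point holds for the corpus at hand, the printed accuracies should rise, with diminishing returns, as the training size grows.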
“…A big dataset is fundamental for achieving a highly accurate fake news detection process, especially for detection methods based on deep neural network models, whose performance improves as the training dataset size increases. Torabi & Taboada (2019) discussed the necessity of using big data for fake news detection and encouraged researchers in this field to share their datasets and to work together towards a standardized, large-scale fake news benchmark dataset.…”
Section: Survey Methodology (mentioning)
confidence: 99%
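As a hedged illustration of the dataset-sharing point above, the sketch below merges several labelled corpora into one balanced binary-labelled collection, the kind of normalization a shared benchmark would need. Every file name, column name, and label vocabulary here is an assumption, not part of any published benchmark.

```python
# Merge hypothetical fake-news corpora into one corpus with a shared
# binary label, then balance the classes by downsampling the majority.
import pandas as pd

SOURCES = {
    "dataset_a.csv": {"fake": 1, "real": 0},   # hypothetical label scheme
    "dataset_b.csv": {"false": 1, "true": 0},  # hypothetical label scheme
}

frames = []
for path, label_map in SOURCES.items():
    df = pd.read_csv(path)  # assumes "text" and "label" columns
    df["label"] = df["label"].str.lower().map(label_map)
    frames.append(df.dropna(subset=["label"])[["text", "label"]])

corpus = pd.concat(frames, ignore_index=True)

# Downsample so fake and real articles appear in equal numbers.
n = int(corpus["label"].value_counts().min())
balanced = corpus.groupby("label").sample(n=n, random_state=0)
print(balanced["label"].value_counts())
```

Balancing by downsampling is the simplest choice; class weights or stratified sampling over topics would preserve more data at the cost of a more involved pipeline.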
“…FakeNewsNet contains two comprehensive datasets with diverse features covering news content, social context, and spatiotemporal information. Asr et al. [33] reviewed the available misinformation detection datasets and introduced the "MisInfoText" repository to address the lack of datasets with reliable labels. The MisInfoText repository contains three data categories: links to all publicly available textual fake news datasets, tools to collect data directly from fact-checking websites, and datasets originally published in [30]. In summary, many existing works focus on building misleading-information detection systems.…”
Section: Related Work (mentioning)
confidence: 99%
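One of the three MisInfoText categories named above is tooling for collecting data directly from fact-checking websites. The sketch below shows what such a collection step can look like; the URL and CSS selectors are hypothetical placeholders, since each fact-checking site has its own page structure (and its own terms of use and rate limits).

```python
# Sketch of pulling claim, verdict, and body text from one fact-checking
# page. The URL and the ".claim"/".verdict" selectors are hypothetical.
import requests
from bs4 import BeautifulSoup

def text_of(soup: BeautifulSoup, selector: str) -> str | None:
    """Stripped text of the first node matching selector, if present."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node else None

def fetch_fact_check(url: str) -> dict:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    return {
        "claim": text_of(soup, ".claim"),      # hypothetical selector
        "verdict": text_of(soup, ".verdict"),  # hypothetical selector
        "body": " ".join(p.get_text(strip=True) for p in soup.select("article p")),
    }

record = fetch_fact_check("https://example.org/fact-check/some-claim")  # placeholder URL
print(record)
```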