2019
DOI: 10.1002/asi.24210
Assessing the quality of information on Wikipedia: A deep‐learning approach

Abstract: Currently, web document repositories have been collaboratively created and edited. One of these repositories, Wikipedia, is facing an important problem: assessing the quality of Wikipedia. Existing approaches exploit techniques such as statistical models or machine learning algorithms to assess Wikipedia article quality. However, existing models do not provide satisfactory results. Furthermore, these models fail to adopt a comprehensive feature framework. In this article, we conduct an extensive survey of previ…

Cited by 22 publications (14 citation statements: 0 supporting, 14 mentioning, 0 contrasting).
References 54 publications.

“…Contrary to what we expect, the CNN performs the worst. In most cases, the CNN has high performance in learning relevant features and ruling out irrelevant features [54,62]. Moreover, after comparison of basic LSTM and CNN-LSTM, we find that the CNN degrades the model performance.…”
Section: Methods (mentioning)
confidence: 97%
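
The comparison quoted above hinges on how a CNN-LSTM differs from a basic LSTM. Below is a minimal sketch of the two architectures in Keras, not the cited paper's actual code; the vocabulary size, layer widths, and the six-class output (Wikipedia's FA/GA/B/C/Start/Stub quality scale) are illustrative assumptions.

```python
# Minimal sketch (not the cited paper's code): a plain LSTM versus a CNN-LSTM
# for article-text classification. All sizes below are illustrative assumptions.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000  # assumed vocabulary size
NUM_CLASSES = 6     # e.g., Wikipedia's FA, GA, B, C, Start, Stub classes

def build_lstm():
    # Baseline: embeddings feed directly into the recurrent layer.
    return models.Sequential([
        layers.Embedding(VOCAB_SIZE, 128),
        layers.LSTM(64),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_cnn_lstm():
    # Variant: a 1-D convolution plus pooling extracts local n-gram features
    # and shortens the sequence before the LSTM models longer-range order.
    return models.Sequential([
        layers.Embedding(VOCAB_SIZE, 128),
        layers.Conv1D(64, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=4),
        layers.LSTM(64),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

for model in (build_lstm(), build_cnn_lstm()):
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
```

The quoted finding is that the convolutional front end, despite usually helping with feature selection, can discard order information the LSTM needs, so the plain LSTM wins.
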
“…Few studies have analysed and summarised the existing work. In this section, we perform an extensive review of the existing feature frameworks [1,2,5,6,12,23–27,35,42,44,47–54] and propose a comprehensive feature framework as a representation of Wikipedia articles. Text statistics are indicators that measure basic article statistics [1,23], including word count and character count.…”
Section: Representation of Wikipedia Articles (mentioning)
confidence: 99%
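
The "text statistics" feature family quoted above is simple to compute. The sketch below shows one plausible helper; the exact feature set and this function are illustrative assumptions, not the surveyed papers' code.

```python
# Minimal sketch of count-based "text statistics" features for one article.
# The feature set here is an assumption for illustration only.
import re

def text_statistics(article_text: str) -> dict:
    words = re.findall(r"\w+", article_text)
    sentences = [s for s in re.split(r"[.!?]+", article_text) if s.strip()]
    return {
        "char_count": len(article_text),
        "word_count": len(words),
        "sentence_count": len(sentences),
        # Average word length, guarded against empty articles.
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
    }

print(text_statistics("Wikipedia is a free online encyclopedia. Anyone can edit it."))
```
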
“…Importantly, such mapping should consider disciplinary differences in citations from Wikipedia, as well as books (5.3 million citations by our estimates) and nonscientific sources such as news outlets and other online media (21.5 million citations), which make up the largest share of Wikipedia citations. Answering these questions is critical to inform the community work on improving Wikipedia by finding and filling knowledge gaps and biases, at the same time guaranteeing the quality and diversity of the sources Wikipedia relies upon (Hube, 2017; Mesgari et al., 2015; Piscopo, Kaffee et al., 2017; Piscopo & Simperl, 2019; Wang & Li, 2020). Link prediction in general, and citation recommendation in particular, have been explored for Wikipedia for some time (Fetahu et al., 2016; Paranjape, West et al., 2016; Wulczyn, West et al., 2016).…”
Section: Map of Wikipedia Sources (mentioning)
confidence: 99%
“…To solve that problem, machine learning algorithms have been applied to Wikipedia article quality assessment. They combined machine learning models with hand‐crafted features to assess the quality of Wikipedia articles (Shen et al., 2017; Zhang et al., 2018; Ferschke et al., 2012; Khairova et al., 2017; Wang & Li, 2020; Wang et al., 2019). However, semantic features from the article content are often ignored in these approaches.…”
Section: Related Work (mentioning)
confidence: 99%
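
The pipeline this passage describes, hand-crafted features fed to a conventional classifier, can be sketched as below. The toy data, feature choice, and random-forest model are illustrative assumptions, not a reconstruction of any single cited paper.

```python
# Minimal sketch of quality assessment from hand-crafted features.
# Toy data and model choice are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy feature matrix: one row per article, columns standing in for hand-crafted
# features such as word count, reference count, image count, section count.
X = rng.random((200, 4))
# Toy labels: quality class index (e.g., 0 = Stub ... 5 = FA).
y = rng.integers(0, 6, size=200)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("mean CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```

As the quoted passage notes, such a pipeline sees only surface counts; semantic features of the article text are what the deep-learning approach under review is meant to add.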