Proceedings of the 25th ACM International on Conference on Information and Knowledge Management 2016
DOI: 10.1145/2983323.2983808

Finding News Citations for Wikipedia

Abstract: An important editing policy in Wikipedia is to provide citations for added statements in Wikipedia pages, where statements can be arbitrary pieces of text, ranging from a sentence to a paragraph. In many cases citations are either outdated or missing altogether. In this work we address the problem of finding and updating news citations for statements in entity pages. We propose a two-stage supervised approach for this problem. In the first step, we construct a classifier to find out whether statements need a…
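The abstract is truncated after the first stage, so the following is only an illustrative sketch of a generic two-stage pipeline, not the authors' method. It assumes (hypothetically) that stage one is a supervised binary classifier deciding whether a statement needs a news citation, and that stage two ranks candidate news articles for the flagged statements. Multinomial Naive Bayes and TF-IDF cosine ranking are simple stand-ins chosen for readability; the names `NaiveBayes`, `rank_news`, and `find_news_citations` and the toy data are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple stand-in for real NLP preprocessing."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Stage 1 (hypothetical stand-in): multinomial Naive Bayes deciding whether a
    statement needs a news citation (True) or not (False)."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts: list[str], labels: list[bool]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text: str) -> bool:
        def log_score(c: bool) -> float:
            denom = self.totals[c] + self.alpha * len(self.vocab)
            # Unseen words are skipped; seen words get smoothed log-likelihoods.
            return self.priors[c] + sum(
                math.log((self.counts[c][w] + self.alpha) / denom)
                for w in tokenize(text) if w in self.vocab
            )
        return max(self.classes, key=log_score)


def rank_news(statement: str, articles: list[str], k: int = 3) -> list[int]:
    """Stage 2 (hypothetical stand-in): rank news articles by TF-IDF cosine
    similarity to the statement; returns indices of the top-k articles."""
    docs = [Counter(tokenize(a)) for a in articles]
    n = len(docs)
    df = Counter(w for d in docs for w in d)  # document frequency per word
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in df}  # smoothed IDF

    def vec(counts: Counter) -> dict[str, float]:
        return {w: tf * idf[w] for w, tf in counts.items() if w in idf}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(v * b.get(w, 0.0) for w, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(Counter(tokenize(statement)))
    scores = [cosine(q, vec(d)) for d in docs]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]


def find_news_citations(statement: str, classifier: NaiveBayes,
                        articles: list[str], k: int = 3) -> list[int]:
    """Run stage 2 only for statements that stage 1 flags as needing a news citation."""
    if not classifier.predict(statement):
        return []
    return rank_news(statement, articles, k)


if __name__ == "__main__":
    # Toy, made-up training data for illustration only.
    train = [
        ("The company announced a merger with its rival in 2015.", True),
        ("The senator resigned after the election scandal.", True),
        ("The theorem states that every bounded sequence has a convergent subsequence.", False),
        ("The species is native to tropical forests of South America.", False),
    ]
    clf = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
    news = [
        "Rival firms agree to merger, company announced on Monday.",
        "Election results: senator wins second term.",
        "Local weather forecast predicts rain.",
    ]
    print(find_news_citations("The company announced a merger.", clf, news, k=1))  # prints [0]
```

Gating stage two on stage one keeps the (typically more expensive) retrieval step limited to statements that plausibly need a news source; a realistic system would replace both stand-ins with stronger learned models and richer features.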

Cited by 29 publications (28 citation statements)
References 22 publications
“…Automated or semi-automated tools [17,39,44] can help improve user experience [29,69], content variety [42,67], and quality [1,20,28]. The reliability of Wikipedia can also be improved automatically, e.g., by finding potential citations [15] and Wikipedia statements in need of evidence [48]. The insights from our work can help improve Wikipedia via new citations with which users would be more likely to interact.…”
Section: Science In Wikipedia; citation type: mentioning
confidence: 99%
“…Our work explores similar problems in the different domain of Wikipedia articles: while scholarly literature cites work for different purposes [1] to support original research, the aim of Wikipedia's citations is to verify existing knowledge. Previous work on the task of source recommendation in Wikipedia has focused on cases where statements are marked with a citation needed tag [14][15][16]44]. Sauper et al [14,44] focused on adding missing information in Wikipedia articles from external sources like news, where the corresponding Wikipedia entity is a salient concept.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…Sauper et al [14,44] focused on adding missing information in Wikipedia articles from external sources like news, where the corresponding Wikipedia entity is a salient concept. In another study [16], Fetahu et al used existing statements that have either an outdated citation or citation needed tag to query for relevant citations in a news corpus. Finally, the authors in [15], attempted to determine the citation span-that is, which parts of the paragraph are covered by the citation-for any given existing citation in a Wikipedia article and the corresponding paragraph in which it is cited.…”
Section: Related Work; citation type: mentioning
confidence: 99%
“…DeFacto [16] uses machine learning and NLP to produce scores about the likelihood of a web page to contain specific pieces of information and about its trustworthiness. Fetahu et al also apply machine learning to assess web pages and find sources that are authoritative and relevant for statements within Wikipedia articles [8]. Other methods focus on evaluating provenance through similarity and distance metrics computed across different databases [5].…”
Section: Evaluating Provenance; citation type: mentioning
confidence: 99%
“…These models did not apply to Wikidata as this quantitative approach differs from the focus on principles such as type, author, and publisher that Wikidata follows. Furthermore, Wikidata external sources have diverse formats including web pages, PDFs, or csv files, which may be problematic to evaluate for completely automated systems such as [8] or [16]. DeFacto's measure of trustworthiness would need extensive testing in order to understand how it matches the definition of authoritativeness used by Wikidata.…”
Section: Evaluating Provenance; citation type: mentioning
confidence: 99%