Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval 2012
DOI: 10.1145/2348283.2348413

Predicting quality flaws in user-generated content

Abstract: The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. […]

Cited by 65 publications (86 citation statements) between 2012 and 2021; references 39 publications.
“…The parameters of classifier cj are optimized on the respective training set Cj ⊆ Dtrain; clusters with fewer than five elements are discarded. Figure 1 illustrates the results on four different datasets: artificially created objects with three clusters (plot in Figure 2), documents from the 20 Newsgroups dataset with the category "computer" in the role of the target class, books from different authors for which the authorship is to be verified [4], and Wikipedia articles tagged with certain quality flaws that are to be detected [1]. All documents are represented under a vector space model with tf-idf weighting, except for the Wikipedia articles, for which quality-specific features [1] are employed.…”
Section: Analysis and Results
confidence: 99%
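The per-cluster training regime quoted above is easy to picture in code. The following is a minimal sketch assuming scikit-learn, with TfidfVectorizer for the vector space model and a one-class SVM standing in for the classifiers cj; the function name and hyperparameters are illustrative, not taken from the cited papers.

```python
# Minimal sketch: tf-idf vector space plus one classifier per training cluster,
# discarding clusters with fewer than five elements (as in the quoted setup).
from collections import defaultdict

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import OneClassSVM

def train_per_cluster(docs, cluster_ids, min_size=5):
    """Fit one classifier c_j per cluster C_j of the training set, skipping tiny clusters."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)

    clusters = defaultdict(list)
    for row, cid in enumerate(cluster_ids):
        clusters[cid].append(row)

    classifiers = {}
    for cid, rows in clusters.items():
        if len(rows) < min_size:  # clusters with fewer than five elements are discarded
            continue
        clf = OneClassSVM(kernel="rbf", nu=0.1)  # hyperparameters would be tuned on C_j
        clf.fit(X[rows])
        classifiers[cid] = clf
    return vectorizer, classifiers
```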
“…Before transferring the values of particular infobox parameters, the information is compared across the other language versions, and versions with higher quality and popularity scores have a higher influence (weight) on selecting the proper value. The methods proposed in the paper are used in the WikiRank.net service, which assesses and compares articles across the various language versions of Wikipedia.…”
Section: Discussion
confidence: 99%
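The weighted cross-language selection described in this excerpt can be sketched as a simple weighted vote over candidate values. The snippet below is illustrative only; the select_infobox_value helper and the example weights are assumptions, not the actual WikiRank.net scoring scheme.

```python
# Minimal sketch: pick the infobox value whose supporting language versions
# carry the highest total quality/popularity weight. Weights are hypothetical.
from collections import defaultdict

def select_infobox_value(candidates):
    """candidates: list of (value, language_weight) pairs for one infobox parameter."""
    totals = defaultdict(float)
    for value, weight in candidates:
        totals[value] += weight  # higher-quality, more popular versions count more
    return max(totals, key=totals.get)

# Example: three language versions propose a population figure.
print(select_infobox_value([("1,790,658", 0.9), ("1,790,658", 0.4), ("1,805,000", 0.6)]))
# -> "1,790,658"
```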
“…Basic lexical metrics based on word usage in Wikipedia articles were used in another study as factors that can reflect article quality: high-quality articles tend to use more nouns and verbs and fewer adjectives [9]. Finally, the quality evaluation of Wikipedia articles can also be based on special quality-flaw templates [10].…”
Section: Related Work
confidence: 99%
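The part-of-speech ratios mentioned in this excerpt are straightforward to compute. Below is a minimal sketch assuming NLTK's default tokenizer and tagger; the exact metric definitions in the cited study [9] may differ.

```python
# Minimal sketch: noun/verb/adjective ratios per article as lexical quality signals.
# Requires the NLTK tokenizer and tagger models (installed via nltk.download).
import nltk

def pos_ratios(text):
    """Return the fraction of tokens tagged as nouns, verbs, and adjectives."""
    tags = [tag for _, tag in nltk.pos_tag(nltk.word_tokenize(text))]
    n = len(tags) or 1  # avoid division by zero on empty input
    return {
        "noun_ratio": sum(t.startswith("NN") for t in tags) / n,
        "verb_ratio": sum(t.startswith("VB") for t in tags) / n,
        "adj_ratio": sum(t.startswith("JJ") for t in tags) / n,
    }
```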
“…It denotes the task of automatically detecting flaws according to Wikipedia's guidelines, something not to neglect when working with Wikipedia. Anderka et al. [8] have done impressive work in this field and give a nice overview of the first challenge dedicated to this topic [9]. Another related topic is the research on Wikipedia's revision history and talk pages.…”
Section: Research on Wikipedia
confidence: 99%