Proceedings of the 23rd International Conference on World Wide Web 2014
DOI: 10.1145/2567948.2579000

Predicting webpage credibility using linguistic features

Abstract: The article focuses on predicting trustworthiness from the textual content of webpages. Recent work by Olteanu et al. proposes a number of features (linguistic and social) and applies machine learning methods to them to recognize trust levels. We demonstrate that this approach can be substantially improved in two ways: by applying machine learning methods to vectors computed from psychosocial and psycholinguistic features, and by working in a high-dimensional bag-of-words paradigm of word occurrences. Following [13], we test the metho…
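The two feature views named in the abstract (lexicon-based psycholinguistic category scores and raw bag-of-words counts) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the three-category `TOY_LEXICON` is a hypothetical stand-in for the General Inquirer, and normalizing category counts by document length is an assumption. Either vector, or their concatenation, could then be passed to a standard classifier trained on the credibility labels.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for General Inquirer (GI) categories.
# The real GI database has many more categories and thousands of entries.
TOY_LEXICON = {
    "Positiv": {"good", "reliable", "accurate", "trust"},
    "Negativ": {"bad", "fake", "scam", "wrong"},
    "Strong": {"must", "proven", "guaranteed"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(tokens, vocabulary):
    """High-dimensional view: raw occurrence count of each vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def lexicon_features(tokens, lexicon):
    """Low-dimensional view: share of tokens in each lexicon category.

    Dividing by the token count is an assumption made here so that long
    and short pages are comparable; empty input yields all zeros.
    """
    n = len(tokens) or 1
    return {
        category: sum(token in words for token in tokens) / n
        for category, words in sorted(lexicon.items())
    }


if __name__ == "__main__":
    toks = tokenize("Proven and reliable: a good source, not a scam!")
    print(bag_of_words(toks, ["good", "scam", "trust"]))  # → [1, 1, 0]
    print(lexicon_features(toks, TOY_LEXICON))
```

The lexicon view compresses a page into a handful of interpretable dimensions, while the bag-of-words view keeps every word as its own dimension; the abstract reports gains from both.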

Cited by 34 publications (30 citation statements); references 17 publications (34 reference statements).
“…In the following, we list some relevant issues encountered while performing our experiments: Experimental results: this gap is also observed w.r.t. results reported by (Olteanu et al, 2013), which is acknowledged by (Wawer et al, 2014), despite numerous attempts to replicate experiments. Authors (Wawer et al, 2014) believe this is due to the lack of parameters and hyperparameters explicitly cited in the previous research (Olteanu et al, 2013).…”
Section: Discussion (mentioning; confidence: 92%)
“…Recent research use the website label (Likert scale) released in the Microsoft dataset as a gold standard to train automated web credibility models, as follows: Olteanu et al (2013) proposes a number of properties (37 linguistic and textual features) and applies machine learning methods to recognize trust levels, obtaining 22 relevant features for the task. Wawer et al (2014) improve this work using psychosocial and psycholinguistic features (through The General Inquirer (GI) Lexical Database (Stone and Hunt, 1963)) achieving state of the art results.…”
Section: Automated Web Credibility (mentioning; confidence: 99%)
“…In order to reveal linguistic differences between true and false claims, lexical and syntactic features at character, word, sentence and document level have been exploited [1,11,33,36]. Wawer et al [43] compute psycholinguistic features using a bag-of-words paradigm. Rashkin et al [34] compare the language of true claims with that of satire, hoaxes, and propaganda to find linguistic characteristics of untrustworthy text.…”
Section: Textual Content (mentioning; confidence: 99%)
“…In recent years, certain criteria or 'dimensions' are widely accepted as the most important considerations for user judgements of relevance. These include reliability [25,34,43,47], understandability [26,51], novelty [7,49], effort [20,40,48], etc. A multidimensional relevance model was proposed [46,50] which defined five such dimensions and was extended to seven dimensions in [22], including 'interest' and 'habit' dimensions.…”
Section: Background, 2.1 Multidimensional Relevance (mentioning; confidence: 99%)