2019
DOI: 10.1007/s41701-019-00065-w

The SFU Opinion and Comments Corpus: A Corpus for the Analysis of Online News Comments

Abstract: We present the SFU Opinion and Comments Corpus (SOCC), a collection of opinion articles and the comments posted in response to the articles. The articles include all the opinion pieces published in the Canadian newspaper The Globe and Mail in the 5-year period between 2012 and 2016, a total of 10,339 articles and 663,173 comments. SOCC is part of a project that investigates the linguistic characteristics of online comments. The corpus can be used to study a host of pragmatic phenomena. Among other aspects, res…

Cited by 51 publications (29 citation statements)
References 58 publications (75 reference statements)
“…(The differences are all statistically significant when the test set is Waseem and Hovy.) Similar trends were observed by Karan and Šnajder (2018) when employing the Kolhatkar et al (2018) and TRAC-1 (Kumar et al, 2018a) datasets, that have 62.7% and 56.6% positive samples, respectively, and exhibited better results in cross-dataset testing than datasets with lower positive sample ratios. 6 Synthesising Subtasks Using the Hierarchical Model…”
Section: Cross-dataset Training and Testing (supporting)
confidence: 81%
“…Thus, a direct comparison between different types of opinionated discourse – the more formal opinion articles versus the comparatively more personal and informal reader comments – is facilitated. The data spans the time period from 2012 to 2016 and was collected from the Canadian online newspaper The Globe and Mail (Kolhatkar et al, 2020). SOCC has been specifically designed to analyse the characteristics of online comments preserving the comments’ thread-structure and providing extensive discourse annotation (e.g.…”
Section: Database (mentioning)
confidence: 99%
“…A bewildering plethora of different types of abusive language can be found online. Some of the types dealt with in related work include but are not limited to sexism, racism (Waseem and Hovy, 2016; Waseem, 2016), toxicity (Kolhatkar et al, 2018), hatefulness (Gao and Huang, 2017), aggression (Kumar et al, 2018), attack (Wulczyn et al, 2017), obscenity, threats, and insults. A typology of abusive language detection subtasks was recently proposed by Waseem et al (2017).…”
Section: Related Work (mentioning)
confidence: 99%