Proceedings of the 11th Forum for Information Retrieval Evaluation 2019
DOI: 10.1145/3368567.3368584

Overview of the HASOC track at FIRE 2019

Cited by 160 publications (72 citation statements). References 3 publications.
“…A second key distinction concerns the source from which data are retrieved. The microblogging platform Twitter is by far the most exploited source, due to the relatively reduced length of texts and to a friendly policy on making data publicly available: 32 resources contain tweets, one of which (Olteanu et al 2018) also features posts from the social aggregator Reddit, one (Nascimento et al 2019) also retrieves comments from the 55chan imageboard, while in two works (Bosco et al 2018; Mandl et al 2019 […] 2018) use sentences from the well-known white-supremacist forum Stormfront; the dataset released for the Hate Speech Hackathon contains posts from the Wikipedia […]
Topical focus (number of resources): Abusiveness (5); Aggressiveness (2); Anti-Roma (1); Child sexual abuse (1); Cyberbullying (2); Flames (1); Harassment (1); Homophobia (4); HS (36); Islamophobia (2); Obscenity, Profanity (3); Offensiveness (13); Personal Attacks (1); Racism (6); Sexism, Misogyny (9); Threats, Violence (1); Toxicity (1); White supremacy (1).
Nearly all the resources feature user-generated public content, mostly microblog posts, often retrieved with a keyword-based approach and mostly using words with a negative polarity.…”
Section: Data Source (mentioning; confidence: 99%)
“…Finally, two competitions explicitly focused on the identification of both HS and offensive language, i.e. HASOC at FIRE 2019 (Mandl et al 2019) and HSD, the HS detection task on Vietnamese at VLSP campaign in 2019 (Vu et al 2019).…”
Section: Shared Tasks (mentioning; confidence: 99%)
“…As mentioned earlier, the six publicly available datasets [4][5][6][7][8][9] are manually curated and of modest size. In the text below, we discuss the characteristics and the generation process of each of these datasets.…”
Section: Related Datasets (mentioning; confidence: 99%)
“…The same process has been adopted for code-mixed and pure Hindi language. Since the sample space of [9] and [4] is small, we have tried to update these datasets with our own curated samples.…”
Section: Data Gathering Process (mentioning; confidence: 99%)