Word embeddings (e.g., word2vec) have been applied successfully to eCommerce products through prod2vec. Inspired by the recent performance improvements that contextualized embeddings have brought to several NLP tasks, we propose to transfer BERT-like architectures to eCommerce: our model, Prod2BERT, is trained to generate representations of products through masked session modeling. Through extensive experiments over multiple shops, different tasks, and a range of design choices, we systematically compare the accuracy of Prod2BERT and prod2vec embeddings: while Prod2BERT is found to be superior in several scenarios, we highlight the importance of resources and hyperparameters in the best performing models. Finally, we provide guidelines to practitioners for training embeddings under a variety of computational and data constraints. * Federico and Bingqing contributed equally to this research. † Corresponding author. 10 Costs are from official AWS pricing: 0.10 USD/h for the c4.large (https://aws.amazon.com/it/ec2/pricing/on-demand/) and 12.24 USD/h for the p3.8xlarge (https://aws.amazon.com/it/ec2/instance-types/p3/). While cost optimizations are obviously possible, the "naive" pricing is a good proxy for appreciating the difference between the two methods.
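The masked session modeling objective mentioned above is analogous to BERT's masked language modeling, applied to sequences of product IDs instead of word tokens. The abstract does not describe the training details, so the sketch below only illustrates the data-preparation step under assumed values: the session, product IDs, and mask rate are invented placeholders, not figures from the paper.

```python
# Minimal sketch of masked session modeling data preparation: randomly mask
# product IDs in a browsing session; the model is trained to reconstruct them.
import random

MASK = "[MASK]"

def mask_session(session, mask_rate=0.15, rng=None):
    """Return (masked_session, labels): labels hold the original product ID
    at masked positions and None elsewhere (only masked positions are scored)."""
    rng = rng or random.Random(0)
    masked, labels = [], []
    for product in session:
        if rng.random() < mask_rate:
            masked.append(MASK)
            labels.append(product)   # the model must reconstruct this ID
        else:
            masked.append(product)
            labels.append(None)      # ignored by the training loss
    return masked, labels

# hypothetical session of product IDs
session = ["sku_101", "sku_7", "sku_33", "sku_7", "sku_250"]
masked, labels = mask_session(session, mask_rate=0.4)
```

As in BERT pretraining, the loss is computed only at masked positions, so the `labels` list keeps `None` everywhere the input was left intact.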
Ethical Considerations

User data has been collected by Coveo in the process of providing business services: data is collected and processed in an anonymized fashion, in compliance with existing legislation. In particular, the target dataset uses only anonymous UUIDs to label events and, as such, it does not contain any information that can be linked to physical entities.
“…For example, a recent work studied the diffusion of profanity on Sina Weibo, one of the largest Chinese social media platforms (Song et al, 2020). Research on abusive and hate speech detection (a closely related research area to profane language detection) has focused on developing automatic techniques to identify racist and sexist content on Twitter (Badjatiya et al, 2017; Lozano et al, 2017), Reddit (Chandrasekharan et al, 2017; Mohan et al, 2017), and YouTube (Obadimu et al, 2019). However, few studies have focused on detecting profane language in video streaming services such as Netflix, Hulu, and Prime Video.…”
“…Previous research has focused on developing automated techniques to detect profane language in user-generated content on social media. For example, there has been growing interest in detecting hate speech and racism on Twitter (Xiang et al, 2012; Badjatiya et al, 2017; Lozano et al, 2017). Some recent works have also studied offensive content on YouTube (Alcântara et al, 2020).…”
With the rapid growth of online video streaming, recent years have seen increasing concern about profane language in streamed content. Detecting profane language in streaming services is challenging due to the long sentences that appear in a video. While recent research on handling long sentences has focused on developing deep learning modeling techniques, little work has focused on improving data pipelines. In this work, we develop a data collection pipeline to handle long sequences of text and integrate this pipeline with a multi-head self-attention model. With this pipeline, our experiments show that the self-attention model offers a 12.5% relative accuracy improvement over the state-of-the-art DistilBERT model on profane language detection while requiring only 3% of its parameters. This research designs a better system for informing users of profane language in video streaming services.
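The abstract above does not spell out how the data pipeline copes with long sequences. One common technique for fitting long text into a fixed-size model input, sketched here purely as an illustration (the window and stride sizes are assumptions, not values from the paper), is to split the token stream into overlapping windows:

```python
# Hedged sketch of a chunking step for long subtitle/transcript text:
# split a long token sequence into overlapping windows that each fit
# a fixed model input length.
def chunk_tokens(tokens, max_len=128, stride=96):
    """Split a token list into windows of at most max_len tokens,
    advancing by `stride` so consecutive windows overlap."""
    if not tokens:
        return []
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
    return chunks

# a 300-token sequence yields three overlapping 128-token (or shorter) windows
windows = chunk_tokens(list(range(300)), max_len=128, stride=96)
```

The overlap (here `max_len - stride = 32` tokens) keeps context that would otherwise be cut at a window boundary, at the cost of some duplicated computation.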
“…There is a growing body of research on hate speech, including automated methods for detecting hate speech [14,13,15] and other related topics such as offensive language identification [16,17], cyberbullying [18,19], and radicalization and terrorism [20,21]. Studies on hate speech have handled the automatic classification problem in one of two ways: as a binary classification task or as a multi-class classification task.…”
This study uses natural language processing to identify hate speech in codeswitched social media text. It trains nine models and tests their predictiveness in recognizing hate speech in a 50k human-annotated dataset. The article proposes a novel hierarchical approach that leverages Latent Dirichlet Allocation to develop topic models that help build a high-level psychosocial feature set we call PDC. PDC organizes words into word families, which helps capture codeswitching during preprocessing for supervised learning models. Informed by the duplex theory of hate, the PDC features are based on a hate speech annotation framework. Frequency-based models employing the PDC features on tweets from the 2012 and 2017 Kenyan presidential elections yielded an F-score of 83 percent (precision: 81 percent, recall: 85 percent) in recognizing hate speech. The study is notable because, first, it publicly releases a rich codeswitched dataset for comparative studies. Second, it describes how to create a novel PDC feature set that detects subtle types of hate speech hidden in codeswitched data that previous approaches could not detect.
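The word-family idea behind the PDC feature set can be pictured as a lookup that maps surface forms, including codeswitched variants, to a shared family, whose frequencies then serve as features. The families and words below are invented neutral placeholders for illustration, not entries from the paper's actual PDC lexicon:

```python
# Illustrative sketch of word-family features: map tokens to a family label,
# then count family frequencies. Out-of-lexicon tokens are ignored.
from collections import Counter

# hypothetical word families; a real lexicon would group variants
# across the languages being codeswitched
WORD_FAMILIES = {
    "attack": "violence", "fight": "violence",
    "leave": "exclusion", "expel": "exclusion",
}

def pdc_features(tokens):
    """Count occurrences per word family for a tokenized tweet."""
    return Counter(WORD_FAMILIES[t] for t in tokens if t in WORD_FAMILIES)

feats = pdc_features(["they", "attack", "and", "fight", "us", "leave"])
```

Because different surface forms collapse into one family, a frequency-based classifier sees the same feature regardless of which language a variant was written in, which is how such a representation can help with codeswitched text.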