Fábio Bif Goularte scite author profile

Microblog posts such as tweets frequently contain users’ opinions and thoughts about events, products, people, institutions, etc. However, the use of social media to propagate hate speech is not uncommon. Analyzing hateful speech in social media is essential for understanding, fighting and discouraging such actions. We believe that by extracting fragments of text that are semantically similar it is possible to depict recurrent linguistic patterns in certain kinds of discourse. Therefore, we aim to use these patterns to encapsulate frequent statements textually expressed in microblog posts. In this paper, we propose to exploit such linguistic patterns in the context of hate speech. Through a technique that we call SSP (Short Semantic Pattern) mining, we are able to extract sequences of words that share a similar meaning in their word embedding representation. By analyzing the extracted patterns, we reveal some kinds of discourse that recur across a dataset, such as racist and sexist statements. Afterwards, we experiment with using SSPs as features to build classifiers that detect whether a tweet contains hate speech (binary classification) and that distinguish between sexist, racist and clean tweets (ternary classification). The SSP instances found in tweets containing sexism show that a large number of sexist tweets begin with the introductions ‘I’m not sexist but’ and ‘Call me sexist but’. Meanwhile, SSP instances found in tweets reproducing racism reveal a prominence of content against the Islamic religion and associated entities and organizations.
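The idea of word sequences that "share a similar meaning in their word embedding representation" can be illustrated with a minimal sketch. This is not the SSP mining algorithm from the paper, whose details are not given in the abstract. It assumes a simplified, hypothetical criterion: two equal-length word sequences match when every aligned pair of words has a cosine similarity at or above a threshold. The vectors in `TOY_EMBEDDINGS`, the function names and the threshold of 0.9 are all invented for illustration.

```python
import math

# Toy 3-dimensional vectors invented for illustration only; a real
# experiment would load pretrained word embeddings instead.
TOY_EMBEDDINGS = {
    "really": (1.0, 0.0, 0.0),
    "truly": (0.9, 0.1, 0.0),
    "good": (0.0, 1.0, 0.0),
    "great": (0.1, 0.9, 0.0),
    "bad": (0.0, -1.0, 0.0),
    "movie": (0.0, 0.0, 1.0),
    "film": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ngrams(tokens, n):
    """All contiguous word sequences of length n, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sequences_match(seq_a, seq_b, threshold=0.9):
    """Hypothetical criterion: every aligned word pair must be similar.

    Words missing from the toy vocabulary never match.
    """
    if len(seq_a) != len(seq_b):
        return False
    for word_a, word_b in zip(seq_a, seq_b):
        vec_a = TOY_EMBEDDINGS.get(word_a)
        vec_b = TOY_EMBEDDINGS.get(word_b)
        if vec_a is None or vec_b is None:
            return False
        if cosine(vec_a, vec_b) < threshold:
            return False
    return True


def shared_patterns(tokens_a, tokens_b, n=3, threshold=0.9):
    """Pairs of n-grams, one from each post, that match semantically."""
    return [
        (a, b)
        for a in ngrams(tokens_a, n)
        for b in ngrams(tokens_b, n)
        if sequences_match(a, b, threshold)
    ]


if __name__ == "__main__":
    post_a = "a really good movie".split()
    post_b = "truly great film indeed".split()
    print(shared_patterns(post_a, post_b))
```

Under these assumptions, "really good movie" and "truly great film" are paired even though they share no words, while "really bad movie" is not paired with "truly great film" because the toy vectors for "bad" and "great" point in opposite directions. Matched sequences of this kind could then be counted per post and used as classifier features, which is the role the abstract describes for SSPs.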

Análise de Métodos e Ferramentas para Reconhecimento de Palavras Relevantes em Microblogs

Sorato, Goularte, Nassar, et al. 2016

Extracting accurate information from the enormous volumes of data generated on social media, much of it unstructured, is a major challenge today, but one with many relevant applications, many of them still latent. One of the first and most decisive steps in this information extraction process is the recognition of relevant words in texts. This article presents a comparative study of methods and tools for recognizing relevant words in microblog posts. Among the various tools analyzed, five were selected for experiments with 100 thousand tweets. These experiments showed high variability in the results of different tools, which suggests the need for improvements.

Fábio Bif Goularte

A text summarization method based on fuzzy rules and applicable to automated assessment

MSC+: Language pattern learning for word sense induction and disambiguation

Short Semantic Patterns: A Linguistic Pattern Mining Approach for Content Analysis Applied to Hate Speech

Análise de Métodos e Ferramentas para Reconhecimento de Palavras Relevantes em Microblogs
