Mihai Manolescu scite author profile

Mihai Manolescu

3 Publications

6 Citation Statements Received

29 Citation Statements Given

Publications

ROFF - A Romanian Twitter Dataset for Offensive Language

Manolescu, Çöltekin (2021)

This paper describes the annotation process of an offensive language data set for Romanian on social media. To facilitate comparable multi-lingual research on offensive language, the annotation guidelines follow some of the recent annotation efforts for other languages. The final corpus contains 5000 micro-blogging posts annotated by a large number of volunteer annotators. The inter-annotator agreement and the initial automatic discrimination results we present are in line with earlier annotation efforts.

TuEval at SemEval-2019 Task 5: LSTM Approach to Hate Speech Detection in English and Spanish

Manolescu, Löfflad, Saber et al. (2019)

The detection of hate speech, especially on online platforms and forums, is quickly becoming a hot topic as anti-hate speech legislation begins to be applied to public discourse online. The HatEval shared task was created with this in mind; participants were expected to develop a model capable of determining whether input (in this case, Twitter posts in English and Spanish) could be considered hate speech (designated as Subtask A), whether it was aggressive, and whether the tweet was targeting an individual or speaking generally (Subtask B). We approached the task by creating an LSTM model with an embedding layer. We found that our model performed considerably better on English language input when compared to Spanish language input. In English, we achieved an F1-score of 0.466 for Subtask A and 0.462 for Subtask B; in Spanish, we achieved scores of 0.617 and 0.612 on Subtask A and Subtask B, respectively.

TueMix at SemEval-2020 Task 9: Logistic Regression with Linguistic Feature Set

Bear, Hoefels, Manolescu (2020)

Commonly occurring in settings such as social media platforms, code-mixed content makes the task of identifying sentiment notably more challenging and complex due to the lack of structure and noise present in the data. SemEval-2020 Task 9, SentiMix, was organized with the purpose of detecting the sentiment of a given code-mixed tweet comprising Hindi and English. We tackled this task by comparing the performance of a system, TueMix (a logistic regression algorithm trained with three feature components: TF-IDF n-grams, monolingual sentiment lexicons, and surface features), with a neural network approach. Our results showed that TueMix outperformed the neural network systems and the addition of the linguistic features beyond TF-IDF n-grams enhanced our performance, yielding a weighted F1-score of 0.685.
