Hamdy Mubarak scite author profile

Farasa: A Fast and Furious Segmenter for Arabic

In this paper, we present Farasa, a fast and accurate Arabic segmenter. Our approach is based on SVM-rank using linear kernels. We measure the performance of the segmenter in terms of accuracy and efficiency in two NLP tasks, namely Machine Translation (MT) and Information Retrieval (IR). Farasa outperforms or is on par with the state-of-the-art Arabic segmenters (Stanford and MADAMIRA), while being more than one order of magnitude faster.


SemEval-2017 Task 3: Community Question Answering

Nakov, Hoogeveen, Màrquez et al. 2017

179

157


We describe SemEval-2017 Task 3 on Community Question Answering. This year, we reran the four subtasks from SemEval-2016: (A) Question-Comment Similarity, (B) Question-Question Similarity, (C) Question-External Comment Similarity, and (D) Rerank the correct answers for a new question in Arabic, providing all the data from 2015 and 2016 for training, and fresh data for testing. Additionally, we added a new subtask E in order to enable experimentation with Multi-domain Question Duplicate Detection in a larger-scale scenario, using StackExchange subforums. A total of 23 teams participated in the task and submitted a total of 85 runs (36 primary and 49 contrastive) for subtasks A-D. Unfortunately, no teams participated in subtask E. A variety of approaches and features were used by the participating systems to address the different subtasks. The best systems achieved an official score (MAP) of 88.43, 47.22, 15.46, and 61.16 in subtasks A, B, C, and D, respectively. These scores are better than the baselines, especially for subtasks A-C.


Abusive Language Detection on Arabic Social Media

Mubarak, Darwish, Magdy 2017

224

147


In this paper, we present our work on detecting abusive language on Arabic social media. We extract a list of obscene words and hashtags using common patterns used in offensive and rude communications. We also classify Twitter users according to whether or not they use any of these words in their tweets. We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean). We make this dataset freely available for research, in addition to the list of obscene words and hashtags. We are also publicly releasing a large corpus of classified user comments that were deleted from a popular Arabic news site due to violations of the site's rules and guidelines.


SemEval-2016 Task 3: Community Question Answering

Nakov, Màrquez, Moschitti et al. 2016

153


SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)

Zampieri, Nakov, Rosenthal et al. 2020

258


We present the results and the main findings of SemEval-2020 Task 12 on Multilingual Offensive Language Identification in Social Media (OffensEval-2020). The task included three subtasks corresponding to the hierarchical taxonomy of the OLID schema from OffensEval-2019, and it was offered in five languages: Arabic, Danish, English, Greek, and Turkish. OffensEval-2020 was one of the most popular tasks at SemEval-2020, attracting a large number of participants across all subtasks and languages: a total of 528 teams signed up to participate in the task, 145 teams submitted official runs on the test data, and 70 teams submitted system description papers.

