Ehsan Doostmohammadi scite author profile

Ehsan Doostmohammadi

5 Publications

11 Citation Statements Received

78 Citation Statements Given


Affiliations

Sharif University of Technology

Publications


Joint Persian Word Segmentation Correction and Zero-Width Non-Joiner Recognition Using BERT

Doostmohammadi, Nassajian, Rahimi, 2020


Words are properly segmented in the Persian writing system; in practice, however, these writing rules are often neglected, resulting in single words being written disjointedly and multiple words written without any white spaces between them. This paper addresses the problems of word segmentation and zero-width non-joiner (ZWNJ) recognition in Persian, which we approach jointly as a sequence labeling problem. We achieved a macro-averaged F1-score of 92.40% on a carefully collected corpus of 500 sentences with a high level of difficulty.


PerKey: A Persian News Corpus for Keyphrase Extraction and Generation

Doostmohammadi, Bokaei, Sameti, 2018


Keyphrases provide an extremely dense summary of a text. Such information can be used in many Natural Language Processing tasks, such as information retrieval and text summarization. Since previous studies on Persian keyword or keyphrase extraction have not published their data, the field suffers from the lack of a human-extracted keyphrase dataset. In this paper, we introduce PerKey, a corpus of 553k news articles from six Persian news websites and agencies with relatively high-quality author-extracted keyphrases, which is then filtered and cleaned to achieve higher-quality keyphrases. The resulting data was put through human assessment to ensure the quality of the keyphrases. We also measured the performance of different supervised and unsupervised techniques, such as TFIDF, MultipartiteRank, and KEA, on the dataset using precision, recall, and F1-score.


Ghmerti at SemEval-2019 Task 6: A Deep Word- and Character-based Approach to Offensive Language Identification

Doostmohammadi, Sameti, Saffar, 2019


This paper presents the models submitted by the Ghmerti team for subtasks A and B of the OffensEval shared task at SemEval 2019. OffensEval addresses the problem of identifying and categorizing offensive language in social media in three subtasks: whether or not a content is offensive (subtask A), whether it is targeted (subtask B) towards an individual, a group, or other entities (subtask C). The proposed approach includes a character-level Convolutional Neural Network, a word-level Recurrent Neural Network, and some preprocessing. The performance achieved by the proposed model for subtask A is 77.93% macro-averaged F1-score.


Persian Word Embedding Evaluation Benchmarks

Zahedi, Bokaei, Shoeleh, et al., 2018


Persian Ezafe Recognition Using Transformers and Its Role in Part-Of-Speech Tagging

Doostmohammadi, Nassajian, Rahimi, 2020


Ezafe is a grammatical particle in some Iranian languages that links two words together. Regardless of the important information it conveys, it is almost always not indicated in Persian script, resulting in mistakes in reading complex sentences and errors in natural language processing tasks. In this paper, we experiment with different machine learning methods to achieve state-of-the-art results in the task of ezafe recognition. Transformer-based methods, BERT and XLM-RoBERTa, achieve the best results, the latter achieving 2.68% F1-score more than the previous state-of-the-art. We, moreover, use ezafe information to improve Persian part-of-speech tagging results and show that such information will not be useful to transformer-based methods and explain why that might be the case.

