Diwa Koirala scite author profile

Diwa Koirala

2Publications

6Citation Statements Received

14Citation Statements Given

How they've been cited

How they cite others

Affiliations

Publications

Order By: Most citations

Offensive Language Detection in Nepali Social Media

Niraula¹,

Dulal²,

Koirala³

2021

View full text Add to dashboard Cite

Social media texts such as blog posts, comments, and tweets often contain offensive languages including racial hate speech comments, personal attacks, and sexual harassments. Detecting inappropriate use of language is, therefore, of utmost importance for the safety of the users as well as for suppressing hateful conduct and aggression. Existing approaches to this problem are mostly available for resource-rich languages such as English and German. In this paper, we characterize the offensive language in Nepali, a low-resource language, highlighting the challenges that need to be addressed for processing Nepali social media text. We also present experiments for detecting offensive language using supervised machine learning. Besides contributing the first baseline approaches of detecting offensive language in Nepali, we also release human annotated data sets to encourage future research on this crucial topic.

show abstract

Linguistic Taboos and Euphemisms in Nepali

Niraula¹,

Dulal

Koirala³

2022

ACM Trans. Asian Low-Resour. Lang. Inf. Process.

View full text Add to dashboard Cite

Languages across the world have words, phrases, and behaviors - the taboos - that are avoided in public communication considering them as obscene or disturbing to the social, religious, and ethical values of society. However, people deliberately use these linguistic taboos and other language constructs to make hurtful, derogatory, and obscene comments. It is nearly impossible to construct a universal set of offensive or taboo terms because offensiveness is determined entirely by different factors such as socio–physical setting, speaker-listener relationship, and word choices. In this paper, we present a detailed corpus-based study of offensive language in Nepali. We identify and describe more than 18 different categories of linguistic offenses including politics, religion, race, and sex. We discuss 12 common euphemisms such as synonym, metaphor and circumlocution. In addition, we introduce a manually constructed data set of over 1000 offensive and taboo terms popular among contemporary speakers. We describe the first experiments that provide baseline results in detecting offensive language in Nepali. This in-depth study of offensive language and resource will provide a foundation for several downstream tasks such as offensive language detection and language learning.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Diwa Koirala

Offensive Language Detection in Nepali Social Media

Linguistic Taboos and Euphemisms in Nepali

Contact Info

Product

Resources

About