Aadil Ahmad Lawaye scite author profile

Social media, a buzzword in the modern world, refers to online platforms such as social networks, forums, blogs and blog comments, microblogs, wikis, media-sharing platforms, and social bookmarks, through which individuals, communities, and groups communicate. People use social media not only to share ideas and opinions; it has also become an important channel through which businesses promote their products. Analyzing the large volume of data generated on social media is useful for tasks such as analyzing customer trends, forecasting sales, understanding public opinion on trending topics, and gauging customers' views of services and products. Various natural language processing (NLP) techniques are used to crawl and process social media data to extract useful insights. This chapter focuses on the NLP techniques used to process social media data and also presents the challenges these techniques face.


Building Kashmiri Sense Annotated Corpus and its Usage in Supervised Word Sense Disambiguation

Mir, Lawaye, Rana et al. 2023. IJST


Objectives: This research work is a maiden attempt at developing a sense-annotated corpus for Kashmiri Lexical Sample Word Sense Disambiguation (WSD). A sense-annotated dataset is required for supervised WSD techniques, which are the most effective techniques for carrying out WSD. Because developing a sense-tagged dataset is an arduous task, such datasets are not available for all natural languages. Kashmiri, being computationally a low-resource language, does not have a sense-tagged corpus available for research purposes. Methods: To develop the sense-annotated dataset, we selected 60 commonly used ambiguous Kashmiri words and annotated the dataset manually. The usefulness of the dataset is also examined by applying machine learning algorithms (k-NN, Decision Tree (DT), and Support Vector Machine (SVM)) to it. Part of Speech (PoS) and Bag of Words (BoW) features are used to train the classifiers. Findings: The performance of the machine learning algorithms for Kashmiri WSD is evaluated using the accuracy metric. Of the classifiers used, SVM showed the best performance, with an average accuracy of 75.74%. Novelty: This research is the first attempt to develop a sense-tagged dataset for the Kashmiri language. The developed dataset would be of great importance to the research community and can be used in various Natural Language Processing tasks such as WSD and part-of-speech tagging.
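To make the supervised lexical-sample setup in the abstract above more concrete, the following is a minimal sketch of one of the named classifiers (k-NN) over Bag of Words features. It is not the authors' implementation: the English toy sentences, the sense labels, and the function names `bag_of_words`, `cosine`, and `knn_sense` are all assumptions made for illustration, and the PoS features mentioned in the abstract are omitted.

```python
from collections import Counter
import math

def bag_of_words(context, target):
    """Count the context words of a sentence, excluding the ambiguous target word."""
    tokens = context.lower().split()
    return Counter(t for t in tokens if t != target)

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_sense(train, context, target, k=3):
    """Predict a sense by majority vote among the k most similar training contexts."""
    query = bag_of_words(context, target)
    scored = sorted(
        ((cosine(query, bag_of_words(c, target)), sense) for c, sense in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(sense for _, sense in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical sense-annotated examples for the English word "bank";
# a real lexical-sample corpus would hold many manually tagged contexts per word.
TRAIN = [
    ("she deposited money at the bank", "finance"),
    ("the bank approved the loan", "finance"),
    ("the bank charges interest on the loan", "finance"),
    ("they sat on the bank of the river", "river"),
    ("the river bank was muddy after rain", "river"),
    ("fish swim near the bank of the stream", "river"),
]

if __name__ == "__main__":
    print(knn_sense(TRAIN, "he took a loan from the bank", "bank"))
    print(knn_sense(TRAIN, "the river flooded its bank after rain", "bank"))
```

On this toy data the first query is labeled `finance` and the second `river`. A sketch like this also shows why annotation effort matters: the classifier can only return senses that appear in the tagged training contexts.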



Aadil Ahmad Lawaye

Kashmir Part of Speech Tagger Using CRF

NLP Techniques and Challenges to Process Social Media Data

Building Kashmiri Sense Annotated Corpus and its Usage in Supervised Word Sense Disambiguation
