Social media, a buzzword in the modern world, refers to online platforms such as social networks, forums, blogs and blog comments, microblogs, wikis, media-sharing platforms, and social bookmarking sites, through which individuals, communities, and groups communicate. People use social media not only to share their ideas and opinions; it has also become an important channel through which businesses promote their products. Analyzing the huge volume of data generated on social media is useful for tasks such as analyzing customer trends, forecasting sales, understanding public opinion on trending topics, and gauging customers' views of services and products. Different natural language processing (NLP) techniques are used to crawl and process social media data to extract useful insights from it. This chapter focuses on the NLP techniques used to process social media data and presents the challenges these techniques face.
Objectives: In this research work, a maiden attempt is made towards developing a sense-annotated corpus for Kashmiri Lexical Sample Word Sense Disambiguation (WSD). A sense-annotated dataset is required to use supervised WSD techniques, which are the most effective techniques for carrying out WSD. Because developing a sense-tagged dataset is an arduous task, such datasets are not available for all natural languages. Kashmiri, being computationally a low-resource language, does not have a sense-tagged corpus available for research purposes. Methods: To develop the sense-annotated dataset, we selected 60 commonly used ambiguous Kashmiri words and annotated the dataset manually. The usefulness of the dataset is also examined by running machine learning algorithms (k-NN, Decision Tree (DT), and Support Vector Machine (SVM)) on it. Part of Speech (PoS) and Bag of Words (BoW) features are used to train the classifiers. Findings: The performance of the machine learning algorithms for Kashmiri WSD is evaluated using the accuracy metric. Of the classifiers used, SVM showed the best performance, with an average accuracy of 75.74%. Novelty: This research is the first attempt to develop a sense-tagged dataset for the Kashmiri language. The developed dataset would be of great importance to the research community and can be used in various Natural Language Processing tasks such as WSD and part-of-speech tagging.
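To make the lexical-sample setup concrete, the following is a minimal sketch of k-NN sense classification over Bag of Words features. It is not the authors' implementation: the training sentences are hypothetical English examples for the word "bank" rather than the Kashmiri dataset, PoS features are omitted, and the function names (`bow`, `cosine`, `knn_sense`) and the choice of cosine similarity with `k=3` are assumptions made for illustration.

```python
from collections import Counter
import math

# Hypothetical toy training set: contexts of the ambiguous word "bank",
# each manually labeled with a sense tag (stand-in for a sense-annotated corpus).
TRAIN = [
    ("he deposited cash at the bank branch", "finance"),
    ("the bank approved her loan application", "finance"),
    ("she opened a savings account at the bank", "finance"),
    ("they sat on the grassy bank of the river", "river"),
    ("the boat drifted toward the muddy bank", "river"),
    ("fish gathered near the river bank at dawn", "river"),
]


def bow(sentence):
    """Bag of Words: map each lowercased whitespace token to its count."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_sense(context, train=TRAIN, k=3):
    """Predict a sense by majority vote among the k most similar contexts."""
    query = bow(context)
    ranked = sorted(train, key=lambda ex: cosine(query, bow(ex[0])), reverse=True)
    votes = Counter(sense for _, sense in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_sense("the bank gave him a loan"))
    print(knn_sense("the boat reached the river bank"))
```

In a lexical-sample setting, one such classifier would typically be trained per ambiguous target word, and accuracy would be measured on held-out annotated contexts.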