Abstract: Linguistic resources for widely used languages such as English and Mandarin Chinese are abundant, which explains the volume of existing research on these languages. However, there are languages for which linguistic resources are scarce. One of these is Hindi. Despite being the fourth most popular language, Hindi still lacks richly populated linguistic resources, owing to the challenges involved in processing it. This article first explores the machine learning-base…
“…Gupta et al. (2021) also handle the problem of low-resource languages, and they consider the issue of the Hindi language as prime. They discussed the issues with the Hindi language, such as spelling variations due to many dialects, co-reference resolution, and many more.…”
“…Supervised learning is a method used to prepare a set of decision-making rules that can help predict a known outcome (Gupta et al., 2021). These rules can be cited as examples.…”
Purpose
Current natural language processing algorithms still lack judgment criteria, and these approaches often require deep knowledge of political or social contexts. The damage done by the spread of fake news in various sectors has attracted the attention of several low-level regional communities. However, such methods have been developed mainly for the English language, and low-resource languages remain neglected. This study aims to provide an analysis of Hindi fake news and to develop a referral system that uses advanced techniques to identify fake news in Hindi.
Design/methodology/approach
The proposed model uses bidirectional long short-term memory (B-LSTM) and is compared with other models: naïve Bayes, logistic regression, random forest, support vector machine, decision tree classifier, k-nearest neighbors, gated recurrent unit and long short-term memory models.
Findings
The B-LSTM deep learning model yields an accuracy of 95.01%.
Originality/value
This study anticipates that the model will be a beneficial resource for building technologies that prevent the spread of fake news and will contribute to research on low-resource languages.
“…A convolutional neural network (CNN) is a deep learning network structure that is more suitable for the information stored in the array's data structure. Like other neural network structures, CNN comprises an input layer, the memory stack of pooling and convolutional layers for extracting feature sets, and then a fully connected layer with a softmax classifier in the classification layer [64][65][66][67][68].…”
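The layer sequence described in this statement (input, convolution and pooling for feature extraction, then a fully connected layer with a softmax classifier) can be illustrated with a small forward pass. This is a minimal sketch in plain Python, assuming a one-dimensional input, a single hand-picked kernel, and fixed weights; none of these values or function names come from the cited work, and no training is performed.

```python
import math

def conv1d(signal, kernel):
    """Valid 1D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def relu(xs):
    """Element-wise rectified linear activation."""
    return [max(0.0, x) for x in xs]

def max_pool1d(xs, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

def dense(xs, weights, biases):
    """Fully connected layer: one output per row of weights."""
    return [sum(w * x for w, x in zip(row, xs)) + b
            for row, b in zip(weights, biases)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(signal, kernel, weights, biases, pool_size=2):
    """Input -> convolution -> ReLU -> pooling -> dense -> softmax."""
    features = max_pool1d(relu(conv1d(signal, kernel)), pool_size)
    return softmax(dense(features, weights, biases))

# Hypothetical toy values, chosen only for illustration.
signal = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
kernel = [-1.0, 0.0, 1.0]
weights = [[0.5, 0.5], [-0.5, -0.5]]  # two output classes
biases = [0.0, 0.0]

probs = forward(signal, kernel, weights, biases)
print(probs)
```

A practical CNN stacks many such convolution and pooling stages over two-dimensional inputs and learns the kernels and weights from data; the sketch only shows how the stages connect.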
Natural language processing (NLP) tools have sparked a great deal of interest due to rapid improvements in information and communications technologies. As a result, many different NLP tools are being produced. However, developing efficient and effective NLP tools that accurately process natural languages poses many challenges. One such tool is part-of-speech (POS) tagging, which tags a particular sentence or the words in a paragraph by looking at the context of the sentence or words inside the paragraph. Despite enormous efforts by researchers, POS tagging still faces challenges in improving accuracy while reducing false-positive rates and in tagging unknown words. Furthermore, the ambiguity that arises when tagging terms with different contextual meanings inside a sentence cannot be overlooked. Recently, deep learning (DL) and machine learning (ML)-based POS taggers have been implemented as potential solutions to efficiently identify words in a given sentence across a paragraph. This article first clarifies the concept of POS tagging. It then provides a broad categorization based on the well-known ML and DL techniques employed in designing and implementing POS taggers. A comprehensive review of the latest POS tagging articles is provided by discussing the weaknesses and strengths of the proposed approaches. Then, recent trends and advancements in DL and ML-based POS taggers are presented in terms of the approaches deployed and their performance evaluation metrics. Based on the limitations of the proposed approaches, we highlight various research gaps and present recommendations for future research on advancing DL and ML-based POS tagging.
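The two difficulties named in this abstract, ambiguous words and unknown words, show up even in the simplest tagger. The following is a minimal sketch of a most-frequent-tag baseline in plain Python; it is not one of the ML or DL taggers surveyed in the article, and the tiny corpus, tag set, and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Learn each word's most frequent tag and a corpus-wide default tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag

def tag_words(words, lexicon, default_tag):
    """Tag each word independently; unknown words get the default tag."""
    return [(w, lexicon.get(w.lower(), default_tag)) for w in words]

def accuracy(predicted, gold_tags):
    """Fraction of words whose predicted tag matches the gold tag."""
    correct = sum(1 for (_, p), g in zip(predicted, gold_tags) if p == g)
    return correct / len(gold_tags)

# Hypothetical training corpus: "book" is ambiguous (NOUN twice, VERB once).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("I", "PRON"), ("book", "VERB"), ("flights", "NOUN")],
    [("the", "DET"), ("book", "NOUN")],
    [("a", "DET"), ("book", "NOUN")],
]

lexicon, default_tag = train(corpus)

# "book" is a verb here, and "flight" never appears in the training corpus.
sentence = ["I", "book", "the", "flight"]
gold = ["PRON", "VERB", "DET", "NOUN"]

predicted = tag_words(sentence, lexicon, default_tag)
print(predicted)
print(accuracy(predicted, gold))
```

Because the baseline ignores context, it tags "book" as a noun even where it acts as a verb, and it can only guess a tag for the unseen word "flight". Context-aware ML and DL taggers are designed to reduce exactly these two kinds of error.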