S. Anbukkarasi scite author profile

A spell checker is an application that finds spelling errors in a given text. Applications such as word processors, email clients, search engines, speech recognition systems, and social media forums need spell checking tools to improve the correctness of the system. Spell checking is fully implemented for languages such as English, French, and Chinese. For Indian regional languages, however, very little work has been carried out, and only partially. Tamil is one such Indian regional language; it requires a fully implemented spell checking application because many people have started using it on social media platforms such as Facebook and Twitter. Spelling errors in Tamil fall into different categories: Sandhi errors, homophone errors (Mayangoli), and misspelt words. To tackle all of these errors, a new ensemble approach is proposed in this paper. The proposed approach combines Levenshtein's edit distance algorithm, a rule-based algorithm, and the Soundex algorithm with an LSTM (Long Short-Term Memory) model. We use a special feature, combined character splitting of Tamil alphabets, to feed the LSTM model and improve the performance of the system. The proposed system achieved an accuracy of 95.67%, which was approved by a Tamil scholar.
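The abstract names Levenshtein's edit distance as one component of the ensemble. The following is a minimal Python sketch of that general technique, not the authors' implementation: the function names `levenshtein` and `suggest`, the `max_distance` threshold, and the toy English lexicon are assumptions made for illustration. It compares Unicode code points and does not reproduce the paper's combined character splitting.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def suggest(word, lexicon, max_distance=2):
    """Hypothetical helper: rank lexicon entries within `max_distance`
    edits of `word`, closest first (ties broken alphabetically)."""
    scored = sorted((levenshtein(word, entry), entry) for entry in lexicon)
    return [entry for distance, entry in scored if distance <= max_distance]


if __name__ == "__main__":
    # Toy English lexicon, used only to keep the example readable.
    print(suggest("helo", ["hello", "help", "world"]))
```

In a spell checker, a ranking like this would typically supply candidate corrections that other components (rules, phonetic matching, or a learned model) then filter or reorder.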


Bulletin of the Polish Academy of Sciences: Technical Sciences

Anbukkarasi, Varadhaganapathy (2021)


This paper addresses the problem of part-of-speech (POS) tagging for the Tamil language, which is low-resourced and agglutinative. POS tagging is the process of assigning syntactic categories to the words in a sentence. It is a preliminary step for many Natural Language Processing (NLP) tasks. For this work, several sequential deep learning models, namely the recurrent neural network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bi-directional Long Short-Term Memory (Bi-LSTM), were used at the word level. The models were evaluated using precision, recall, F1-score, and accuracy. A tag set of 32 tags and 225 000 tagged Tamil words was used for training. To find an appropriate number of hidden states, the models were trained with 4, 16, 32, and 64 hidden states. The experiments indicated that increasing the number of hidden states improves the performance of the model. Among all the combinations, Bi-LSTM with 64 hidden states achieved the best accuracy (94%). This is the first attempt at Tamil POS tagging using a deep learning model.
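The abstract evaluates the taggers with precision, recall, F1-score, and accuracy. As a rough illustration of how those metrics are usually computed for a tagged sequence, here is a minimal Python sketch. The function name `tag_metrics` and the three-tag example are assumptions for illustration; they are not the paper's 32-tag set or its evaluation code.

```python
from collections import Counter


def tag_metrics(gold, predicted):
    """Return (accuracy, per_tag), where per_tag maps each tag to a
    (precision, recall, f1) tuple computed from token-level counts."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1  # the predicted tag was wrongly assigned
            false_neg[g] += 1  # the gold tag was missed

    per_tag = {}
    for tag in set(gold) | set(predicted):
        tp, fp, fn = true_pos[tag], false_pos[tag], false_neg[tag]
        # Undefined ratios (zero denominator) are reported as 0.0 here.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_tag[tag] = (precision, recall, f1)

    accuracy = sum(true_pos.values()) / len(gold) if gold else 0.0
    return accuracy, per_tag


if __name__ == "__main__":
    # Hypothetical tags, chosen only for the example.
    gold = ["N", "V", "N", "ADJ"]
    predicted = ["N", "N", "N", "ADJ"]
    accuracy, per_tag = tag_metrics(gold, predicted)
    print(accuracy)
    for tag, scores in sorted(per_tag.items()):
        print(tag, scores)
```

Per-tag scores like these are often averaged (macro or weighted) to report a single figure; the abstract does not say which averaging the authors used.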


TamilEmo: Fine-grained Emotion Detection Dataset for Tamil

Vasantharajan, Priyadharshini, Kumaresan, et al. (2023)



S. Anbukkarasi

Deep Learning-based Hate Speech Detection in Code-mixed Tamil Text

Hybrid Tamil spell checker with combined character splitting
