Soumil Mandal scite author profile

Soumil Mandal

5 Publications

36 Citation Statements Received

34 Citation Statements Given


Affiliations

SRM University, SRM Institute of Science and Technology

Publications


Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture

Mandal, Singh (2018)


An accurate language identification tool is essential for building complex NLP systems for code-mixed data. A lot of work has recently been done on this problem, but there is still room for improvement. Inspired by recent advances in neural network architectures for computer vision tasks, we have implemented multichannel neural networks combining CNN and LSTM for word-level language identification of code-mixed data. Combining this with a Bi-LSTM-CRF context capture module, accuracies of 93.28% and 93.32% are achieved on our two testing sets.


Language Identification of Bengali-English Code-Mixed data using Character & Phonetic based LSTM Models

Mandal, Das, Das (2018, preprint)


Language identification of social media text remains a challenging task due to properties like code-mixing and inconsistent phonetic transliterations. In this paper, we present a supervised learning approach for word-level language identification of low-resource Bengali-English code-mixed data taken from social media. We employ two methods of word encoding, namely character-based and root-phone-based, to train our deep LSTM models. Using these two models, we created two ensemble models using stacking and threshold techniques, which gave accuracies of 91.78% and 92.35% respectively on our testing data.
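The abstract does not spell out how the threshold ensemble works, so the following is only a minimal sketch of one plausible reading: trust the character-based model when its confidence clears a cutoff, and otherwise fall back to the phonetic model. The function names, the label strings `"bn"` and `"en"`, and the `threshold` value are hypothetical and are not taken from the paper.

```python
from typing import Dict


def top_label(probs: Dict[str, float]) -> str:
    """Return the label with the highest probability."""
    return max(probs, key=probs.get)


def threshold_ensemble(
    char_probs: Dict[str, float],
    phone_probs: Dict[str, float],
    threshold: float = 0.8,  # hypothetical cutoff, not from the paper
) -> str:
    """Combine two per-word label distributions with a confidence threshold.

    Assumption: the character-based model is used whenever its top
    probability is at least `threshold`; otherwise the phonetic model decides.
    """
    label = top_label(char_probs)
    if char_probs[label] >= threshold:
        return label
    return top_label(phone_probs)


# Illustrative per-word outputs from the two (hypothetical) models.
confident = threshold_ensemble({"bn": 0.95, "en": 0.05}, {"bn": 0.30, "en": 0.70})
uncertain = threshold_ensemble({"bn": 0.55, "en": 0.45}, {"bn": 0.20, "en": 0.80})
print(confident, uncertain)
```

In the first call the character model is confident, so its label wins; in the second it is not, so the phonetic model's label is used.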


Language Identification of Bengali-English Code-Mixed Data using Character & Phonetic based LSTM Models

Das, Mandal, Das (2019)


Normalization of Transliterated Words in Code-Mixed Data Using Seq2Seq Model & Levenshtein Distance

Mandal, Nanmaran (2018)


Building tools for code-mixed data is rapidly gaining popularity in the NLP research community as such data is rising exponentially on social media. Working with code-mixed data poses several challenges, especially due to grammatical inconsistencies and spelling variations, in addition to all the previously known challenges of social media text. In this article, we present a novel architecture focusing on normalizing phonetic typing variations, which are commonly seen in code-mixed data. One of the main features of our architecture is that, in addition to normalizing, it can also be used for back-transliteration and word identification in some cases. Our model achieved an accuracy of 90.27% on the test data.
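As a rough illustration of the Levenshtein distance named in the title, the sketch below computes the edit distance between two strings and picks the nearest entry from a small lexicon. It does not reproduce the paper's architecture (the Seq2Seq component is omitted entirely), and the `closest_word` helper and the sample lexicon are hypothetical.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def closest_word(candidate: str, lexicon: Iterable[str]) -> str:
    """Return the lexicon entry with the smallest edit distance (hypothetical helper)."""
    return min(lexicon, key=lambda word: levenshtein(candidate, word))


# Hypothetical lexicon of standard transliterations.
lexicon = ["bhalo", "kemon", "ami"]
print(closest_word("valo", lexicon))
```

Here the variant spelling "valo" is two edits from "bhalo", three from "ami", and four from "kemon", so "bhalo" is selected.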


Code-Mixed to Monolingual Translation Framework

Mahata, Mandal, Das et al. (2019)


