Proceedings of the 2018 EMNLP Workshop W-Nut: The 4th Workshop on Noisy User-Generated Text 2018
DOI: 10.18653/v1/w18-6116

Language Identification in Code-Mixed Data using Multichannel Neural Networks and Context Capture

Abstract: An accurate language identification tool is an absolute necessity for building complex NLP systems to be used on code-mixed data. A lot of work has recently been done on this problem, but there is still room for improvement. Inspired by recent advancements in neural network architectures for computer vision tasks, we have implemented multichannel neural networks combining CNN and LSTM for word-level language identification of code-mixed data. Combining this with a Bi-LSTM-CRF context capture module, accuracies …

Citations: cited by 16 publications (17 citation statements)
References: 15 publications
“…An interesting research direction is token-level LID for code-mixed texts (Zhang et al 2018;Mager, Çetinoglu, and Kann 2019;Mandal and Singh 2018). However, fine-grained LID has marginal assistance for CLIR task, since the downstream modules, e.g.…”
Section: Query Language Identification (citation type: mentioning)
confidence: 99%
“…Recent works [40] have proposed diverse paths for WLLI. One of them used a multichannel [four channels] neural network, which included three convolutional 1-D networks and a LSTM network; it was tested on Bengali-English and Hindi-English language pairs.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…The model achieved impressive accuracy scores of 93.28% on the Bengali data set, and 93.32% on the Hindi data set. Another approach used character encoding and root phone encoding [40] to train LSTM models. Jamatia et al [41] addressed WLLI of codemixed language pairs - English-Hindi and Bengali-English - from Facebook, Twitter, WhatsApp social media sites.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Sequence classification using RNN has been used in Samih et al (2016). A multichannel convolutional neural network (CNN) combined with a bidirectional long short-term memory (BiLSTM)-CRF module has been used in Mandal and Singh (2018). Sequence to sequence models has also been used in Jurgens et al (2017).…”
Section: Related Studies (citation type: mentioning)
confidence: 99%
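
The citation statements above describe the word-level model as four parallel channels: three 1-D convolutional networks and an LSTM. The following is a minimal NumPy sketch of a forward pass through such an architecture, not the authors' implementation. The kernel widths, layer sizes, character alphabet, label count, and every function name here are assumptions made for illustration, and the weights are random and untrained. The Bi-LSTM-CRF context capture module is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical; the source does not state them.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
EMB_DIM, N_FILTERS, HIDDEN, N_LABELS = 8, 4, 6, 3
KERNEL_WIDTHS = (2, 3, 4)  # assumed widths for the three convolutional channels
MAX_LEN = 12


def encode(word):
    """Map a word to a fixed-length array of character ids (0 = padding or unknown)."""
    ids = [ALPHABET.find(ch) + 1 for ch in word.lower()[:MAX_LEN]]
    return np.array(ids + [0] * (MAX_LEN - len(ids)))


def conv_channel(x, kernel):
    """1-D convolution over character positions, ReLU, then max-over-time pooling."""
    width = kernel.shape[0]
    windows = np.stack([x[i:i + width] for i in range(len(x) - width + 1)])
    feature_maps = np.einsum("twe,wef->tf", windows, kernel)
    return np.maximum(feature_maps, 0.0).max(axis=0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_channel(x, w, u, b):
    """Run a single-layer LSTM over the (padded) sequence; return the last hidden state."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for x_t in x:
        i, f, o, g = np.split(x_t @ w + h @ u + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h


# Random, untrained parameters: the output is only useful for checking shapes.
params = {
    "emb": rng.normal(size=(len(ALPHABET) + 1, EMB_DIM)),
    "kernels": [rng.normal(size=(k, EMB_DIM, N_FILTERS)) * 0.1 for k in KERNEL_WIDTHS],
    "w": rng.normal(size=(EMB_DIM, 4 * HIDDEN)) * 0.1,
    "u": rng.normal(size=(HIDDEN, 4 * HIDDEN)) * 0.1,
    "b": np.zeros(4 * HIDDEN),
    "out": rng.normal(size=(len(KERNEL_WIDTHS) * N_FILTERS + HIDDEN, N_LABELS)) * 0.1,
}


def predict_proba(word, p=params):
    """Concatenate the four channel outputs and apply a softmax over language labels."""
    x = p["emb"][encode(word)]  # shape (MAX_LEN, EMB_DIM)
    channels = [conv_channel(x, k) for k in p["kernels"]]
    channels.append(lstm_channel(x, p["w"], p["u"], p["b"]))
    logits = np.concatenate(channels) @ p["out"]
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

Calling `predict_proba("khub")` returns a length-3 probability vector over the hypothetical label set. In a trained system the per-word predictions would then be passed to a sequence-level module so that neighbouring words can inform each label.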