2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI)
DOI: 10.1109/icacci.2018.8554501
Word Level Language Identification in Code-Mixed Data using Word Embedding Methods for Indian Languages

Cited by 15 publications (10 citation statements)
References 3 publications
“…Generally speaking, the recognition performance is good; compared with previous studies on language recognition (Burget, Matejka, & Cernocky, 2006; Campbell, Richardson, & Reynolds, 2007; Dehak et al., 2009; Mukherjee et al., 2018; Chaitanya et al., 2018), the results obtained here are favourable. A likely reason is that the five languages selected in the experiment differ substantially in pronunciation, which makes them easier to distinguish.…”
Section: Results Analysis
confidence: 85%
“…Veena et al [47] utilised a linear kernel SVM classifier and could achieve an accuracy of 93% for word-level Malayalam-English and 95% for Tamil-English code-mixed LID. Chaitanya et al [48] incorporated several machine learning methods with Word2Vec embedding for Hindi-English. Based on their experiments, the SVM using Skip-gram reached the highest accuracy of 67.34%.…”
Section: 1) Machine Learning Approachmentioning
confidence: 99%
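The pipeline described in the statement above (Skip-gram Word2Vec features classified by a linear SVM for word-level language identification) can be illustrated with a minimal sketch. The toy code-mixed corpus, the "hi"/"en" tags, the gensim and scikit-learn calls, and all hyperparameters below are illustrative assumptions, not the setup reported by Chaitanya et al. [48].

```python
# Minimal sketch: word-level LID with Skip-gram Word2Vec features and a linear SVM.
# Corpus, labels, and hyperparameters are illustrative assumptions only.
from gensim.models import Word2Vec
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Tiny code-mixed corpus: each sentence is a list of tokens.
sentences = [
    ["mujhe", "yeh", "movie", "bahut", "pasand", "hai"],
    ["this", "song", "is", "ekdum", "awesome"],
]
# Word-level language tags aligned to the tokens: "hi" = Hindi, "en" = English.
labels = [
    ["hi", "hi", "en", "hi", "hi", "hi"],
    ["en", "en", "en", "hi", "en"],
]

# Train Skip-gram embeddings (sg=1) on the token sequences.
w2v = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)

# One (embedding, tag) pair per token.
X = [w2v.wv[tok] for sent in sentences for tok in sent]
y = [tag for sent_tags in labels for tag in sent_tags]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="linear").fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

On a corpus this small the embeddings are essentially random; the sketch only shows how token-level vectors and tags line up before classification.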
“…In recent years, a great deal of research has been done on language identification, which is essentially the first step in NLP systems, although less work has addressed the detection of multiple Indian languages. Inumella Chaitanya et al. describe how common word-embedding models such as Continuous Bag of Words (CBOW) and Skip-gram can be used to generate embeddings that are fed to standard machine learning classifiers such as Support Vector Machines, Logistic Regression, and K-Nearest Neighbors, among other algorithms [8]. Anupam Jamatia et al. describe two models, a Bi-LSTM classifier and a Conditional Random Fields (CRF) classifier, and report that the Bi-LSTM classifier performs better.…”
Section: Related Work
confidence: 99%
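As a rough illustration of the comparison attributed to [8] above (CBOW versus Skip-gram embeddings fed to SVM, Logistic Regression, and K-Nearest Neighbors), the following sketch loops over both embedding variants and the three classifiers. The tiny corpus, tags, cross-validation setup, and hyperparameters are assumptions for demonstration only, not the experimental configuration of the cited work.

```python
# Minimal sketch: CBOW vs. Skip-gram features across classical classifiers
# for word-level LID. All data and settings are illustrative assumptions.
from gensim.models import Word2Vec
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

sentences = [
    ["kal", "movie", "dekhne", "chalein"],
    ["sure", "that", "plan", "sahi", "hai"],
    ["weekend", "pe", "free", "ho", "kya"],
]
labels = [
    ["hi", "en", "hi", "hi"],
    ["en", "en", "en", "hi", "hi"],
    ["en", "hi", "en", "hi", "hi"],
]
y = [tag for sent in labels for tag in sent]

classifiers = {
    "SVM": SVC(kernel="linear"),
    "LogReg": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

# sg=0 trains CBOW, sg=1 trains Skip-gram.
for sg, name in [(0, "CBOW"), (1, "Skip-gram")]:
    w2v = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=sg, epochs=50)
    X = [w2v.wv[tok] for sent in sentences for tok in sent]
    for clf_name, clf in classifiers.items():
        score = cross_val_score(clf, X, y, cv=3).mean()
        print(f"{name} + {clf_name}: {score:.2f}")
```

The nested loop mirrors the kind of embedding-by-classifier grid comparison the statement describes; real experiments would use a large code-mixed corpus and tuned hyperparameters.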