Proceedings of the Second Workshop on Computational Approaches to Code Switching 2016
DOI: 10.18653/v1/w16-5810
Unraveling the English-Bengali Code-Mixing Phenomenon

Abstract: Code-mixing is a prevalent phenomenon in modern-day communication. Though several systems enjoy success in identifying a single language, identifying the languages of words in code-mixed texts is a herculean task, more so in a social media context. This paper explores the English-Bengali code-mixing phenomenon and presents algorithms capable of identifying the language of every word to a reasonable accuracy in specific cases and the general case. We create and test a predictor-corrector model, develop a new code-mi…

Cited by 26 publications (14 citation statements)
References 6 publications (6 reference statements)
“…Transliteration can be considered as a special form of code-mixing where the phonetic transformations of the words from a source language to a target language is performed. The presence of code-mixing and transliterated Bengali (i.e., Bengali text using the Latin alphabet) is a common phenomenon in Bengali, as shown by the previous studies (Barman et al, 2014;Chanda et al, 2016).…”
Section: Introduction (mentioning)
confidence: 75%
“…It is required to take the surrounding words into consideration in order to get a sense and context information in identifying the word (A. Chanda et al., 2016)…”
Section: Ambiguous Words (mentioning)
confidence: 99%
“…We see our work as a part of this continuing trend and as an important resource contribution to analyze Indian social media. Romanized Indian Languages: In the context of processing Indian languages expressed on the web, challenges posed by the use of Roman script instead of the native script have been reported in several recent studies in the context of code-mixed English-Bengali (Chanda et al, 2016), and English-Hindi (Kumar et al, 2018) text. While addressing word level language identification, reported that 90% of posts in Indian languages on Facebook are expressed in Roman script.…”
Section: Related Work (mentioning)
confidence: 99%