Part-of-speech Tagging of Code-Mixed Social Media Text

Ghosh, Souvick; Ghosh, Satanu; Das, Dipankar

doi:10.18653/v1/w16-5811

Cited by 27 publications

(15 citation statements)

References 22 publications

“…Our choice of tasks is primarily motivated by the availability of annotated CM data. There has been prior work on CM sentiment identification (Vilares and Alonso, 2016; Rudra et al, 2016; and POS tagging (Solorio and Liu, 2008; AlGhamdi et al, 2016; Ghosh et al, 2016). But we are not aware of any work that utilizes pre-trained bilingual embeddings for these tasks.…”

Section: Discussion (mentioning)

confidence: 99%

Word Embeddings for Code-Mixed Language Processing

Pratapa, Choudhury, Sitaram

2018

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We compare three existing bilingual word embedding approaches, and a novel approach of training skip-grams on synthetic code-mixed text generated through linguistic models of code-mixing, on two tasks: sentiment analysis and POS tagging for code-mixed text. Our results show that while CVM and CCA based embeddings perform as well as the proposed embedding technique on semantic and syntactic tasks respectively, the proposed approach provides the best performance for both tasks overall. Thus, this study demonstrates that existing bilingual embedding techniques are not ideal for code-mixed text processing and there is a need for learning multilingual word embedding from the code-mixed text.

“…The work on studying the types of conceptual structures and their language correspondences started within the framework of exploring artificial intelligence [6,7]. The result of this work was the model of "conceptual dependences".…”

Section: Literature Review and Problem Statement (mentioning)

confidence: 99%

Development of knowledge-oriented system of machine translation based on the analytic-synthetic text processing

Lytvynenko, Nikolaievskyi, Lakhno, et al.

2017

EEJET

Methods of automatic text analysis have been developed, based on a declarative representation of syntactic compatibility rules and a programmatic distribution of analytic-synthetic processing of natural-language text in machine translation systems. The software implementation demonstrates experimentally that applying the developed methods reduces the number of semantic errors by 14-16% on average compared with known machine translation systems. Keywords: machine translation system, automatic text analysis, analytic-synthetic text processing

“…As is typically the case in NLP, such pipelines suffer from the problem of cascading errors; e.g., failures of the language identification will cause problems in the tag prediction (Barman et al, 2016). Other approaches have trained supervised models on POS-annotated, code-switched data (Jamatia et al, 2015; Ghosh et al, 2016; Gupta et al, 2017; Barman et al, 2016; Sequiera et al, 2015, inter alia), resources which are expensive to create and unavailable for most language pairs.…”

Section: Introduction (mentioning)

confidence: 99%

Part-of-Speech Tagging for Code-Switched, Transliterated Texts without Explicit Language Identification

Ball, Garrette

2018

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Code-switching, the use of more than one language within a single utterance, is ubiquitous in much of the world, but remains a challenge for NLP largely due to the lack of representative data for training models. In this paper, we present a novel model architecture that is trained exclusively on monolingual resources, but can be applied to unseen code-switched text at inference time. The model accomplishes this by jointly maintaining separate word representations for each of the possible languages (or scripts, in the case of transliteration), allowing each to contribute to inferences without forcing the model to commit to a language. Experiments on Hindi-English part-of-speech tagging demonstrate that our approach outperforms standard models when training on monolingual text without transliteration, and testing on code-switched text with alternate scripts.
