Proceedings of the Conference on Empirical Methods in Natural Language Processing - EMNLP '08 2008
DOI: 10.3115/1613715.1613852

Part-of-speech tagging for English-Spanish code-switched text

Abstract: Code-switching is an interesting linguistic phenomenon commonly observed in highly bilingual communities. It consists of mixing languages in the same conversational event. This paper presents results on Part-of-Speech tagging Spanish-English code-switched discourse. We explore different approaches to exploit existing resources for both languages that range from simple heuristics, to language identification, to machine learning. The best results are achieved by training a machine learning algorithm with feature…

Cited by 94 publications (110 citation statements)
References 27 publications
“…This is also reflected in the abundant literature that uses Spanish HMMs-based taggers (e.g., [16][17][18][19]), such as Schmid's (13) and Padró (20). In contrast, with a more recent history in language applications in general, MaxEnt models have been applied to Spanish more lately.…”
Section: Automatic POS Tagging in Spanish
Citation type: mentioning
confidence: 96%
“…POS tagging is widely adopted for languages such as English, German, Spanish and Arabic [1]- [4]. It plays a significant role in text analysis as it is an initial step to identify the grammar information in the text.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Here, we will use CM to imply both. Work on computational models of CM have been few and far between (Solorio and Liu, 2008a; Solorio and Liu, 2008b; Nguyen and Dogruoz, 2013), primarily due to the paucity of CM data in conventional text-corpora which makes data-intensive methods hard to apply. Solorio and Liu (2008a) in their work on English-Spanish CM use models built on smaller datasets to predict valid switching points to synthetically generate data from monolingual corpora, and in another work (2008b) describe parts-of-speech (POS) tagging of CM text.…”
Section: Introduction
Citation type: mentioning
confidence: 99%