Proceedings of the 6th Conference on Natural Language Learning - COLING-02 2002
DOI: 10.3115/1118853.1118859

Bootstrapping a multilingual part-of-speech tagger in one person-day

Abstract: This paper presents a method for bootstrapping a fine-grained, broad-coverage part-of-speech (POS) tagger in a new language using only one person-day of data acquisition effort. It requires only three resources, which are currently readily available in 60-100 world languages: (1) an online or hardcopy pocket-sized bilingual dictionary, (2) a basic library reference grammar, and (3) access to an existing monolingual text corpus in the language. The algorithm begins by inducing initial lexical POS distributions f…

Cited by 22 publications (28 citation statements)
References 9 publications
“…There has been some previous work on bootstrapping POS taggers (e.g., Zavrel and Daelemans (2000) and Cucerzan and Yarowsky (2002)), but to our knowledge no previous work on co-training POS taggers.…”
Section: Introduction
confidence: 99%
“…Bootstrapping is used to create labelled training data from large amounts of unlabelled data (Cucerzan and Yarowsky, 2002).…”
Section: The Bootstrapping Methods
confidence: 99%
“…2008; Oflazer et al. 2001), and manual encoding of basic linguistic facts (e.g., Cucerzan and Yarowsky 2002; Feldman and Hana 2010; Tepper and Xia 2010). Learning from a different language (e.g., Bosch et al.…”
Section: Introduction
confidence: 99%
“…Learning from a different language (e.g., Bosch et al. 2008; Cucerzan and Yarowsky 2002; Feldman and Hana 2010), another resource‐light strategy, will be discussed in our forthcoming survey (Feldman and Hana forthcoming).…”
Section: Introduction
confidence: 99%