Proceedings of the 2019 Conference of the North 2019
DOI: 10.18653/v1/n19-1252

A Grounded Unsupervised Universal Part-of-Speech Tagger for Low-Resource Languages

Abstract: Unsupervised part of speech (POS) tagging is often framed as a clustering problem, but practical taggers need to ground their clusters as well. Grounding generally requires reference labeled data, a luxury a low-resource language might not have. In this work, we describe an approach for low-resource unsupervised POS tagging that yields fully grounded output and requires no labeled training data. We find the classic method of Brown et al. (1992) clusters well in our use case and employ a decipherment-based ap…

Cited by 8 publications (8 citation statements)
References 28 publications (34 reference statements)
“…• Textual Modality: Within text, there are several datasets for tasks like content transfer (Prabhumoye et al, 2019), commonsense inference (Zellers et al, 2018), reference resolution (Kennington and Schlangen, 2015), symbol grounding (Kameko et al, 2015), studying linguistic and non-linguistic contexts in microblogs (Doyle and Frank, 2015), bilingual lexicon extraction (Laws et al, 2010), universal part-of-speech tagging for low resource languages (Cardenas et al, 2019), entity linking and reference (Nothman et al, 2012) etc.,…”
Section: New Datasets
Citation type: mentioning
confidence: 99%
“…This is not an exhaustive study of all the techniques that present grounding, but are some of the representative categories. Here are more studies that perform grounding with various techniques such as clustering (Shutova et al, 2015; Cardenas et al, 2019) regularization (Shrestha et al, 2020), CRFs , classification (Pangburn et al, 2003; Monroe et al, 2017), linguistic theories (Strube and Hahn, 1999), iterative refinement (Li et al, 2019;, language modeling (Spithourakis et al, 2016; Cho and May, 2020), nearest neighbors , contextual fusion (Chandu et al, 2019a), mutual information (Oates, 2003), cycle consistency (Zhong et al, 2020) etc.,…”
Section: Nuanced Modeling Variations For Grounding
Citation type: mentioning
confidence: 99%
“…The main focus of these libraries is script conversion and romanization. In this capacity they were successfully employed in diverse downstream multilingual NLP tasks such as neural machine translation (Zhang et al, 2020; Amrhein and Sennrich, 2020), morphological analysis (Hauer et al, 2019; Murikinati et al, 2020), named entity recognition (Huang et al, 2019) and part-of-speech tagging (Cardenas et al, 2019).…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Another form of resource-constrained NLP is with low-resource languages or datasets. Research into unsupervised PoS tagging on low resource languages has been done by Buys and Botha (2016); Cardenas et al (2019). Ezen-Can (2020) evaluated the performance of BERT on a small dataset.…”
Section: Related Work
Citation type: mentioning
confidence: 99%