2020
DOI: 10.1609/aaai.v34i05.6351

Semi-Supervised Learning on Meta Structure: Multi-Task Tagging and Parsing in Low-Resource Scenarios

Abstract: Multi-view learning makes use of diverse models arising from multiple sources of input or different feature subsets for the same task. For example, a given natural language processing task can combine evidence from models arising from character, morpheme, lexical, or phrasal views. The most common strategy with multi-view learning, especially popular in the neural network community, is to unify multiple representations into one unified vector through concatenation, averaging, or pooling, and then build a singl…
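The fusion strategies named in the abstract (concatenation, averaging, or pooling over per-view representations) can be illustrated with a minimal PyTorch sketch. The module name, view names, and dimensions below are hypothetical and only illustrate the generic strategies; this is not the paper's actual architecture.

# Minimal sketch of the fusion strategies described in the abstract: combining
# several view representations (e.g. character, lexical, phrasal) into one
# vector via concatenation, averaging, or max-pooling.
# Hypothetical dimensions and module names; not the paper's architecture.
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    def __init__(self, view_dims, out_dim, mode="concat"):
        super().__init__()
        self.mode = mode
        # Project each view to a common size so averaging/pooling is well defined.
        self.projections = nn.ModuleList(
            [nn.Linear(d, out_dim) for d in view_dims]
        )

    def forward(self, views):
        # views: list of tensors, one per view, each of shape (batch, view_dim)
        projected = [proj(v) for proj, v in zip(self.projections, views)]
        stacked = torch.stack(projected, dim=0)        # (n_views, batch, out_dim)
        if self.mode == "concat":
            return torch.cat(projected, dim=-1)        # (batch, n_views * out_dim)
        if self.mode == "average":
            return stacked.mean(dim=0)                 # (batch, out_dim)
        if self.mode == "max_pool":
            return stacked.max(dim=0).values           # (batch, out_dim)
        raise ValueError(f"unknown fusion mode: {self.mode}")

# Usage: three hypothetical views (character, lexical, phrasal) for a batch of 4 tokens.
char_view, lex_view, phrase_view = torch.randn(4, 50), torch.randn(4, 300), torch.randn(4, 128)
fusion = MultiViewFusion(view_dims=[50, 300, 128], out_dim=64, mode="average")
fused = fusion([char_view, lex_view, phrase_view])     # (4, 64)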

Cited by 20 publications (22 citation statements)
References 19 publications (12 reference statements)
“…They include models trained on other grammar formalisms to improve dependency parsing on Twitter (Foster et al., 2011). Recently, this line of classics has been revisited (Ruder and Plank, 2018; Rotman and Reichart, 2019; Lim et al., 2020). For example, classic methods such as tri-training constitute a strong baseline for domain shift in neural times (Ruder and Plank, 2018).…”
Section: Pseudo-labeling (mentioning)
confidence: 99%
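The tri-training baseline cited above can be sketched in a few lines. The following is a simplified illustration of classic tri-training-style pseudo-labeling, using scikit-learn classifiers on toy feature matrices as an assumption; the cited works apply the idea to neural taggers and parsers, and the full algorithm additionally checks estimated error rates before accepting pseudo-labels.

# Simplified tri-training-style pseudo-labeling sketch (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def tri_train(X_lab, y_lab, X_unlab, rounds=3):
    # Three classifiers, each trained on a bootstrap sample of the labeled data.
    models = []
    for seed in range(3):
        Xb, yb = resample(X_lab, y_lab, random_state=seed)
        models.append(LogisticRegression(max_iter=1000).fit(Xb, yb))

    for _ in range(rounds):
        retrained = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            pred_j = models[j].predict(X_unlab)
            pred_k = models[k].predict(X_unlab)
            agree = pred_j == pred_k  # pseudo-label only where the other two models agree
            X_aug = np.concatenate([X_lab, X_unlab[agree]])
            y_aug = np.concatenate([y_lab, pred_j[agree]])
            retrained.append(LogisticRegression(max_iter=1000).fit(X_aug, y_aug))
        models = retrained
    return models

At inference time, predictions would typically be taken by majority vote over the three retrained models.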
“…Compared with taggers that used BERT-like models, our joint model shows slightly better performance. Even though Co-meta applied both the meta-LSTM and the sentence-based character model [15], our joint model, which applies two character models, showed higher performance. However, it should be noted that the udify model was first trained with 75 different languages and then tuned for English [11].…”
Section: Results (mentioning)
confidence: 85%
“…In POS tagging, multilingual BERT, which can handle 100 languages, was applied in 2019 to train a multilingual POS tagger, namely udify [11]. More recently, Lim et al. [15] proposed a Co-meta tagger that improves performance through a semi-supervised learning approach combining multilingual BERT with a monolingual English BERT-base model, and they achieved SOTA results. We compare the performance of our tagger with BERT-like models as existing SOTA systems, in particular multilingual BERT, BERT-base, and RoBERTa [35].…”
Section: Results (mentioning)
confidence: 99%