2018
DOI: 10.1162/tacl_a_00032

Language Modeling for Morphologically Rich Languages: Character-Aware Modeling for Word-Level Prediction

Abstract: Neural architectures are prominent in the construction of language models (LMs). However, word-level prediction is typically agnostic of subword-level information (characters and character sequences) and operates over a closed vocabulary, consisting of a limited word set. Indeed, while subword-aware models boost performance across a variety of NLP tasks, previous work did not evaluate the ability of these models to assist next-word prediction in language modeling tasks. Such subword-level informed models shoul…
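As a concrete illustration of the idea in the abstract, the sketch below composes word representations from character embeddings (here via a small character CNN, one common choice) and still predicts over a closed word vocabulary with a word-level LSTM. This is a minimal sketch assuming PyTorch; the class name CharAwareLM and all hyperparameters are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a character-aware word-level LM (illustrative only;
# names and hyperparameters are assumptions, not the paper's exact model).
import torch
import torch.nn as nn

class CharAwareLM(nn.Module):
    def __init__(self, n_chars, n_words, char_dim=16, n_filters=128, hidden=256):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        # A character CNN composes a word vector from the word's characters.
        self.conv = nn.Conv1d(char_dim, n_filters, kernel_size=3, padding=1)
        self.rnn = nn.LSTM(n_filters, hidden, batch_first=True)
        # Prediction is still over the closed word vocabulary.
        self.out = nn.Linear(hidden, n_words)

    def forward(self, char_ids):
        # char_ids: (batch, seq_len, max_word_len) character indices per word
        b, t, l = char_ids.shape
        e = self.char_emb(char_ids.view(b * t, l))         # (b*t, l, char_dim)
        h = torch.relu(self.conv(e.transpose(1, 2)))       # (b*t, filters, l)
        w = h.max(dim=2).values.view(b, t, -1)             # max-pool over chars
        y, _ = self.rnn(w)                                 # word-level context
        return self.out(y)                                 # next-word logits

# Toy usage with random data: batch of 2 sequences, 5 words, 12 chars per word.
model = CharAwareLM(n_chars=60, n_words=1000)
chars = torch.randint(1, 60, (2, 5, 12))
logits = model(chars)  # shape: (2, 5, 1000)
```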


Cited by 39 publications (57 citation statements)
References 26 publications
“…Cotterell et al. (2018) study 21 languages. Gerz et al. (2018) create datasets for 50 languages. All of these studies, however, only create small datasets, which are inadequate for pretraining language models.…”
Section: Cross-lingual Pretrained Language Models (mentioning)
confidence: 99%
“…The first direction aims to obtain good embeddings for novel words by looking at their characters (Pinter, Guthrie, and Eisenstein 2017), morphemes (Lazaridou et al. 2013; Luong, Socher, and Manning 2013; Cotterell, Schütze, and Eisner 2016), or n-grams (Wieting et al. 2016; Bojanowski et al. 2017; Ataman and Federico 2018; Salle and Villavicencio 2018). Naturally, this direction is especially well-suited for languages with rich morphology (Gerz et al. 2018). The second, context-based direction tries to infer embeddings for novel words from the words surrounding them (Lazaridou, Marelli, and Baroni 2017; Herbelot and Baroni 2017; Khodak et al. 2018).…”
Section: Introduction (mentioning)
confidence: 99%
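The character n-gram direction quoted above can be sketched in a few lines: a novel word's vector is composed from the vectors of its character n-grams, in the spirit of Bojanowski et al. (2017). A minimal sketch assuming NumPy; char_ngrams, embed_novel_word, and the toy n-gram table are hypothetical names standing in for trained parameters.

```python
# Illustrative sketch of composing an embedding for a novel word from its
# character n-grams (fastText-style); the n-gram table here is hypothetical.
import numpy as np

def char_ngrams(word, n_min=3, n_max=5):
    """Extract character n-grams, with boundary markers as in fastText."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def embed_novel_word(word, ngram_vectors, dim=50):
    """Mean of the known n-gram vectors; zero vector if nothing matches."""
    vecs = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# Toy usage: random vectors stand in for trained n-gram embeddings.
rng = np.random.default_rng(0)
ngram_vectors = {g: rng.normal(size=50) for g in char_ngrams("unhappiness")}
v = embed_novel_word("happiness", ngram_vectors)  # shares many n-grams
```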
“…All benchmarked n-gram LMs are 5-grams, with the exception of BKN, which is an ∞-gram model trained via 5 samples following the recipe of . GKN (Gerz et al., 2018a). Suit symbols denote morphological types: ♢ Isolating, ♡ Fusional, ♠ Agglutinative, ♣ Introflexive.…”
Section: Experiments and Results (mentioning)
confidence: 99%
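For context on the count-based baselines mentioned above, the sketch below trains a 5-gram LM from raw counts and scores a word given its 4-token history. It uses simple add-k smoothing as a stand-in; the benchmarked models use Kneser-Ney smoothing, which is more involved, and the function names here are illustrative assumptions.

```python
# Sketch of a count-based 5-gram LM with add-k smoothing (a simple stand-in;
# Kneser-Ney, as used in the benchmarked models, is more involved).
from collections import Counter

def train_ngram_lm(tokens, n=5):
    """Count n-grams and their (n-1)-gram histories from a token list."""
    grams, hist = Counter(), Counter()
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    for i in range(len(padded) - n + 1):
        grams[tuple(padded[i:i + n])] += 1
        hist[tuple(padded[i:i + n - 1])] += 1
    return grams, hist

def prob(grams, hist, context, word, vocab_size, k=0.1):
    """P(word | context) with add-k smoothing over the vocabulary."""
    c = tuple(context[-4:])  # last n-1 = 4 context tokens for a 5-gram model
    return (grams[c + (word,)] + k) / (hist[c] + k * vocab_size)

# Toy usage.
toks = "the cat sat on the mat".split()
grams, hist = train_ngram_lm(toks)
p = prob(grams, hist, ["<s>", "<s>", "<s>", "<s>"], "the", vocab_size=7)
```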