Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.316
DagoBERT: Generating Derivational Morphology with a Pretrained Language Model

Abstract: Can pretrained language models (PLMs) generate derivationally complex words? We present the first study investigating this question, taking BERT as the example PLM. We examine BERT's derivational capabilities in different settings, ranging from using the unmodified pretrained model to full finetuning. Our best model, DagoBERT (Derivationally and generatively optimized BERT), clearly outperforms the previous state of the art in derivation generation (DG). Furthermore, our experiments show that the input segment…

Cited by 16 publications (20 citation statements)
References 33 publications (24 reference statements)
“…It is divided into smaller communities, so-called subreddits, which have been shown to be a rich source of derivationally complex words (Hofmann et al., 2020c). Hofmann et al. (2020a) have published a dataset of derivatives found on Reddit annotated with the subreddits in which they occur (https://github.com/valentinhofmann/dagobert). Inspired by a content-based subreddit categorization scheme (https://www.reddit.com/r/TheoryOfReddit/comments/1f7hqc/the_200_most_active_subreddits_categorized_by), we define two groups of subreddits, an entertainment set (ent) consisting of the subreddits anime, DestinyTheGame, funny, Games, gaming, leagueoflegends, movies, Music, pics, and videos, as well as a discussion set (dis) consisting of the subreddits askscience, atheism, conspiracy, news, Libertarian, politics, science, technology, TwoXChromosomes, and worldnews, and extract all derivationally complex words occurring in them.…”
Section: Data (mentioning)
confidence: 99%
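To make the grouping step concrete, here is a minimal Python sketch under assumed inputs: the `(word, subreddit)` pair format and the `split_by_group` helper are hypothetical stand-ins for the released dataset's actual schema, not code from the citing paper.

```python
# Hypothetical sketch of the subreddit grouping described in the citation above.
# The (word, subreddit) input format is an assumption, not the dataset's real schema.

ENT_SUBREDDITS = {
    "anime", "DestinyTheGame", "funny", "Games", "gaming",
    "leagueoflegends", "movies", "Music", "pics", "videos",
}
DIS_SUBREDDITS = {
    "askscience", "atheism", "conspiracy", "news", "Libertarian",
    "politics", "science", "technology", "TwoXChromosomes", "worldnews",
}

def split_by_group(derivatives):
    """Split derivationally complex words into the ent and dis groups."""
    ent, dis = set(), set()
    for word, subreddit in derivatives:
        if subreddit in ENT_SUBREDDITS:
            ent.add(word)
        elif subreddit in DIS_SUBREDDITS:
            dis.add(word)
    return ent, dis

# Example usage with toy data:
ent_words, dis_words = split_by_group([("overhyped", "movies"), ("antivaxxer", "science")])
```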
“…The specific BERT variant we use is BERT BASE (uncased) (Devlin et al., 2019). For the derivational segmentation, we follow previous work by Hofmann et al. (2020a) in separating stem and prefixes by a hyphen. We further follow Casanueva et al. (2020) in mean-pooling the output representations for all subwords, excluding BERT's special tokens.…”
Section: Models (mentioning)
confidence: 99%
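As an illustration of the pooling step described above, the following is a hedged sketch using the Hugging Face transformers API; it is not the citing authors' implementation, and the example input is arbitrary.

```python
# Sketch: mean-pool BERT-base (uncased) subword outputs, excluding special tokens.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def mean_pooled_representation(text: str) -> torch.Tensor:
    enc = tokenizer(text, return_tensors="pt", return_special_tokens_mask=True)
    special = enc.pop("special_tokens_mask")             # 1 for [CLS] and [SEP]
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state          # (1, seq_len, 768)
    keep = (special == 0).unsqueeze(-1).float()          # zero out special tokens
    return (hidden * keep).sum(dim=1) / keep.sum(dim=1)  # (1, 768) mean over subwords

# Hyphen-separated prefix and stem, in the spirit of the segmentation described above.
vec = mean_pooled_representation("hyper - active")
```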
“…Bostrom and Durrett (2020) argue that byte-pair encoding expresses English morphology less faithfully than unigram segmentation, and show a performance improvement on downstream tasks with a unigram-segmentation-based BERT model. Hofmann et al. (2020) show that BERT can be fine-tuned with a classification layer to complete a derivational morphology cloze task, finding that imposing morpheme boundaries with hyphenation on the input side ultimately improved BERT's performance at this task. Finally, Edmiston (2020) investigates several monolingual BERT models for representations of morphological information.…”
Section: BERT and Linguistic Competence (mentioning)
confidence: 96%
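For a quick look at what hyphenation on the input side does to BERT's subword segmentation, here is a small sketch assuming the standard bert-base-uncased WordPiece tokenizer; the example word is arbitrary and the exact token splits are printed rather than claimed.

```python
# Sketch: compare WordPiece segmentation of a derivative with and without an
# explicit hyphen at the prefix-stem boundary (outputs are printed, not asserted).
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

for form in ["supercool", "super - cool"]:
    print(f"{form!r:15} -> {tokenizer.tokenize(form)}")
```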