Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021
DOI: 10.18653/v1/2021.acl-long.279

Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words

Abstract: How does the input segmentation of pretrained language models (PLMs) affect their interpretations of complex words? We present the first study investigating this question, taking BERT as the example PLM and focusing on its semantic representations of English derivatives. We show that PLMs can be interpreted as serial dual-route models, i.e., the meanings of complex words are either stored or else need to be computed from the subwords, which implies that maximally meaningful input tokens should allow for the be…
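
To make the segmentation question concrete, the sketch below (an illustration, not the paper's code) contrasts BERT's WordPiece segmentation of a derivative with a morpheme-level split. It assumes the HuggingFace transformers library and the bert-base-uncased checkpoint; the exact subwords depend on that checkpoint's vocabulary.

```python
# Illustrative sketch (not the paper's code): compare BERT's WordPiece
# segmentation of a derivative with a morpheme-level split, assuming the
# HuggingFace `transformers` library and the bert-base-uncased checkpoint.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

word = "superbizarre"

# WordPiece segments greedily by vocabulary, not by morphology;
# the result is checkpoint-dependent, e.g. something like ['super', '##biz', '##arre'].
print(tokenizer.tokenize(word))

# A derivationally informed segmentation keeps the prefix and the base intact,
# letting the model reuse what it already represents for "bizarre".
print(tokenizer.tokenize("super bizarre"))
```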

Cited by 18 publications (11 citation statements). References 82 publications (72 reference statements).

Citation statements:

“…In particular, character-level models capture complex structure in the space of words, pseudowords, and randomly generated n-grams. These findings are consistent with work suggesting that character-level and morpheme-aware representations are rich in meaning, even compared to word or sub-word models (Al-Rfou et al., 2019; El Boukkouri et al., 2020; Ma et al., 2020; Hofmann et al., 2020, 2021).…”
Section: Discussion (supporting)
confidence: 89%
“…As described above, state-of-the-art language models serve as a tool to study meaning as it emerges through the distributional hypothesis paradigm. Existing work on the analysis of Transformers and BERT-based models has explored themes we are interested in, such as semantics (Ethayarajh, 2019), syntax (Goldberg, 2019), morphology (Hofmann et al., 2020, 2021), and the structure of language (Jawahar et al., 2019). However, all of this work has limited itself to the focus of extant words, largely due to the word- and sub-word-based nature of these models.…”
Section: Character-level Language Models For Information Analysis (mentioning)
confidence: 99%
“…We see that all of the algorithms have a similar number of prefixes in their vocabularies, which suggests the tokenisation algorithm plays an important role, as performance differences on handling prefixes are large (Table 2) despite similar vocabularies. This is supported by work by Hofmann et al. (2021), who find that employing a fixed vocabulary in a morphologically correct way leads to performance improvements. We also see, however, that Unigram has fewer suffixes in its vocabulary than default Unigram, which reflects the performance difference seen in Table 2.…”
Section: Intrinsic Evaluation: Morphological Correctness (mentioning)
confidence: 74%
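
The vocabulary comparison this statement describes can be approximated in a few lines; the sketch below (assuming the HuggingFace transformers library and the bert-base-uncased checkpoint, and not the cited paper's evaluation code) checks which common derivational prefixes exist as word-initial entries in a WordPiece vocabulary.

```python
# Rough sketch (not the cited paper's evaluation code): check which common
# derivational prefixes are available as word-initial units in a WordPiece
# vocabulary, assuming the HuggingFace `transformers` library.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
vocab = tokenizer.get_vocab()  # maps token string -> vocabulary id

# Small, hand-picked list of English derivational prefixes for illustration.
prefixes = ["un", "re", "pre", "anti", "super", "over", "under", "non"]

# Word-initial WordPiece tokens carry no '##' continuation marker, so a prefix
# is only usable at the start of a word if the bare string is in the vocabulary.
present = [p for p in prefixes if p in vocab]
print(f"{len(present)}/{len(prefixes)} prefixes usable word-initially:", present)
```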
“…An important difference to the rest of this survey is that such an approach has the potential to be even stronger, as foregoing purely concatenative segmentation allows one to "segment" for example the word "hoping" as "hope V.PTCP;PRS" or "ate" as "eat PST," allowing sharing of information with other forms in the respective paradigm. The benefit of such an approach is also shown by Hofmann et al. (2021), who observe that undoing derivational processes by splitting words into morphemes before tokenizing can improve sentiment and topicality classification results.…”
Section: Manually Constructed Linguistic Analyzers (mentioning)
confidence: 93%
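
As a rough illustration of "splitting words into morphemes before tokenizing", the sketch below pre-splits derivatives with a toy rule-based helper before handing the text to a standard tokenizer; derivational_split is a hypothetical stand-in, not the method of Hofmann et al. (2021).

```python
# Rough sketch of morpheme-aware preprocessing in the spirit described above:
# undo derivation before tokenization so base words reach the model intact.
# `derivational_split` is a hypothetical toy helper, not the paper's method.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

KNOWN_PREFIXES = ("super", "over", "anti", "un", "re")

def derivational_split(word: str) -> list[str]:
    """Peel off one known prefix if the remainder is long enough to be a base."""
    for prefix in KNOWN_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            return [prefix, word[len(prefix):]]
    return [word]

text = "a superbizarre but unforgettable plot"
pre_split = " ".join(piece for w in text.split() for piece in derivational_split(w))

# The downstream classifier now sees morphologically meaningful units rather
# than arbitrary WordPiece fragments of the unsplit derivative.
print(tokenizer.tokenize(pre_split))
```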