Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP 2020
DOI: 10.18653/v1/2020.blackboxnlp-1.31

This is a BERT. Now there are several of them. Can they generalize to novel words?

Abstract: Recently, large-scale pre-trained neural network models such as BERT have achieved many state-of-the-art results in natural language processing. Recent work has explored the linguistic capacities of these models. However, no work has focused on the ability of these models to generalize these capacities to novel words. This type of generalization is exhibited by humans (Berko, 1958), and is intimately related to morphology: humans are in many cases able to identify inflections of novel words in the appropriate c…
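
As an illustration of the kind of novel-word generalization the paper probes, here is a minimal sketch of a Berko-style wug test posed to a masked language model through the Hugging Face transformers fill-mask pipeline. The checkpoint, the nonce words, and the prompt frame are illustrative assumptions rather than the paper's actual setup; note also that a single [MASK] slot can only be filled with tokens already in the model's vocabulary, so a faithful replication would additionally have to handle how novel inflected forms are tokenized.

```python
# Minimal sketch of a Berko-style "wug test" probe with a masked language
# model (illustrative only; not the paper's experimental code).
from transformers import pipeline

# bert-base-uncased is used purely as an example checkpoint.
fill = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical nonce words in the spirit of Berko (1958).
novel_words = ["wug", "dax", "blick"]

for word in novel_words:
    # Frame the novel noun so the [MASK] slot calls for its plural form.
    prompt = f"This is a {word}. Now there are two of them. There are two [MASK]."
    for candidate in fill(prompt, top_k=5):
        # Each candidate is a dict holding the predicted token and its score.
        print(word, candidate["token_str"], round(candidate["score"], 4))
```

In a full experiment one would compare the probability assigned to the correctly inflected form against distractor forms, rather than only inspecting the top-k predictions.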

Cited by 8 publications (7 citation statements) | References: 19 publications

“…While BERT was created for tasks with an English corpus, mBERT was created from all Wikipedia pages in 104 different languages, making it able to handle tasks in multiple languages [8]. However, this resulted in performance worse than or comparable to that of monolingual models using BERT’s architecture settings [14]. It has been speculated that this is because Wikipedia is not representative of general language use [15].…”
Section: Related Work
confidence: 99%
“…Multilingual models have also demonstrated that syntactic transfer can occur between languages (Guarasci et al., 2022). Meanwhile, Haley (2020) shows that BERT can perform the Wug test, a standard grammatical generalisation test (Berko Gleason, 1958), significantly better than chance in English, French, Spanish and Dutch. However, there are still many gaps in this research.…”
Section: Grammatical Embedding and Indeclinable Nouns
confidence: 99%
“…Pre-trained language models, such as BERT or GPT-3, are either fine-tuned on a new task or are primed on prompts that are either crafted by humans [Haley, 2020, Misra et al., 2020, Petroni et al., 2019] or found automatically [Shin et al., 2020, Jiang et al., 2020]. These prompts (also called stimuli, constraints, or demonstrations) are then dependent only on the specific task at hand and nothing else.…”
Section: Specificity
confidence: 99%
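
To make the contrast between fine-tuning and priming concrete, the sketch below probes a frozen pre-trained model with a single human-crafted prompt in the spirit of Petroni et al. (2019). The checkpoint and the prompt string are illustrative assumptions, not examples taken from the cited works.

```python
# Minimal sketch of priming a frozen masked language model with a
# human-crafted prompt (illustrative; not code from the cited papers).
from transformers import pipeline

# No fine-tuning takes place: the pre-trained weights are used as-is.
fill = pipeline("fill-mask", model="bert-base-uncased")

# The prompt depends only on the task being probed, not on any training data.
prompt = "The capital of France is [MASK]."
for candidate in fill(prompt, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 4))
```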