Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, 2021
DOI: 10.18653/v1/2021.eacl-srw.2

Have Attention Heads in BERT Learned Constituency Grammar?

Abstract: With the success of pre-trained language models in recent years, more and more researchers focus on opening the "black box" of these models. Following this interest, we carry out a qualitative and quantitative analysis of constituency grammar in attention heads of BERT and RoBERTa. We employ the syntactic distance method to extract implicit constituency grammar from the attention weights of each head. Our results show that there exist heads that can induce some grammar types much better than baselines, suggest…
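The abstract describes inducing constituency structure from a single attention head via syntactic distances. The sketch below illustrates one plausible instantiation of that idea; the specific distance function (Jensen-Shannon divergence between adjacent tokens' attention distributions) and the greedy top-down splitting rule are assumptions for illustration, not necessarily the exact choices made in the paper.

# Minimal sketch: induce a binary constituency tree from one attention head's
# weight matrix using adjacent-token "syntactic distances".
# Assumptions: JS divergence as the distance measure, greedy top-down splits.
import numpy as np
from scipy.spatial.distance import jensenshannon


def syntactic_distances(attn: np.ndarray) -> np.ndarray:
    """attn: (seq_len, seq_len) attention weights of a single head.
    Returns one distance per adjacent token pair (length seq_len - 1)."""
    return np.array([
        jensenshannon(attn[i], attn[i + 1])
        for i in range(attn.shape[0] - 1)
    ])


def build_tree(tokens, dists):
    """Greedy top-down binarisation: split the span at the largest distance."""
    if len(tokens) == 1:
        return tokens[0]
    split = int(np.argmax(dists)) + 1          # split point between tokens
    left = build_tree(tokens[:split], dists[:split - 1])
    right = build_tree(tokens[split:], dists[split:])
    return (left, right)


# Toy usage with a random attention matrix whose rows sum to 1.
tokens = ["the", "cat", "sat", "on", "the", "mat"]
rng = np.random.default_rng(0)
attn = rng.random((len(tokens), len(tokens)))
attn /= attn.sum(axis=1, keepdims=True)

tree = build_tree(tokens, syntactic_distances(attn))
print(tree)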

Cited by 1 publication (1 citation statement). References: 19 publications.
“…A scope of approaches has been proposed to interpret the roles of hundreds of attention heads in encoding linguistic properties (Htut et al., 2019; Wu et al., 2020) and to identify how the most influential ones benefit the downstream performance (Voita et al., 2019; Jo and Myaeng, 2020). Prior work has demonstrated that heads induce grammar formalisms and structural knowledge (Zhou and Zhao, 2019; Luo, 2021), and that linguistic features motivate attention patterns (Kovaleva et al., 2019; Clark et al., 2019). Recent studies also show that certain heads can have multiple functional roles (Pande et al., 2021) and even perform syntactic functions for typologically distant languages (Ravishankar et al., 2021).…”
Section: Introduction
Confidence: 99%