Given the success of Transformer-based models, two directions of study have emerged: interpreting the role of individual attention heads and down-sizing the models for efficiency. Our work straddles these two streams: we analyse the importance of basing pruning strategies on the interpreted role of the attention heads, evaluating this on Transformer and BERT models across multiple NLP tasks. Firstly, we find that a large fraction of the attention heads can be randomly pruned with limited effect on accuracy. Secondly, for Transformers, we find no advantage in pruning attention heads identified as important by existing studies that relate importance to the location of a head. On the BERT model, too, we find no preference for the top or bottom layers, though the latter are reported to have higher importance; however, strategies that avoid pruning the middle layers and consecutive layers perform better. Finally, during fine-tuning, the compensation for pruned attention heads is distributed roughly equally across the un-pruned heads. Our results thus suggest that interpretation of attention heads does not strongly inform pruning.
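As a concrete illustration of the pruning setup this abstract describes, below is a minimal sketch of random head pruning using the prune_heads utility from the HuggingFace transformers library. The model name and the 50% pruning fraction are illustrative assumptions, not values taken from the paper.

    import random
    from transformers import BertModel

    model = BertModel.from_pretrained("bert-base-uncased")
    n_layers = model.config.num_hidden_layers   # 12 for bert-base
    n_heads = model.config.num_attention_heads  # 12 for bert-base

    # Sample half of all (layer, head) pairs uniformly at random
    # (the 50% fraction is an illustrative choice).
    random.seed(0)
    all_heads = [(l, h) for l in range(n_layers) for h in range(n_heads)]
    sampled = random.sample(all_heads, k=len(all_heads) // 2)

    # Group the sampled heads by layer, the format prune_heads expects.
    heads_to_prune = {}
    for layer, head in sampled:
        heads_to_prune.setdefault(layer, []).append(head)

    # Physically removes the selected heads' projection parameters; the
    # smaller model can then be fine-tuned and evaluated on the task.
    model.prune_heads(heads_to_prune)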
Introduction
Hispanic Americans receive disproportionately fewer organ transplants than non-Hispanic whites. In 2018, the Hispanic Kidney Transplant Program (HKTP) was established at the University of Colorado Hospital (UCH). The purpose of this quality improvement study was to examine the effect of this culturally sensitive program in reducing disparities in kidney transplantation.
Methods
We performed a mixed-methods analysis of data from 436 Spanish-speaking patients referred for transplant to UCH between 2015 and 2020. We compared outcomes for patients referred between 2015–2017 (n = 156) to those referred between 2018–2020 (n = 280). Semi-structured phone interviews were conducted with 6 patients per time period and with 6 nephrology providers in the Denver Metro Area. Patients and providers were asked to evaluate communication, transplant education, and overall experience.
Results
When comparing the two time periods, there was a significant increase in the percentage of patients referred (79.5% increase, p = 0.008) and evaluated for transplant (82.4% increase, p = 0.02) during 2018–2020. While the number of committee reviews and the number of patients waitlisted also increased during 2018–2020, these increases did not reach statistical significance (82.9% increase, p = 0.37 and 79.5% increase, p = 0.75, respectively). From the patient and provider interviews, we identified 4 themes reflecting participation in the HKTP: improved communication, enhanced patient education, improved experience, and areas for advancement. Overall, patients and providers reported a positive experience with the HKTP and noted improved patient understanding of the transplantation process.
Conclusions
The establishment of the HKTP is associated with a significant increase in Spanish-speaking Hispanic patients being referred and evaluated for kidney transplantation.
Large multilingual models, such as mBERT, have shown promise in crosslingual transfer. In this work, we employ pruning to quantify the robustness and interpret the layer-wise importance of mBERT. On four GLUE tasks, the relative drops in accuracy due to pruning are almost identical for mBERT and BERT, suggesting that the reduced attention capacity of the multilingual model does not affect its robustness to pruning. For the crosslingual task XNLI, we report higher drops in accuracy with pruning, indicating lower robustness in crosslingual transfer. Moreover, the importance of the encoder layers depends sensitively on the language family and the pre-training corpus size. The top layers, which are relatively more influenced by fine-tuning, encode important information for languages similar to English (SVO), while the bottom layers, which are relatively less influenced by fine-tuning, are particularly important for agglutinative and low-resource languages.
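One way to probe layer-wise importance of the kind this abstract reports is to ablate all heads in one encoder layer at a time and measure the resulting drop in task accuracy. The sketch below assumes the HuggingFace transformers library; evaluate_fn is a hypothetical caller-supplied routine that fine-tunes and scores the model on a downstream task, and is not part of any library.

    from transformers import BertModel

    def layerwise_accuracy_drop(model_name, evaluate_fn):
        # Baseline accuracy of the unpruned model.
        base = BertModel.from_pretrained(model_name)
        base_acc = evaluate_fn(base)
        drops = {}
        for layer in range(base.config.num_hidden_layers):
            # Reload a fresh copy and remove every head in this one layer.
            model = BertModel.from_pretrained(model_name)
            model.prune_heads({layer: list(range(model.config.num_attention_heads))})
            drops[layer] = base_acc - evaluate_fn(model)
        return drops

    # e.g. layerwise_accuracy_drop("bert-base-multilingual-cased", my_xnli_eval)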
Multi-headed attention is a mainstay in transformer-based models. Different methods have been proposed to classify the role of each attention head based on the relations between tokens that have high pair-wise attention. These roles include syntactic (tokens with some syntactic relation), local (nearby tokens), block (tokens in the same sentence), and delimiter (the special [CLS] and [SEP] tokens). There are two main challenges with existing methods for classification: (a) there are no standard scores across studies or across functional roles, and (b) these scores are often average quantities measured across sentences without capturing statistical significance. In this work, we formalize a simple yet effective score that generalizes to all the roles of attention heads and employ hypothesis testing on this score for robust inference. This provides us the right lens to systematically analyze attention heads and confidently comment on many commonly posed questions about the BERT model. In particular, we comment on the co-location of multiple functional roles in the same attention head, the distribution of attention heads across layers, and the effect of fine-tuning for specific NLP tasks on these functional roles. The code is made publicly available at https://github.com/iitmnlp/heads-hypothesis
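To make the scoring-plus-hypothesis-testing recipe concrete, here is a minimal sketch for one role, the "local" role (attention to nearby tokens). The uniform-attention null, the window size, and the paired one-sided t-test are illustrative assumptions; the paper's exact score and test may differ (see the linked repository).

    import numpy as np
    from scipy import stats

    def local_score(attn, window=1):
        # attn: (seq_len, seq_len) row-stochastic attention matrix for one
        # head on one sentence. Returns the fraction of total attention mass
        # falling within `window` tokens of each query position.
        n = attn.shape[0]
        dist = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
        return (attn * (dist <= window)).sum() / attn.sum()

    def is_local_head(per_sentence_attns, window=1, alpha=0.05):
        # Paired, one-sided t-test: does the head score higher on the local
        # criterion than uniform attention would on the same sentences?
        scores, null_scores = [], []
        for attn in per_sentence_attns:
            n = attn.shape[0]
            scores.append(local_score(attn, window))
            null_scores.append(local_score(np.full((n, n), 1.0 / n), window))
        t, p = stats.ttest_rel(scores, null_scores)
        return t > 0 and p / 2 < alpha  # halve p for the one-sided test

Testing against a per-sentence null in this paired fashion is what lets the decision capture statistical significance rather than relying on a raw average score across sentences.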