2020
DOI: 10.48550/arxiv.2001.02438
Preprint

To Transfer or Not to Transfer: Misclassification Attacks Against Transfer Learned Text Classifiers

Abstract: Transfer learning, i.e., transferring learned knowledge, has brought a paradigm shift in the way models are trained. The lucrative benefits of improved accuracy and reduced training time have shown promise in training models with constrained computational resources and fewer training samples. Specifically, publicly available text-based models such as GloVe and BERT, trained on large corpora, have seen ubiquitous adoption in practice. However, the risks involved in using these public models for vari…

Cited by 5 publications (5 citation statements)
References 18 publications
“…Moreover, Abdelkader et al. [39] found that using a known feature extractor (i.e., a pre-trained model) exposes a fine-tuned model to powerful attacks that can be executed without any knowledge of the classifier head. Recently, Pal and Tople [62] exploited unintended features learnt in the pre-trained model to generate adversarial examples for fine-tuned models, achieving a high attack success rate in the text prediction domain.…”
Section: A. Adversarial Attacks in Transfer Learning (mentioning)
confidence: 99%
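The threat model described in this statement (an adversary who can query the public feature extractor but never the fine-tuned classifier head) can be illustrated with a small, purely hypothetical sketch. The code below is not the cited authors' method: it uses a random embedding table as a stand-in for a public encoder such as GloVe, hides a toy linear head from the attacker, and greedily substitutes words to maximize the representation shift measured in encoder space only. Whether the prediction actually flips depends on the unseen head; the point is only that the search never touches it.

```python
# Hypothetical sketch of a "known feature extractor, unknown head" attack.
# Vocabulary, embeddings, and the victim head are all made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

VOCAB = ["good", "great", "bad", "awful", "movie", "plot", "boring", "fun"]
EMB_DIM = 16
# Stand-in for a public pre-trained embedding table (e.g., GloVe); random here.
EMB = {w: rng.normal(size=EMB_DIM) for w in VOCAB}

def encode(tokens):
    """Public feature extractor: mean of token embeddings."""
    return np.mean([EMB[t] for t in tokens], axis=0)

# "Victim" fine-tuned head the attacker never sees: an arbitrary linear classifier
# applied on top of the frozen encoder output.
w_head = rng.normal(size=EMB_DIM)

def victim_predict(tokens):
    return int(encode(tokens) @ w_head > 0)

def attack(tokens, budget=2):
    """Greedy word substitution that maximizes the shift in encoder space only,
    mimicking attacks that need no knowledge of the classifier head."""
    orig = encode(tokens)
    adv = list(tokens)
    for _ in range(budget):
        best = None
        for i, tok in enumerate(adv):
            for cand in VOCAB:
                if cand == tok:
                    continue
                trial = adv[:i] + [cand] + adv[i + 1:]
                shift = np.linalg.norm(encode(trial) - orig)
                if best is None or shift > best[0]:
                    best = (shift, trial)
        adv = best[1]
    return adv

x = ["good", "fun", "movie"]
x_adv = attack(x)
print("victim prediction on original:", victim_predict(x))
print("adversarial tokens:", x_adv, "-> prediction:", victim_predict(x_adv))
```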
“…Connection to Adversarial Examples: Adversarial examples are minimally edited inputs that cause models to incorrectly change their predictions despite no change in true label (Jia and Liang, 2017; Ebrahimi et al., 2018; Pal and Tople, 2020). Recent methods for generating adversarial examples also preserve fluency (Li et al., 2020b; Song et al., 2020); however, approaches based on paraphrasing (Iyyer et al., 2018) or word replacement (Alzantot et al., 2018; Ren et al., 2019; Garg and Ramakrishnan, 2020) cannot be used to generate contrastive edits.…”
Section: Counterfactuals Beyond Explanations; Concurrent Work (mentioning)
confidence: 99%
“…Adversarial examples are minimally edited inputs that cause models to incorrectly change their predictions (Jia and Liang, 2017; Ebrahimi et al., 2018; Pal and Tople, 2020). While recent work on generating adversarial examples has also focused on preserving semantic coherence and meaning (Ribeiro et al., 2018; Ren et al., 2019; Garg and Ramakrishnan, 2020; Li et al., 2020; Song et al., 2020), the goal of adversarial examples differs from that of contrastive edits: adversarial examples are expected not to change the true label, so that a changed prediction indicates erroneous model behavior, whereas contrastive edits place no such constraint on the correctness of model output.…”
Section: Adversarial Examples (mentioning)
confidence: 99%