2020
DOI: 10.48550/arxiv.2002.06261
Preprint
Stress Test Evaluation of Transformer-based Models in Natural Language Understanding Tasks

Abstract: There has been significant progress in recent years in the field of Natural Language Processing thanks to the introduction of the Transformer architecture. Current state-of-the-art models, via a large number of parameters and pre-training on massive text corpora, have shown impressive results on several downstream tasks. Many researchers have studied previous (non-Transformer) models to understand their actual behavior under different scenarios, showing that these models are taking advantage of clues or failure…

Cited by 4 publications (4 citation statements)
References 28 publications
“…Unlike TriRank, our method combines ratings and aspect-based opinions in a KG to learn richer vector representations for both users and items, with which it generates personalized recommendations. Furthermore, there have been approaches to automate text content analysis in the field of medicine [5], along with stress tests [1,2] to evaluate model performance under different situations.…”
Section: Related Work
confidence: 99%
“…Belinkov and Bisk (2017) show that character-based neural machine translation (NMT) models are also prone to synthetic and natural noise, even though these models do a better job of handling out-of-vocabulary issues and learning better morphological representations. Aspillaga et al. (2020) evaluated RoBERTa, XLNet, and BERT on Natural Language Inference (NLI) and Question Answering (QA) tasks. They used BiDAF (Seo et al., 2016) and Match-LSTM (Wang and Jiang, 2016) as baselines against which to compare stress-test results for Transformer-based models.…”
Section: Previous Work
confidence: 99%
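For context, the character-level "noise" stress tests this excerpt refers to perturb model inputs to probe robustness. The following is a minimal illustrative sketch, not code from the cited papers; the function name, noise rate, and example sentence are hypothetical, and real stress-test suites use richer perturbations (keyboard typos, word distractors, natural misspellings):

```python
# Hypothetical sketch of a character-level noise perturbation of the kind
# used in stress-test evaluations (adjacent-character swaps simulate typos).
import random

def swap_noise(text: str, rate: float = 0.1, seed: int = 0) -> str:
    """Randomly swap adjacent alphabetic characters with probability `rate`."""
    rng = random.Random(seed)  # seeded for reproducible perturbations
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

premise = "A man is playing a guitar on stage."
print(swap_noise(premise))  # e.g. "A man is playnig a giutar on stage."
```

In a stress test, a model's predictions on the clean and perturbed inputs are compared; a robust model should change its output on few of the perturbed examples.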
“…Moreover, using the whole text can often introduce redundant and noisy information that worsens the model's performance [1], compared to extracting relevant information in the form of aspects, as proposed in this work. Furthermore, there have been approaches to automate text content analysis in the field of medicine [7], along with stress tests [2,3] to evaluate model performance under different situations.…”
Section: Review and Text-based Recommendations
confidence: 99%