2018
DOI: 10.48550/arxiv.1804.07461
Preprint

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Cited by 561 publications (745 citation statements)
References 28 publications
“…The best-known representatives of NLP benchmarks are GLUE [24] and SuperGLUE [23]. The latter is the successor of the former, proposed with more challenging tasks to keep up with the pace of progress in the NLP area.…”
Section: Related Work
Mentioning confidence: 99%
“…In the Artificial Intelligence (AI) research field, similar tasks are often grouped into a special benchmark containing a set of formalized Machine Learning (ML) problems with defined input data and performance metrics, for example the ImageNet [6] benchmark for image classification or the General Language Understanding Evaluation (GLUE) benchmark [24]. Comparing human and ML-model performance allows measuring the progress in a particular field.…”
Section: Introduction
Mentioning confidence: 99%
“…PTMs such as GPT (Generative Pre-trained Transformer) and BERT [26] (Bidirectional Encoder Representations from Transformers) have recently achieved great success in many complex natural language processing (NLP) tasks and become a milestone in the wider machine learning community. Thanks to the immensity of the training data (for BERT, the pre-training corpus contains 3,300 million words [26]) and the huge number of model parameters (the base version of BERT contains 110 million parameters, while the large version contains 340 million), some of these PTMs have surpassed human performance on multiple language understanding benchmarks [27] [28] [29], such as GLUE [30]. PTMs are now generally used as backbones for downstream tasks, because the rich knowledge stored implicitly in the huge number of model parameters can be leveraged by fine-tuning them for specific tasks.…”
Section: Deep Learning in Automatic Hateful Message Detection
Mentioning confidence: 99%
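The fine-tuning workflow described in the statement above (a pretrained encoder reused as a backbone, with a small task-specific head trained on a GLUE task) is commonly implemented along the following lines. This is a minimal sketch assuming the Hugging Face transformers and datasets libraries, which the cited work does not mention; the checkpoint name, the GLUE task (SST-2), and the hyperparameters are illustrative choices, not values from the paper.

```python
# Minimal sketch (illustrative, not from the cited papers): fine-tuning a
# pretrained BERT backbone on one GLUE task (SST-2) with Hugging Face
# `transformers` and `datasets`. Hyperparameters are assumptions.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

dataset = load_dataset("glue", "sst2")  # one of the GLUE tasks
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # SST-2 is single-sentence classification; other GLUE tasks pass sentence pairs.
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=128)

encoded = dataset.map(tokenize, batched=True)

# The pretrained encoder is reused as-is; only a small classification head is
# added on top, and all parameters are then fine-tuned on the task.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="sst2-bert",
                         per_device_train_batch_size=32,
                         num_train_epochs=3,
                         learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=encoded["train"],
                  eval_dataset=encoded["validation"])
trainer.train()
```

This pattern is what the quoted passage refers to as using PTMs as backbones: the knowledge acquired during pre-training is transferred by continuing to train the same weights on the downstream task rather than training a task model from scratch.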
“…The baselines provided by Kiela et al. [37] include both unimodal and multimodal PTMs. The unimodal PTMs are BERT [14] (Text BERT), standard ResNet-152 [30] convolutional features from res-5c with average pooling (Image-Grid), and features from the fc6 layer that are fine-tuned using weights of the fc7 layer (Image-Region). The multimodal baseline methods include supervised multimodal bitransformers [45] using either Image-Grid or Image-Region features (MMBT-Grid and MMBT-Region), and versions of ViLBERT [31] and Visual BERT [46] that were only unimodally pretrained and not pretrained on multimodal data (ViLBERT and Visual BERT). The multimodal baselines are ViLBERT trained on Conceptual Captions [47] (ViLBERT CC) and Visual BERT trained on the COCO dataset [48] (Visual BERT COCO).…”
Section: Visual-Language PTM
Mentioning confidence: 99%
“…The Transformer architecture [44] allowed the concept of attention (and specifically self-attention) to be used very efficiently, generating new and long sequences effectively and more coherently. BERT [19] applied a bidirectional Transformer to language modeling and presented state-of-the-art results on a variety of NLP tasks, such as the GLUE (General Language Understanding Evaluation) [45] task set, SQuAD (Stanford Question Answering Dataset) [36] v1.1 and v2.0, and SWAG (Situations With Adversarial Generations) [48]. In regards to generating novel text, even hard problems like literature (e.g.…”
Section: Introduction
Mentioning confidence: 99%