Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP (2018)
DOI: 10.18653/v1/w18-5446
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

Abstract: Every natural text is written in some style. The style is formed by a complex combination of different stylistic factors, including formality markers, emotions, metaphor, etc. Some factors implicitly reflect the author's personality, while others are explicitly controlled by the author's choices in order to achieve some personal or social goal. One cannot form a complete understanding of a text and its author without considering these factors. The factors combine and co-vary in complex ways to form styles. Stu…

Citations: cited by 3,020 publications (3,226 citation statements).
References: 63 publications (112 reference statements).
“…Adversarial Training Figure 1: The workflow of MT-DNN: train a neural language model on a large amount of unlabeled raw text to obtain general contextual representations; then finetune the learned contextual representations on downstream tasks, e.g. GLUE (Wang et al, 2018); lastly, distill this large model to a lighter one for online deployment. In the latter two phases, we can leverage powerful multi-task learning and adversarial training to further improve performance.…”
Section: Single-task Knowledge Distillation
confidence: 99%
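The quoted workflow ends with distilling the large fine-tuned model into a lighter one for deployment. The sketch below shows the standard soft-target distillation loss that this kind of pipeline relies on; it is a minimal illustration in PyTorch, not MT-DNN's actual code, and the temperature `T` and mixing weight `alpha` are illustrative hyperparameters rather than values from the cited papers.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft-target KL term (teacher -> student) with the usual
    hard-label cross-entropy. T and alpha are illustrative, not taken
    from the MT-DNN papers."""
    # Soft targets: teacher distribution softened by temperature T.
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Hard targets: standard cross-entropy against the gold labels.
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Example: one distillation step on a batch of task logits.
student_logits = torch.randn(8, 3, requires_grad=True)  # e.g. a 3-way NLI-style task
with torch.no_grad():
    teacher_logits = torch.randn(8, 3)                   # frozen teacher predictions
labels = torch.randint(0, 3, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```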
“…In this section, we present a comprehensive set of examples to illustrate how to customize MT-DNN for new tasks. We use popular benchmarks from general and biomedical domains, including GLUE (Wang et al, 2018), SNLI (Bowman et al, 2015), SciTail (Khot et al, 2018), SQuAD (Rajpurkar et al, 2016), ANLI (Nie et al, 2019), and biomedical named entity recognition (NER), relation extraction (RE) and question answering (QA). To make the experiments reproducible, we make all the configuration files publicly available.…”
Section: Application
confidence: 99%
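The excerpt describes adding new tasks through configuration files that are published with the toolkit. As an illustration only, a task such as SciTail could be described with a small task definition like the one below; the field names here are hypothetical and do not reproduce MT-DNN's actual configuration schema.

```python
# Hypothetical task definition for plugging a new dataset into a
# multi-task fine-tuning setup. Field names are illustrative and do
# not reproduce MT-DNN's actual configuration format.
scitail_task = {
    "name": "scitail",
    "task_type": "classification",   # vs. "regression" or "span_qa"
    "num_labels": 2,                 # SciTail: entails / neutral
    "metric": "accuracy",
    "data_files": {
        "train": "data/scitail/train.tsv",
        "dev": "data/scitail/dev.tsv",
        "test": "data/scitail/test.tsv",
    },
    "max_seq_length": 128,
    "batch_size": 32,
}
```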
“…However, text, like proteins, is often given in a complete form, presenting a possible bidirectionality that cannot be captured by unidirectional language models. As a response, the models BERT (Devlin et al, 2018) and XLNet (Yang et al, 2019) provide bidirectional language modeling objectives by taking inspiration from denoising autoencoders (Vincent et al, 2008), which currently rank as state-of-the-art on popular benchmarks such as GLUE (Wang et al, 2018).…”
Section: Related Work
confidence: 99%
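The excerpt contrasts unidirectional language models with BERT's denoising-style, bidirectional objective. A minimal sketch of that masked-token prediction idea, using the Hugging Face transformers library; the checkpoint name and the masked position are just convenient choices for illustration, not details from the cited papers.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# A common public masked-LM checkpoint; any masked-LM model would do here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

text = "GLUE is a multi-task benchmark for natural language understanding."
inputs = tokenizer(text, return_tensors="pt")

# Corrupt the input by masking one token, then ask the model to denoise it.
# The prediction conditions on context to both sides of the mask, which is
# the bidirectionality the excerpt refers to.
masked = inputs["input_ids"].clone()
masked[0, 3] = tokenizer.mask_token_id  # mask an arbitrary interior position

with torch.no_grad():
    logits = model(input_ids=masked, attention_mask=inputs["attention_mask"]).logits

predicted_id = int(logits[0, 3].argmax(-1))
print(tokenizer.decode([predicted_id]))  # model's guess for the masked token
```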
“…We evaluated BERT+Entity in the natural language understanding benchmark GLUE (Wang et al, 2018), the question answering (QA) benchmarks SQuAD V2 (Rajpurkar et al, 2018) and SWAG (Zellers et al, 2018), and the machine translation benchmark EN-DE WMT14. We confirm the finding from Zhang et al (2019) that additional entity knowledge is not beneficial for the GLUE benchmark.…” [Footnote 1: TagMe's performance on various benchmark datasets ranges from 37% to 72% F1 (Kolitsas et al, 2018).]
Section: Introduction
confidence: 99%
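The excerpt lists the benchmarks used for evaluation, with GLUE among them. For readers who want to set up that kind of evaluation, GLUE tasks and their official metrics are available through the Hugging Face datasets and evaluate libraries; the snippet below is a generic illustration of loading one task, not the cited paper's evaluation code, and the prediction values are dummies.

```python
from datasets import load_dataset
import evaluate

# Load one GLUE task (MNLI here) and its official metric.
mnli = load_dataset("glue", "mnli")
metric = evaluate.load("glue", "mnli")

print(mnli["train"][0])  # premise, hypothesis, label

# Scoring works on (predictions, references) pairs; dummy values shown.
print(metric.compute(predictions=[0, 1, 2], references=[0, 1, 1]))
```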