2018
DOI: 10.48550/arxiv.1805.12471
Preprint
Neural Network Acceptability Judgments

Cited by 76 publications (96 citation statements); references 0 publications.
“…Datasets. We evaluate our method on the GLUE benchmark (Wang et al, 2018), which consists of eight datasets covering various types of tasks including natural language inference (RTE (Dagan et al, 2005), MNLI (Williams et al, 2018), and QNLI (Rajpurkar et al, 2016)), semantic textual similarity (MRPC (Dolan and Brockett, 2005), STS-B (Cer et al, 2017), and QQP 3 ), linguistic acceptability (CoLA (Warstadt et al, 2018)), and sentiment analysis (SST2 (Socher et al, 2013)).…”
Section: Methods
confidence: 99%
“…• BERT fine-tuning on CoLA dataset [39]. We use pretrained BERT from Transformers library [40] (bert-base-uncased) and freeze all layers except the last two linear ones.…”
Section: Numerical Experiments
confidence: 99%
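The excerpt above describes freezing all pretrained BERT layers except the last two linear ones before fine-tuning. A minimal sketch of that freezing pattern, using a small PyTorch stand-in module rather than the actual `bert-base-uncased` checkpoint (the module and layer names here are illustrative, not the real Transformers attribute names):

```python
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    """Toy stand-in for a pretrained encoder plus classification head."""
    def __init__(self):
        super().__init__()
        # Stand-in for the pretrained BERT encoder (to be frozen).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True),
            num_layers=2,
        )
        # The "last two linear layers" that stay trainable.
        self.pooler = nn.Linear(32, 32)
        self.classifier = nn.Linear(32, 2)

    def forward(self, x):
        h = self.encoder(x).mean(dim=1)  # mean-pool token states
        return self.classifier(torch.tanh(self.pooler(h)))

model = ToyClassifier()

# Freeze everything, then re-enable only the last two linear layers.
for p in model.parameters():
    p.requires_grad = False
for p in model.pooler.parameters():
    p.requires_grad = True
for p in model.classifier.parameters():
    p.requires_grad = True

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

With the real model, the same two loops would run over the corresponding head modules of a loaded Transformers checkpoint; the optimizer then only receives parameters with `requires_grad=True`.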
“…. , k simultaneously where C is defined in (39). Let E k denote the probabilistic event that this statement holds.…”
Section: B11 Two Lemmas
confidence: 99%
“…Therefore, in this module, we prune the candidate list and retain only the grammatical ones. Toward this, we train a grammaticality classifier on the corpus of linguistic acceptability (CoLA) (Warstadt et al, 2018), a dataset with 10,657 English sentences labeled as grammatical or ungrammatical from linguistics publications. We select BERT (Devlin et al, 2019) as the classification model, and fine-tune it on the CoLA dataset.…”
Section: Candidate Pruning
confidence: 99%
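The candidate-pruning step quoted above scores each candidate sentence with an acceptability classifier and keeps only the grammatical ones. A runnable sketch of that control flow, where a keyword stub stands in for the fine-tuned BERT scorer used in the cited work (`acceptability_score` and its heuristic are hypothetical):

```python
def acceptability_score(sentence: str) -> float:
    """Stub scorer: the real model is BERT fine-tuned on CoLA.
    Here, a sentence with an immediately repeated word scores low."""
    words = sentence.lower().split()
    has_repeat = any(a == b for a, b in zip(words, words[1:]))
    return 0.1 if has_repeat else 0.9

def prune_candidates(candidates, threshold=0.5):
    """Retain only candidates the scorer judges acceptable."""
    return [c for c in candidates if acceptability_score(c) >= threshold]

candidates = [
    "The cat sat on the mat.",
    "The the cat sat mat.",
]
kept = prune_candidates(candidates)
print(kept)  # only the first sentence clears the threshold
```

In the actual pipeline, `acceptability_score` would be the softmax probability of the "acceptable" class from the fine-tuned classifier; the list comprehension and threshold are the pruning logic itself.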