2021
DOI: 10.48550/arxiv.2105.03075
Preprint
A Survey of Data Augmentation Approaches for NLP

Cited by 45 publications (61 citation statements)
References 0 publications
“…For the completeness of treatment of the subject, we address those related surveys in the following. Feng et al. [14] conducted an extensive survey on data augmentation for NLP robustness. They studied various data augmentation techniques, including rule-based and model-based techniques, as strategies to robustify NLP models against adversarial attacks.…”
Section: Related Work (mentioning)
Confidence: 99%
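The rule-based augmentation techniques mentioned above can be illustrated with a minimal sketch. The function names and parameters below are illustrative, not taken from the survey; they implement two common rule-based operations (random swap and random deletion) over a tokenized sentence:

```python
import random

def random_swap(tokens, n_swaps=1, seed=0):
    """Rule-based augmentation: swap two randomly chosen token positions n_swaps times."""
    rng = random.Random(seed)
    tokens = list(tokens)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1, seed=0):
    """Rule-based augmentation: drop each token independently with probability p."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() > p]
    # Never return an empty sentence; fall back to one surviving token.
    return kept or [rng.choice(tokens)]
```

Model-based techniques, by contrast, generate augmented text with a trained model (e.g., back-translation or masked-token infilling) rather than with fixed token-level rules like these.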
“…The authors claim that they can certify the classifications of over 50% of texts against any perturbation of 5 words on the AGNEWS dataset, and of 2 words on the SST-2 dataset (dataset-dependent). The interested reader can find more details on data augmentation techniques, their advantages, and their disadvantages in the comprehensive survey [14].…”
Section: A. Data Augmentation (mentioning)
Confidence: 99%
“…Neural network robustness naturally complements the perspective offered by brittleness, as it involves the certification of a model against a wide range of attacks (Huang et al. 2017). In NLP, similarly to computer vision (Akhtar and Mian 2018), the majority of works have adopted the narrow notion of robustness, in terms of invariance to minor perturbations of an input text (Gowal et al. 2018; Jia et al. 2019; Dong et al. 2021; La Malfa et al. 2020), while only a minority have contested these limitations, either implicitly (Ribeiro et al. 2020) or explicitly (Morris 2020; Morris et al. 2020a; Xu et al. 2020), mainly due to the difficulty of automatically generating semantically involved test beds (Feng et al. 2021). Although adversarial data augmentation in NLP is well established (Morris et al. 2020b), robustness to semantically coherent, yet possibly diverging, examples is still in its 'adolescence' (Ribeiro, Singh, and Guestrin 2018), as many highly accurate NLP models cannot recognize cogent linguistic phenomena even on low-order tasks such as binary classification (Barnes, Øvrelid, and Velldal 2019).…”
Section: Related Work (mentioning)
Confidence: 99%