2020
DOI: 10.48550/arxiv.2012.04698
Preprint
Generate Your Counterfactuals: Towards Controlled Counterfactual Generation for Text

Abstract: Machine Learning has seen tremendous growth recently, which has led to wider adoption of ML systems in domains such as educational assessment, credit risk, healthcare, employment, and criminal justice. Trustworthiness of ML and NLP systems is crucial and requires guarantees that the decisions they make are fair and robust. Aligned with this, we propose a framework, GYC, to generate a set of counterfactual text samples, which are crucial for testing these ML systems. Our main contributions include a) We …
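The abstract's core idea — generating counterfactual text samples to test whether a classifier's decisions are robust — can be illustrated with a minimal sketch. The keyword classifier and single-word swap rule below are toy stand-ins for illustration only; they are not GYC's controlled-generation method.

```python
# Minimal sketch of counterfactual testing for a text classifier.
# The classifier and the word-swap rule are hypothetical toy stand-ins.

POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}

def toy_sentiment(text: str) -> str:
    """Keyword-based sentiment classifier (illustrative only)."""
    words = set(text.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    return "positive" if pos >= neg else "negative"

# A simple controlled perturbation: swap sentiment-bearing words.
SWAPS = {"good": "bad", "great": "terrible", "bad": "good"}

def counterfactual(text: str) -> str:
    return " ".join(SWAPS.get(w, w) for w in text.lower().split())

original = "the service was good"
cf = counterfactual(original)  # "the service was bad"
# A robust model should flip its prediction only for genuinely
# label-flipping edits like this one.
print(toy_sentiment(original), toy_sentiment(cf))  # positive negative
```

In a real testing setup, the toy classifier would be replaced by the model under audit, and the hand-written swap table by a controlled generator such as the one the paper proposes.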

Cited by 9 publications (12 citation statements) | References 27 publications (16 reference statements)
“…Besides the aforementioned paraphrasing and style transfer, prior works have also successfully generated contrastive examples that are useful for model training, evaluation, and explanation. They usually rely on application-specific class labels (Ross et al, 2020; Madaan et al, 2020b; Sha et al, 2021; Akyürek et al, 2020) or heuristic perturbation strategies that need to be expressed through pairs of original and perturbed sentences (Wu et al, 2021), which are expensive to generalize. Recently, Huang and Chang (2021) designed SynPG, a paraphraser that can mimic parse tree structures learned from non-paired sentences.…”
Section: Related Work
confidence: 99%
“…Controllable text generation through semantic perturbations, which modifies sentences to match certain target attributes, has been widely applied to a variety of tasks, e.g., changing sentence styles (Reid and Zhong, 2021), mitigating dataset biases (Gardner et al, 2021), explaining model behaviors (Ross et al, 2020), and improving model generalization (Teney et al, 2020; Wu et al, 2021). Existing work trains controlled generators with task-specific data, e.g., training a style transferer requires instances labeled with positive and negative sentiments (Madaan et al, 2020b). As a result, …”
Section: Introduction
confidence: 99%
“…where an explicit protected attribute is often not present. In these domains, counterfactual augmentation is generally a manual process; recent work provides support [31,42] but not complete automation. Prediction sensitivity (defined in Section 3) can be viewed as a way of measuring counterfactual fairness.…”
Section: Background and Related Work
confidence: 99%
“…However, since they are trained using negative samples obtained from random contexts, they are also prone to the spurious pattern of content similarity. Adversarial or counterfactual data creation techniques have been proposed for applications such as evaluation (Gardner et al, 2020;Madaan et al, 2020), attacks (Ebrahimi et al, 2018;Wallace et al, 2019;Jin et al, 2020), explanations (Goodwin et al, 2020;Ross et al, 2020) or training models to be robust against spurious patterns and biases (Garg et al, 2019;Huang et al, 2020). Adversarial examples are crafted through operations such as adding noisy characters (Ebrahimi et al, 2018;Pruthi et al, 2019), paraphrasing (Iyyer et al, 2018), replacing with synonyms (Alzantot et al, 2018;Jin et al, 2020), rule based token-level transformations (Kryscinski et al, 2020), or inserting words relevant to the context (Zhang et al, 2019).…”
Section: Related Work
confidence: 99%
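The last citation statement lists several perturbation operations used to craft adversarial or counterfactual examples, e.g., adding noisy characters and replacing words with synonyms. A minimal sketch of two such operations follows; the functions and the tiny synonym lexicon are simplified illustrations of the general idea, not reproductions of the cited methods.

```python
import random

def char_noise(text: str, rng: random.Random) -> str:
    """Character-level noise: swap two adjacent characters in one
    randomly chosen word (a simplified illustration of character-level
    attacks in the style of Ebrahimi et al., 2018)."""
    words = text.split()
    i = rng.randrange(len(words))
    w = words[i]
    if len(w) > 1:
        j = rng.randrange(len(w) - 1)
        w = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    words[i] = w
    return " ".join(words)

# Toy synonym lexicon (assumed for illustration).
SYNONYMS = {"happy": "glad", "big": "large"}

def synonym_swap(text: str) -> str:
    """Synonym replacement, in the spirit of Alzantot et al., 2018
    (simplified: a fixed lookup table instead of a learned search)."""
    return " ".join(SYNONYMS.get(w, w) for w in text.split())

rng = random.Random(0)
print(char_noise("hello world", rng))
print(synonym_swap("a big happy dog"))  # a large glad dog
```

Both operations preserve the surface content of the sentence while perturbing its form, which is why they are useful for probing whether a model relies on spurious patterns.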