“…We address these limitations of human red teaming with a "plug-and-play" AI-assisted Red Teaming (AART) pipeline for generating adversarial testing datasets at scale by minimizing the human effort to only guide the adversarial generation recipe. Our work builds on recent automated red teaming (Perez et al., 2022), synthetic safety data generation (Fryer et al., 2022; Hartvigsen et al., 2022; Bai et al., 2022; Sun et al., 2023) and human-in-the-loop methods. We adapt work on self-consistency (Wang et al., 2023a), chain-of-thought (Kojima et al., 2023; Wei et al., 2022), and structured reasoning and data generation (Wang et al., 2023b; Xu et al., 2023; Creswell and Shanahan, 2022) and creatively apply them to the task of adversarial dataset creation.…”