2021
DOI: 10.48550/arxiv.2108.11830
Preprint
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts

Abstract: Dialogue models trained on human conversations inadvertently learn to generate offensive responses. Moreover, models can insult anyone by agreeing with an offensive context. To understand the dynamics of contextually offensive language, we study the stance of dialogue model responses in offensive Reddit conversations. Specifically, we crowd-annotate ToxiChat, a new dataset of 2,000 Reddit threads and model responses labeled with offensive language and stance. Our analysis reveals that 42% of user responses ag…

Cited by 6 publications (6 citation statements)
References 27 publications (36 reference statements)
“…Most methods have been tested on architectures tailored for the English language (Colombo et al. 2022; Arora, Huang, and He 2021). With inclusivity and diversity in mind (Ruder 2022; van Esch et al. 2022), it is necessary to assess the performance of old and new OOD detection methods on a variety of languages (Srinivasan et al. 2021; de Vries, van Cranenburgh, and Nissim 2020; Baheti et al. 2021; Zhang et al. 2022).…”
Section: Limitation of Existing Benchmarks
confidence: 99%
“…Most methods have been tested on architectures tailored for the English language (Colombo et al., 2022a; Li et al., 2021; Arora et al., 2021). With inclusivity and diversity in mind (Ruder, 2022; van Esch et al., 2022), it is necessary to assess the performance of old and new OOD detection methods on a variety of languages (Srinivasan et al., 2021; de Vries et al., 2020; Baheti et al., 2021; Zhang et al., 2022).…”
Section: Limitation of Existing Benchmarks
confidence: 99%
“…Safety assessments are also conducted by constructing contexts based on templates or collected datasets. For example, some past works find that conversational models tend to become more unsafe when faced with specific contexts such as toxic or biased language [1511], harassment [1512], and political topics [1513]. Also, inspired by LAMA [92], some recent works probe the safety of language models using intra-sentence (cloze) tests [1116, 1514, 704, 1515].…”
Section: Safety and Ethical Risk
confidence: 99%