On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark

Sun, Han‐Dong; Xu, Guangxuan; Deng, Jiawen; Cheng, Jiale; Zheng, Chujie; Zhang, Hao; Peng, Nanyun; Zhu, Xiaoyan; Huang, Minlie

doi:10.48550/arxiv.2110.08466

Cited by 6 publications

(15 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…For further clarifying what safety problems cover, [1496] proposes a classification of safety issues in open-domain conversational systems including three general categories and emphasizes the importance of context. More elaborately, [1497] recently proposes a more fine-grained safety issue taxonomy that divides personal and non-personal unsafe behaviors in dialogues and defines 7 sub-categories of unsafe responses. In summary, there are some safety issues of the dialogue system as follows.…”

Section: Safety and Ethical Riskmentioning

confidence: 99%

A Roadmap for Big Model

Yuan¹,

Zhao²,

Jiahong³

et al. 2022

Preprint

Self Cite

View full text Add to dashboard Cite

domains indexed by Google News. It contains 31 million documents with an average length of 793 BPE tokens. Like C4, it excludes examples with duplicate URLs. News dumps from December 2016 through March 2019 were used as training data, articles published in April 2019 from the April 2019 dump were used for evaluation. OpenWebText2(OWT2). OWT2 is an enhanced version of the original OpenWebTextCorpus, including content from multiple languages, document metadata, multiple dataset versions, and open source replication code, covering all Reddit submissions from 2005 up until April 2020. PubMed Central(PMC). PMC is a free full-text archive of biomedical and life sciences journal literature from the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). The dataset is updated daily. In addition to full-text articles, they contain corrections, retractions, and expressions of concern, as well as file lists that include metadata for articles in each dataset.PMC obtained by open registration in Amazon Web Services (AWS) includes The PMC Open Access Subset and The Author Manuscript Dataset. The PMC Open Access Subset includes all articles and preprints in PMC with a machine-readable Creative Commons license that allows reuse. The Author Manuscript Dataset includes accepted author manuscripts collected under a funder policy in PMC and made available in machine-readable formats for text mining. ArXiv. ArXiv is a repository of 1.7 million articles, with relevant features such as article titles, authors, categories, abstracts, full text PDFs, and more. It provides open access to academic articles, covering many subdisciplines from vast branches of physics to computer science to everything in between, including math, statistics, electrical engineering, quantitative biology, and economics, which is helpful to the potential downstream applications of the research field. In addition, the writing language of LaTeX also contributes to the study of language models. Colossal Clean Crawled Corpus(C4). C4 is a colossal, cleaned version of Common Crawl's web crawl corpus. It is based on Common Crawl dataset and was used to train the T5 text-to-text Transformer models. The cleaned English version of C4 has 364,868,901 training examples and 364,608 validation examples, while the uncleaned English version has 1,063,805,324 training examples and 1,065,029 validation examples; the realnewslike version has 13,799,838 training examples and 13,863 validation examples, while the webtextlike version has 4,500,788 training examples and 4,493 validation examples. Wiki-40B. Wikipedia (Wiki-40B) is a clean-up text collection containing more than 40 Wikipedia language editions of pages corresponding to entities. The dataset is split into train/validation/test sets for each language. The training set has 2,926,536 examples, the validation set has 163,597 examples, and the test set has 162,274 examples. Wiki-40B is cleaned by a page filter to remove ambiguous, redirected, deleted, and non-physical pages. CLUECorpus2020. CLUECorpus2020 ...

show abstract

Section: Safety and Ethical Riskmentioning

confidence: 99%

A Roadmap for Big Model

Yuan¹,

Zhao²,

Jiahong³

et al. 2022

Preprint

Self Cite

View full text Add to dashboard Cite

show abstract

“…Inheriting from pre-trained language models, dialog safety issues, including toxicity and offensiveness (Baheti et al, 2021;Cercas Curry and Rieser, 2018;, bias (Henderson et al, 2018;Barikeri et al, 2021;Lee et al, 2019), privacy (Weidinger et al, 2021), and sensitive topics Sun et al, 2021), are exceeding studied and increasingly drawing attention. In the conversational unsafety measurement (Cercas Curry and Rieser, 2018;Sun et al, 2021;Edwards et al, 2021), adversarial learning for safer bots Gehman et al, 2020) and bias mitigation Thoppilan et al, 2022) strategies, unsafety behaviour detecting task plays an important role.…”

Section: Dialog Safety and Social Biasmentioning

confidence: 99%

“…However, neural open-domain conversational agents trained on large-scale unlabeled data may pick up many unsafe features in the corpora, e.g., offensive languages, social biases, violet, etc Barikeri et al, 2021;Weidinger et al, 2021;Sun et al, 2021). Unlike other unsafety problems, social biases that convey negative stereotypes or prejudices on specific populations are usually stated in implicit expressions rather than explicit words Blodgett et al, 2020), thus is a challenging task to deal with.…”

Section: Introductionmentioning

confidence: 99%

“…However, most existing bias-detection studies focus on the token or utterance level (Nadeem et al, 2020;Smith et al, 2022). Thus they can hardly be directly transferred to dialog circumstances where biased responses are often beyond utterance level and are very sensitive to the dialog context (Sun et al, 2021).…”

Section: Introductionmentioning

confidence: 99%

“…Due to the inherently subjective and subtle nature of the biases identification, it is hard to judge whether a statement contains bias directly, even for humans (Sap et al, 2019. Rather than formulating the social bias measurement as a toxicity classification task (Sun et al, 2021;Founta et al, 2018), detailed analysis and normative reasoning process (we name it "Frame" in this paper) can reduce the risk of introducing or amplifying bias during annotation and model development (Sap et al, 2019;Davidson et al, 2019). Such a frame also improves the understanding of why a data entry is biased (Ribeiro et al, 2016), thus boosting the ability of bias detectors Blodgett et al, 2020).…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks

Zhou¹,

Deng²,

Mi³

et al. 2022

Preprint

Self Cite

View full text Add to dashboard Cite

Warning: this paper contains content that may be offensive or upsetting.The research of open-domain dialog systems has been greatly prospered by neural models trained on large-scale corpora, however, such corpora often introduce various safety problems (e.g., offensive languages, biases, and toxic behaviors) that significantly hinder the deployment of dialog systems in practice. Among all these unsafe issues, addressing social bias is more complex as its negative impact on marginalized populations is usually expressed implicitly, thus requiring normative reasoning and rigorous analysis. In this paper, we focus our investigation on social bias detection of dialog safety problems. We first propose a novel DIAL-BIAS FRAME for analyzing the social bias in conversations pragmatically, which considers more comprehensive bias-related analyses rather than simple dichotomy annotations. Based on the proposed framework, we further introduce CDAIL-BIAS DATASET that, to our knowledge, is the first well-annotated Chinese social bias dialog dataset. In addition, we establish several dialog bias detection benchmarks at different label granularities and input types (utterance-level and context-level). We show that the proposed in-depth analyses together with these benchmarks in our DIAL-BIAS FRAME are necessary and essential to bias detection tasks and can benefit building safe dialog systems in practice.

show abstract

Bias Detection for Customer Interaction Data: A Survey on Datasets, Methods, and Tools

Donald

Galanopoulos²,

Curry

et al. 2023

IEEE Access

View full text Add to dashboard Cite

With the increase in usage of machine learning models within many different aspects of customer interactions, it has become very clear that bias detection within associated customer interaction datasets has led to a critical focus on issues such as the identification of bias prior to model building, lack of understanding and transparency within models, and ultimately the prevention of biased predictions or classifications. This has never been more important since the introduction of the EU General Data Protection Regulation (GDPR) and the associated rule of ''right of explanation''. In this paper, we survey the state of the art for bias detection, avoidance and mitigation within datasets, and the associated methods and tools available. Our purpose is to establish an understanding of how established customer interaction-based use cases can utilise these techniques. The focus is primarily on tackling the bias in unstructured text data as a pre-process prior to the machine learning model training phase. We hope that this research encourages the further establishment of responsible usage of customer interaction datasets to allow the prevention of bias being introduced into machine learning pipelines and to also allow greater awareness of the potential for further research in this area.

show abstract

On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark

Cited by 6 publications

References 0 publications

A Roadmap for Big Model

A Roadmap for Big Model

Towards Identifying Social Bias in Dialog Systems: Frame, Datasets, and Benchmarks

Bias Detection for Customer Interaction Data: A Survey on Datasets, Methods, and Tools

Contact Info

Product

Resources

About