2022
DOI: 10.48550/arxiv.2201.07763
Preprint

When Is It Acceptable to Break the Rules? Knowledge Representation of Moral Judgement Based on Empirical Data

Abstract: One of the most remarkable things about the human moral mind is its flexibility. We can make moral judgments about cases we have never seen before. We can decide that pre-established rules should be broken. We can invent novel rules on the fly. Capturing this flexibility is one of the central challenges in developing AI systems that can interpret and produce human-like moral judgment. This paper details the results of a study of real-world decision makers who judge whether it is acceptable to break a well-esta…

Cited by 4 publications (6 citation statements)
References 33 publications
“…That is, the moral sense gives us not only the duty to behave cooperatively, but also the duty to respect and enforce good "rules of the game" that incentivize efficient and fair behaviors. In line with this idea, empirical observations show that humans are indeed capable of assessing, on the fly, what is moral and immoral, even when their actions are rules whose effects are indirect, through incentives on other behaviors (Awad et al., 2022; Levine et al., 2018, 2020). We will call "moral rules of the game", or sometimes more simply "moral rules", these second-order contracts.…”
Section: Moral Contracts May Reach Their Social Value Indirectly and ...
confidence: 93%
“…Humans, including young children, are able to evaluate the costs and benefits of people's behaviors in various contexts, and this allows them both to predict others' behavior (Jara-Ettinger et al., 2015), and to infer others' costs and benefits functions from their actions, based on the assumption that they will seek to maximize their utility (Baker et al., 2009; Liu et al., 2017; Sosa et al., 2021). Furthermore, people make use of this ability to calculate, in context, what is right or wrong, including in cases they have never seen before (Awad et al., 2022; Carlson et al., 2022; Levine et al., 2020), and thereby to evaluate morally the behavior of others (Berman and Silver, 2022; Bigman and Tamir, 2016; Gerstenberg et al., 2018; Jara-Ettinger et al., 2014; Kodipady et al.; Kraft-Todd et al., 2021; Sosa et al., 2021), and manage their own reputation (Kleiman-Weiner et al., 2017). Humans are thus demonstrably equipped with cognitive mechanisms to make sophisticated context-dependent assessments of costs and benefits, and use them in everyday life to assess the moral value of their actions and those of others.…”
Section: Morality As the Cognitive Organ Of Reciprocity
confidence: 99%
“…While the framework is general, here we instantiate a version for the context of dialogue agents built on Transformer-based Large Language Models (LLM). Our approach builds directly on recent work arguing for contractualism in AI alignment (Awad et al. 2022; Jin et al. 2022). Our Contractual AI is also complementary to previous rule-based approaches (Forbes et al. 2020; Solaiman and Dennison 2021), such as Anthropic's Constitutional AI (Bai et al. 2022), but addresses some of their key shortcomings: transparency, insularity, and accuracy.…”
Section: Overview
confidence: 97%
“…Conversely, if the context shifts to "for the purpose of surveillance or spying," the same action loses its moral grounding. This phenomenon of flexibly bending moral rules in instantiations of scenarios is widely recognized in assorted cognitive science studies (Kwon et al., 2022; Levine et al., 2020; Awad et al., 2022; Levine et al., 2018).…”
Section: Introduction
confidence: 94%