Ethics of Artificial Intelligence 2020
DOI: 10.1093/oso/9780190905033.003.0013
Alignment for Advanced Machine Learning Systems

Abstract: This chapter surveys eight research areas organized around one question: As learning systems become increasingly intelligent and autonomous, what design principles can best ensure that their behavior is aligned with the interests of the operators? The chapter focuses on two major technical obstacles to AI alignment: the challenge of specifying the right kind of objective functions and the challenge of designing AI systems that avoid unintended consequences and undesirable behavior even in cases where the objec…

Cited by 41 publications (50 citation statements) · References 41 publications
“…Disregarding emotion, it is fair to say that creators of intelligent agents would likely want emotion, again depending on the circumstance, to be left out, although design frameworks like the independent core observer model (ICOM) may ultimately prove more beneficial in the long run (Waser 2016). This is where adding to the counteridentical framework described above becomes critical: there are too many variables to account for and too many ways of solving or behaving during a situation that limiting certain decisions seems to be the only way, particularly to mitigate any adverse outcomes that may result from a top-down hierarchical goal structure (Taylor et al 2016b). That is, unless the intelligent agent has transcended traditional artificial intelligence (Goertzel 2016; Rolf and Crook 2016).…”
Section: A Controlled Approach (mentioning)
confidence: 99%
“…Humans have expectations that-just like other humans-agents will conform to personal values and to social norms [5], even when not explicitly communicated. This is the value alignment problem [1,4,26,30,33]. Some assert that agents should be imbued with the capability for moral decision making [10,32], but morals are more difficult to define than values or norms.…”
Section: Related Work (mentioning)
confidence: 99%
“…Furthermore, many civil service organizations try to encourage creativity and improve transparency; the lack of transparency, particularly in decision-making, is a key challenge among public administration organizations. Continuous learning and new knowledge application in technology might help obey human values and norms (Abel et al 2016) and reshape values and principles (Taylor et al 2017). Although it is difficult for technology to cover a rich variety of complex knowledge domains, due to the tacitness of knowledge, Otterlo (2017) suggests formalizing the present fundamentals of ethical codes via suitable computational logics, which could make communication about ethical and moral norms more value-adding, engaging, and efficient.…”
Section: The Role of Ethical and Social Norms in Transformational Communication (mentioning)
confidence: 99%