Artificial Intelligence Safety and Security 2018
DOI: 10.1201/9781351251389-7

The Value Learning Problem

Cited by 25 publications (18 citation statements)
References 10 publications
“…Another widely-used definition of AGI safety is value-alignment between humans and AGI, and herein, between AGI_n and AGI_{n+1}. Value-sets, from which goals are generated, can be hard-coded, re-coded in AGI versions, or be more dynamic by programming the AGI to learn the desired values via techniques such as inverse reinforcement learning [3,17,21]. In such a scenario, which saints will select the saintly humans to emulate?…”
Section: Lack of Proof of Safe AGI or Methods to Prove Safe AGI (mentioning)
Confidence: 99%
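The inverse reinforcement learning technique mentioned in the statement above can be illustrated with a short sketch: given demonstrations from a noisily rational demonstrator, infer the reward weights that best explain its observed choices. This is a minimal toy, not the method of any cited paper; the feature matrix, the Boltzmann choice model, and the learning rate are all illustrative assumptions.

```python
# Toy inverse reinforcement learning (IRL) sketch: recover the hidden
# reward weights of a noisily rational demonstrator by maximum likelihood.
# All data and parameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Each candidate action is described by a feature vector; the demonstrator's
# hidden preferences are a linear reward over those features.
features = rng.normal(size=(5, 3))    # 5 actions, 3 features each
true_w = np.array([1.0, -0.5, 0.25])  # hidden "values" to be recovered

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Simulate demonstrations: the demonstrator picks each action with
# probability proportional to exp(reward), i.e. noisily rationally.
demos = rng.choice(5, size=1000, p=softmax(features @ true_w))

# Maximum-likelihood IRL: gradient ascent on log P(demos | w).
w = np.zeros(3)
for _ in range(500):
    p = softmax(features @ w)
    # Gradient = empirical feature expectation - model feature expectation.
    grad = features[demos].mean(axis=0) - p @ features
    w += 0.1 * grad

print("recovered weights:", np.round(w, 2))
```

With enough demonstrations, the recovered weights approach the hidden ones, since the log-likelihood of this choice model is concave in w; the hard open problems the cited papers discuss (whose values, which demonstrators) begin where this toy ends.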
“…The ability of AGI to self-correct or to assist its designers in correction of value alignment and behavior is called ‘corrigibility’ by Soares [21]. Miller et al. review and examine how corrigibility can result in mis-alignment of values [14].…”
Section: Probabilistically Checkable Proofs (PCP Theorem) (mentioning)
Confidence: 99%
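The corrigibility notion quoted above can be made concrete with a toy decision problem, assuming a "utility indifference" style compensation term (one proposal analyzed in that literature): a naive expected-utility maximizer prefers to block its designers' correction, while a compensated agent has no incentive to resist. The payoff numbers below are illustrative assumptions, not figures from the cited papers.

```python
# Toy corrigibility sketch: outcomes are scored by the agent's CURRENT
# utility function, so the corrected goal looks bad to the naive agent.
u_if_uncorrected = 10.0  # keep pursuing the current goal
u_if_corrected = 2.0     # corrected goal scores poorly under current utility

def naive_choice():
    # A naive maximizer compares outcomes under its current utility only,
    # so it prefers to block the correction.
    return "block" if u_if_uncorrected > u_if_corrected else "allow"

def indifferent_choice(compensation):
    # Utility indifference: grant a compensation term when corrected so
    # that allowing and blocking are worth the same under the current
    # utility, leaving no incentive to resist correction.
    return "block" if u_if_uncorrected > u_if_corrected + compensation else "allow"

print(naive_choice())                                          # -> block
print(indifferent_choice(u_if_uncorrected - u_if_corrected))   # -> allow
```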
“…Another widely used definition of AGI safety is value-alignment between humans and AGI, and herein, between AGI_n and AGI_{n+1}. Value-sets, from which goals are generated, can be hard-coded, re-coded in AGI versions, or be more dynamic by programming the AGI to learn the desired values via techniques such as inverse reinforcement learning [3,16,20]. In such a scenario, which saints will select the saintly humans to emulate?…”
Section: Lack of Proof of Safe AGI or Methods to Prove Safe AGI (mentioning)
Confidence: 99%
“…The ability of AGI to self-correct or to assist its designers in correction of value alignment and behavior is called ‘corrigibility’ by Soares [20]. Miller et al. review and examine how corrigibility can result in mis-alignment of values [50].…”
Section: Probabilistically Checkable Proofs (PCP Theorem) (mentioning)
Confidence: 99%
“…However, the use of personal data without consent is one of the main preoccupations found in the literature involving AI Ethics (Soares, 2016; Russell, 2019), and even how to integrate human society in a post-Singularity era (Chalmers, 2010).…”
Section: Privacy (mentioning)
Confidence: 99%