Learning From Human Preferences

13 May 2023 · AI Safety Fundamentals: Alignment

Readings from the AI Safety Fundamentals: Alignment course.

One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind’s safety team, we’ve developed an algorithm which can infer what humans want by being told which of two proposed behaviors is better.
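
As a rough illustration of inferring a goal from pairwise comparisons, the sketch below fits a reward function to a handful of "this behavior is better than that one" judgments. It is not the algorithm from the original article. It assumes a linear reward over made-up two-number feature vectors (hypothetical `progress` and `crashes` features) and a logistic model of the human's choice, which is a common way to treat pairwise preference data. The names `fit_reward`, `reward`, and `comparisons` are invented for this example.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def reward(w, features):
    """Linear reward: an assumption made for this sketch only."""
    return sum(wi * xi for wi, xi in zip(w, features))


def fit_reward(comparisons, dim, lr=0.5, epochs=200):
    """Fit reward weights from (preferred, rejected) feature pairs.

    Models P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected))
    and runs stochastic gradient ascent on the log-likelihood.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in comparisons:
            p = sigmoid(reward(w, better) - reward(w, worse))
            # d/dw log(p) = (1 - p) * (better - worse)
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (better[i] - worse[i])
    return w


# Hypothetical behaviors described as [progress, crashes].
# Each tuple is (behavior the human preferred, behavior the human rejected).
comparisons = [
    ([1.0, 0.0], [0.0, 0.0]),  # more progress preferred
    ([0.0, 0.0], [0.0, 1.0]),  # fewer crashes preferred
    ([1.0, 0.0], [1.0, 1.0]),
    ([1.0, 1.0], [0.0, 1.0]),
]

w = fit_reward(comparisons, dim=2)
print("learned weights:", w)
print("prefers progress over crashing:",
      reward(w, [1.0, 0.0]) > reward(w, [0.0, 1.0]))
```

Nobody writes the goal down here: the weights on `progress` and `crashes` are recovered only from which of two behaviors was judged better. A realistic system would replace the linear reward with a learned model and would also train a policy against it.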

Original article:
https://openai.com/research/learning-from-human-preferences

Authors:
Dario Amodei, Paul Christiano, Alex Ray