Readings from the AI Safety Fundamentals: Alignment course.
We’ve released a paper, AI Control: Improving Safety Despite Intentional Subversion, which explores techniques that can prevent AI catastrophes even when AI instances deliberately collude to subvert the safety measures.
Source:
https://www.alignmentforum.org/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion
Narrated for AI Safety Fundamentals by Perrin Walker