Welcome to the alpha release of TYPE III AUDIO.
Expect very rough edges and very broken stuff—and regular improvements. Please share your thoughts.
Welcome to the alpha release of TYPE III AUDIO.
Expect very rough edges and very broken stuff—and regular improvements. Please share your thoughts.
Readings from the AI Safety Fundamentals: Alignment course.
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
The stereotyped image of AI catastrophe is a powerful, malicious AI system that takes its creators by surprise and quickly achieves a decisive advantage over the rest of humanity.
I think this is probably not what failure will look like, and I want to try to paint a more realistic picture. I’ll tell the story in two parts:
Part I: machine learning will increase our ability to “get what we can measure,” which could cause a slow-rolling catastrophe. ("Going out with a whimper.")
Part II: ML training, like competitive economies or natural ecosystems, can give rise to “greedy” patterns that try to expand their own influence. Such patterns can ultimately dominate the behavior of a system and cause sudden breakdowns. ("Going out with a bang," an instance of optimization daemons.) I think these are the most important problems if we fail to solve intent alignment.
In practice these problems will interact with each other, and with other disruptions/instability caused by rapid progress. These problems are worse in worlds where progress is relatively fast, and fast takeoff can be a key risk factor, but I’m scared even if we have several years.
Crossposted from the LessWrong Curated Podcast by TYPE III AUDIO.
---
Click “Add to feed” on episodes, playlists, and people.
Listen online or via your podcast app.