(Part 2/2) Is power-seeking AI an existential risk? by Joseph Carlsmith

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.
This is part two of: Is power-seeking AI an existential risk?, published by Joseph Carlsmith.
5. Deployment
Let’s turn, now, to whether we should expect to actually see practically PS-misaligned APS systems deployed in the world.
The previous section doesn’t settle this. In particular: if a technology is difficult to make safe, this doesn’t mean that lots of people will use it in unsafe ways. Rather, they might adjust their usage to reflect the degree of safety achieved. Thus, if we couldn’t build planes that reliably don’t crash, we wouldn’t expect to see people dying in plane crashes all the time (especially not after initial accidents); rather, we’d expect to see people not flying. And such caution becomes more likely as the stakes of safety failures increase.
Absent counterargument, we might expect something similar with AI. Indeed, some amount of alignment seems like a significant constraint on the usefulness and commercial viability of AI technology generally. Thus, if problems with proxies, or search, make it difficult to give house-cleaning robots the right objectives, we shouldn’t expect to see lots of such robots killing people’s cats (or children); rather, we should expect to see lots of difficulties making profitable house-cleaning robots. Indeed, by the time self-driving cars see widespread use, they will likely be quite safe (maybe too safe, relative to human drivers they could’ve replaced earlier).
What’s more, safety failures can result, for a developer/deployer, in significant social/regulatory backlash and economic cost. The 2018 and 2019 crashes of Boeing’s 737 MAX aircraft, for example, resulted in an estimated ~$20 billion in direct costs, and tens of billions more in cancelled orders. And sufficiently severe forms of failure can result in direct bodily harm to decision-makers and their loved ones (everyone involved in creating a doomsday virus, for example, has a strong incentive to make sure it’s not released).
Many incentives, then, favor safety -- and incentives to prevent harmful and large-scale forms of misaligned power-seeking seem especially clear. Faced with such incentives, why would anyone use, or deploy, a strategically-aware AI agent that will end up seeking power in unintended ways?
It’s an important question, and one I’ll look at in some detail. In particular, I think these considerations suggest that we should be less worried about practically PS-misaligned agents that are so unreliably well-behaved (at least externally) that they aren’t useful, and more worried about practically PS-misaligned agents whose abilities (including their abilities to behave in the ways we want, when it’s useful for them to do so) make them at least superficially attractive to use/deploy -- because of e.g. the profit, social benefit, and/or strategic advantage that using/deploying them affords, or appears to afford. My central worry is that it will be substantially easier to build that type of agent than it will be to build agents that are genuinely practically PS-aligned -- and that the beliefs and incentives of relevant actors will result in such practically PS-misaligned agents getting used/deployed regardless.
5.1 Timing of problems
I’ll think of “deployment” as the point where an AI system moves out of a development/laboratory/testing environment and into a position of real-world influence (even if this influence is mediated via e.g. humans following its instructions). This isn’t always a discrete point; sometimes, for example, it’s an ongoing process, influenced by many individual decisions to accord an AI agent somewhat greater influence. For simplicity, though, I’ll think of it as a discrete point in what follows -- analogous to the point where a product “launches,” “ships,” or starts really getting “used.”
We can distinguish between practical PS-alignment failur...