Episode: Joseph Carlsmith - Utopia, AI, & Infinite Ethics
Release date: 2022-08-03
Joseph Carlsmith is a senior research analyst at Open Philanthropy and a doctoral student in philosophy at the University of Oxford.
We discuss utopia, artificial intelligence, computational power of the brain, infinite ethics, learning from the fact that you exist, perils of futurism, and blogging.
Watch on YouTube. Listen on Spotify, Apple Podcasts, etc.
Episode website + Transcript here.
Follow Joseph on Twitter. Follow me on Twitter.
Subscribe to find out about future episodes!
Timestamps
(0:00:06) - Introduction
(0:02:53) - How to Define a Better Future?
(0:09:19) - Utopia
(0:25:12) - Robin Hanson’s EMs
(0:27:35) - Human Computational Capacity
(0:34:15) - FLOPS to Emulate Human Cognition?
(0:40:15) - Infinite Ethics
(1:00:51) - SIA vs SSA
(1:17:53) - Futurism & Unreality
(1:23:36) - Blogging & Productivity
(1:28:43) - Book Recommendations
(1:30:04) - Conclusion
Please share if you enjoyed this episode! Helps out a ton!
Get full access to The Lunar Society at www.dwarkeshpatel.com/subscribe
This is part two of: Is power-seeking AI an existential risk?, published by Joseph Carlsmith.
5. Deployment
Let’s turn, now, to whether we should expect to actually see practically PS-misaligned APS systems deployed in the world.
The previous section doesn’t settle this. In particular: if a technology is difficult to make safe, this doesn’t mean that lots of people will use it in unsafe ways. Rather, they might adjust their usage to reflect the degree of safety achieved. Thus, if we couldn’t build planes that reliably don’t crash, we wouldn’t expect to see people dying in plane crashes all the time (especially not after initial accidents); rather, we’d expect to see people not flying. And such caution becomes more likely as the stakes of safety failures increase.
Absent counterargument, we might expect something similar with AI. Indeed, some amount of alignment seems like a significant constraint on the usefulness and commercial viability of AI technology generally. Thus, if problems with proxies, or search, make it difficult to give house-cleaning robots the right objectives, we shouldn’t expect to see lots of such robots killing people’s cats (or children); rather, we should expect to see lots of difficulties making profitable house-cleaning robots. Indeed, by the time self-driving cars see widespread use, they will likely be quite safe (maybe too safe, relative to human drivers they could’ve replaced earlier).
What’s more, safety failures can result, for a developer/deployer, in significant social/regulatory backlash and economic cost. The 2018 and 2019 crashes of Boeing’s 737 MAX aircraft, for example, resulted in an estimated ~$20 billion in direct costs, and tens of billions more in cancelled orders. And sufficiently severe forms of failure can result in direct bodily harm to decision-makers and their loved ones (everyone involved in creating a doomsday virus, for example, has a strong incentive to make sure it’s not released).
Many incentives, then, favor safety -- and incentives to prevent harmful and large-scale forms of misaligned power-seeking seem especially clear. Faced with such incentives, why would anyone use, or deploy, a strategically-aware AI agent that will end up seeking power in unintended ways?
It’s an important question, and one I’ll look at in some detail. In particular, I think these considerations suggest that we should be less worried about practically PS-misaligned agents that are so unreliably well-behaved (at least externally) that they aren’t useful, and more worried about practically PS-misaligned agents whose abilities (including their abilities to behave in the ways we want, when it’s useful for them to do so) make them at least superficially attractive to use/deploy -- because of e.g. the profit, social benefit, and/or strategic advantage that using/deploying them affords, or appears to afford. My central worry is that it will be substantially easier to build that type of agent than it will be to build agents that are genuinely practically PS-aligned -- and that the beliefs and incentives of relevant actors will result in such practically PS-misaligned agents getting used/deployed regardless.
5.1 Timing of problems
I’ll think of “deployment” as the point where an AI system moves out of a development/laboratory/testing environment and into a position of real-world influence (even if this influence is mediated via e.g. humans following its instructions). This isn’t always a discrete point; sometimes, for example, it’s an ongoing process, influenced by many individual decisions to accord an AI agent somewhat greater influence. For simplicity, though, I’ll think of it as a discrete point in what follows -- analogous to the point where a product “launches,” “ships,” or starts really getting “used.”
We can distinguish between practical PS-alignment failur...
This is part one of: Is power-seeking AI an existential risk?, published by Joseph Carlsmith.
1. Introduction
Some worry that the development of advanced artificial intelligence will result in existential catastrophe -- that is, the destruction of humanity’s longterm potential. Here I examine the following version of this worry (it’s not the only version):
By 2070:
It will become possible and financially feasible to build AI systems with the following properties:
Advanced capability: they outperform the best humans on some set of tasks which, when performed at advanced levels, grant significant power in today’s world (tasks like scientific research, business/military/political strategy, engineering, and persuasion/manipulation).
Agentic planning: they make and execute plans, in pursuit of objectives, on the basis of models of the world.
Strategic awareness: the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining power over humans and the real-world environment.
(Call these “APS” -- Advanced, Planning, Strategically aware -- systems.)
There will be strong incentives to build and deploy APS systems | (1).
It will be much harder to build APS systems that would not seek to gain and maintain power in unintended ways (because of problems with their objectives) on any of the inputs they’d encounter if deployed, than to build APS systems that would do this (even if decision-makers don’t know it), but which are at least superficially attractive to deploy anyway | (1)-(2).
Some deployed APS systems will be exposed to inputs where they seek power in unintended and high-impact ways (say, collectively causing >$1 trillion of damage), because of problems with their objectives | (1)-(3).
Some of this power-seeking will scale (in aggregate) to the point of permanently disempowering ~all of humanity | (1)-(4).
This disempowerment will constitute an existential catastrophe | (1)-(5).
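The “| (1)”, “| (1)-(2)”, etc. annotations mark each premise as conditional on the premises before it, so the probability of the full scenario is the product of the conditional premise probabilities. A minimal sketch of that structure (notation only; the report’s own probability assignments come in its final section):

```latex
% Probabilistic structure of the six-premise argument: each premise is
% assessed conditional on all earlier premises, and the overall scenario
% probability is the product of those conditional probabilities.
\[
  P(\text{catastrophe by 2070})
  = P(1)\, P(2 \mid 1)\, P(3 \mid 1,2)\, P(4 \mid 1\text{--}3)\, P(5 \mid 1\text{--}4)\, P(6 \mid 1\text{--}5)
\]
```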
These claims are extremely important if true. My aim is to investigate them. I assume for the sake of argument that (1) is true (I currently assign this >40% probability). I then examine (2)-(5), and say a few words about (6).
My current view is that there is a small but substantive chance that a scenario along these lines occurs, and that many people alive today -- including myself -- live to see humanity permanently disempowered by artificial systems. In the final section, I take an initial stab at quantifying this risk, by assigning rough probabilities to (1)-(6). My current, highly unstable, subjective estimate is that there is a ~5% chance of existential catastrophe by 2070 from scenarios in which (1)-(6) are true. My main hope, though, is not to push for a specific number, but rather to lay out the arguments in a way that can facilitate productive debate.
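To make the arithmetic concrete, here is a minimal Python sketch of how a final estimate of this kind is assembled from the decomposition above. The specific numbers are illustrative placeholders, not the probabilities assigned in the report; they are chosen only so that the product lands near the ~5% figure mentioned here.

```python
# Minimal sketch of the risk calculation: multiply each premise's probability
# conditional on the premises before it. The figures are illustrative
# placeholders, not the report's own assignments.
from math import prod

conditional_premise_probabilities = {
    "1. APS systems possible and financially feasible by 2070": 0.65,
    "2. Strong incentives to build and deploy them | (1)": 0.80,
    "3. Much harder to build PS-aligned than attractive-but-misaligned | (1)-(2)": 0.40,
    "4. Some deployed systems seek power in high-impact ways | (1)-(3)": 0.65,
    "5. Power-seeking scales to permanent human disempowerment | (1)-(4)": 0.40,
    "6. Disempowerment constitutes existential catastrophe | (1)-(5)": 0.95,
}

overall = prod(conditional_premise_probabilities.values())
print(f"Overall scenario probability: {overall:.1%}")  # ~5% with these placeholder inputs
```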
Acknowledgments: Thanks to Asya Bergal, Alexander Berger, Paul Christiano, Ajeya Cotra, Tom Davidson, Daniel Dewey, Owain Evans, Ben Garfinkel, Katja Grace, Jacob Hilton, Evan Hubinger, Jared Kaplan, Holden Karnofsky, Sam McCandlish, Luke Muehlhauser, Richard Ngo, David Roodman, Rohin Shah, Carl Shulman, Nate Soares, Jacob Steinhardt, and Eliezer Yudkowsky for input on earlier stages of this project; and thanks to Nick Beckstead for guidance and support throughout the investigation. The views expressed here are my own.
1.1 Preliminaries
Some preliminaries and caveats (those eager for the main content can skip):
I’m focused, here, on a very specific type of worry. There are lots of other ways to be worried about AI -- and even, about existential catastrophes resulting from AI. And there are lots of ways to be excited about AI, too.
My emphasis and approach differs from that of others in the literature in various ways. In particular: I’m less focused than some on the possibility of an extre...
On this episode of the Utilitarian Podcast, I talk with Joseph Carlsmith. Joseph is a research analyst at Open Philanthropy and a doctoral student in philosophy at the University of Oxford. His views and opinions in this podcast are his own, and not necessarily those of Open Philanthropy.
Our conversation has three main themes. We talk about the long-term future, including the possibility of actually creating utopia. We talk about Joseph’s work on the computational power of the brain. And we talk about meta-ethics and consciousness, including discussions of illusionism and the effects of meditation.
The Utilitarian Podcast now has a dedicated website, at utilitarianpodcast.com. At the site, you’ll find full transcripts of selected episodes, including this one. These transcripts have been generously funded by James Evans. I’ve also set up an email address, utilitarianpodcast@gmail.com, where you can send criticism, questions, suggestions, and so on.
Effective Altruism is a social movement dedicated to finding ways to do the most good possible, whether through charitable donations, career choices, or volunteer projects. EA Global conferences are gatherings for EAs to meet. You can also listen to this talk along with its accompanying video on YouTube.
I discuss the orientation towards the long-term future that motivates some people in the effective altruist community to focus on it.
Source: Effective Altruism Global (video).