May 2023: Welcome to the alpha release of TYPE III AUDIO.
Expect very rough edges and very broken stuff—and daily improvements.
Please share your thoughts, but don't share this link on social media, for now.
Longtermists have argued that humanity should significantly increase its efforts to prevent catastrophes like nuclear wars, pandemics, and AI disasters. But one prominent longtermist argument overshoots this conclusion: the argument also implies that humanity should reduce the risk of existential catastrophe even at extreme cost to the present generation. This overshoot means that democratic governments cannot use the longtermist argument to guide their catastrophe policy. In this paper, we show that the case for preventing catastrophe does not depend on longtermism. Standard cost-benefit analysis implies that governments should spend much more on reducing catastrophic risk. We argue that a government catastrophe policy guided by cost-benefit analysis should be the goal of longtermists in the political sphere. This policy would be democratically acceptable, and it would reduce existential risk by almost as much as a strong longtermist policy.
Original article:
https://forum.effectivealtruism.org/posts/DiGL5FuLgWActPBsf/how-much-should-governments-pay-to-prevent-catastrophes
Narrated for the Effective Altruism Forum by TYPE III AUDIO.
In addition to technical challenges, plans to safely develop AI face lots of organizational challenges. If you're running an AI lab, you need a concrete plan for handling that.
In this post, I'll explore some of those issues, using one particular AI plan as an example. I first heard this described by Buck at EA Global London, and more recently with OpenAI's alignment plan. (I think Anthropic's plan has a fairly different ontology, although it still ultimately routes through a similar set of difficulties)
I'd call the cluster of plans similar to this "Carefully Bootstrapped Alignment."
Original article:
https://www.lesswrong.com/posts/thkAtqoQwN6DtaiGT/carefully-bootstrapped-alignment-is-organizationally-hard
Narrated for LessWrong by TYPE III AUDIO.
There is a lot of disagreement and confusion about the feasibility and risks associated with automating alignment research. Some see it as the default path toward building aligned AI, while others expect limited benefit from near term systems, expecting the ability to significantly speed up progress to appear well after misalignment and deception. Furthermore, progress in this area may directly shorten timelines or enable the creation of dual purpose systems which significantly speed up capabilities research.
OpenAI recently released their alignment plan. It focuses heavily on outsourcing cognitive work to language models, transitioning us to a regime where humans mostly provide oversight to automated research assistants. While there have been a lot of objections to and concerns about this plan, there hasn’t been a strong alternative approach aiming to automate alignment research which also takes all of the many risks seriously.
The intention of this post is not to propose an end-all cure for the tricky problem of accelerating alignment using GPT models. Instead, the purpose is to explicitly put another point on the map of possible strategies, and to add nuance to the overall discussion.
Original article:
https://www.lesswrong.com/posts/bxt7uCiHam4QXrQAA/cyborgism
Narrated for LessWrong by TYPE III AUDIO.
If you fear that someone will build a machine that will seize control of the world and annihilate humanity, then one kind of response is to try to build further machines that will seize control of the world even earlier without destroying it, forestalling the ruinous machine’s conquest. An alternative or complementary kind of response is to try to avert such machines being built at all, at least while the degree of their apocalyptic tendencies is ambiguous.
The latter approach seems to me like the kind of basic and obvious thing worthy of at least consideration, and also in its favor, fits nicely in the genre ‘stuff that it isn’t that hard to imagine happening in the real world’. Yet my impression is that for people worried about extinction risk from artificial intelligence, strategies under the heading ‘actively slow down AI progress’ have historically been dismissed and ignored (though ‘don’t actively speed up AI progress’ is popular).
Original article:
https://forum.effectivealtruism.org/posts/vwK3v3Mekf6Jjpeep/let-s-think-about-slowing-down-ai-1
Narrated for the Effective Altruism Forum by TYPE III AUDIO.
In previous pieces, I argued that there's a real and large risk of AI systems' aiming to defeat all of humanity combined - and succeeding.
I first argued that this sort of catastrophe would be likely without specific countermeasures to prevent it. I then argued that countermeasures could be challenging, due to some key difficulties of AI safety research.
But while I think misalignment risk is serious and presents major challenges, I don’t agree with sentiments along the lines of “We haven’t figured out how to align an AI, so if transformative AI comes soon, we’re doomed.” Here I’m going to talk about some of my high-level hopes for how we might end up avoiding this risk.
Original article:
https://forum.effectivealtruism.org/posts/rJRw78oihoT5paFGd/high-level-hopes-for-ai-alignment
Narrated by Holden Karnofsky for the Cold Takes blog.
This is going to be a list of holes I see in the basic argument for existential risk from superhuman AI systems.
To start, here’s an outline of what I take to be the basic case:
I. If superhuman AI systems are built, any given system is likely to be ‘goal-directed’
II. If goal-directed superhuman AI systems are built, their desired outcomes will probably be about as bad as an empty universe by human lights
III. If most goal-directed superhuman AI systems have bad goals, the future will very likely be bad
Original article:
https://forum.effectivealtruism.org/posts/zoWypGfXLmYsDFivk/counterarguments-to-the-basic-ai-risk-case
Narrated for the Effective Altruism Forum by TYPE III AUDIO.
https://80000hours.org/career-reviews/china-related-ai-safety-and-governance-paths/#strong-networking-abilities-especially-for-policy-roles
Expertise in China and its relations with the world might be critical in tackling some of the world’s most pressing problems. In particular, China’s relationship with the US is arguably the most important bilateral relationship in the world, with these two countries collectively accounting for over 40% of global GDP.1 These considerations led us to publish a guide to improving China–Western coordination on global catastrophic risks and other key problems in 2018. Since then, we have seen an increase in the number of people exploring this area.
China is one of the most important countries developing and shaping advanced artificial intelligence (AI). The Chinese government’s spending on AI research and development is estimated to be on the same order of magnitude as that of the US government,2 and China’s AI research is prominent on the world stage and growing.
Because of the importance of AI from the perspective of improving the long-run trajectory of the world, we think relations between China and the US on AI could be among the most important aspects of their relationship. Insofar as the EU and/or UK influence advanced AI development through labs based in their countries or through their influence on global regulation, the state of understanding and coordination between European and Chinese actors on AI safety and governance could also be significant.
That, in short, is why we think working on AI safety and governance in China and/or building mutual understanding between Chinese and Western actors in these areas is likely to be one of the most promising China-related career paths. Below we provide more arguments and detailed information on this option.
If you are interested in pursuing a career path described in this profile, contact 80,000 Hours’ one-on-one team and we may be able to put you in touch with a specialist advisor.
This post is part of my AI strategy nearcasting series: trying to answer key strategic questions about transformative AI, under the assumption that key events will happen very soon, and/or in a world that is otherwise very similar to today's.
This post gives my understanding of what the set of available strategies for aligning transformative AI would be if it were developed very soon, and why they might or might not work. It is heavily based on conversations with Paul Christiano, Ajeya Cotra and Carl Shulman, and its background assumptions correspond to the arguments Ajeya makes in this piece (abbreviated as “Takeover Analysis”).
I premise this piece on a nearcast in which a major AI company (“Magma,” following Ajeya’s terminology) has good reason to think that it can develop transformative AI very soon (within a year), using what Ajeya calls “human feedback on diverse tasks” (HFDT) - and has some time (more than 6 months, but less than 2 years) to set up special measures to reduce the risks of misaligned AI before there’s much chance of someone else deploying transformative AI.