[Week 3] “Coordination challenges for preventing AI conflict” by Stefan Torges

AGI Safety Fundamentals: Governance

Readings from the AI Safety Fundamentals: Governance course.


In this article, I examine the challenge of ensuring coordination between AI developers to prevent catastrophic failure modes arising from the interactions of their systems. More specifically, I am interested in addressing bargaining failures as outlined in Jesse Clifton’s research agenda on Cooperation, Conflict & Transformative Artificial Intelligence (TAI) (2019) and Dafoe et al.’s Open Problems in Cooperative AI (2020).

First, I set out the general problem of bargaining failure and explain why bargaining problems might persist even for aligned superintelligent agents. Then, I argue that developers might be in a good position to address the issue. I use a toy model to analyze whether we should expect them to do so by default. I deepen this analysis by comparing the merits and likelihood of different coordinated solutions. Finally, I suggest directions for interventions and future work.

The main goal of this article is to encourage and enable future work. To do so, I sketch the full path from problem to potential interventions. This large scope comes at the cost of depth of analysis. The models I use are primarily intended to illustrate how a particular question along this path can be tackled rather than to arrive at robust conclusions. At some point, I might revisit parts of this article to bolster the analysis in later sections.

Narrated for AGI Safety Fundamentals by TYPE III AUDIO.
