Wei Dai’s Fascinating Analysis of AI Safety Success Stories

By Wei_Dai

AI safety researchers often describe their long-term goals as building “safe and efficient AIs”, but don’t always mean the same thing by this or other seemingly similar phrases. Asking about their “success stories” (i.e., scenarios in which their line of research helps contribute to a positive outcome) can help make clear what their actual research aims are. Knowing such scenarios also makes it easier to compare the ambition, difficulty, and other attributes of different lines of AI safety research. I hope this contributes to improved communication and coordination between different groups of people working on AI risk.

In the rest of the post, I describe some common AI safety success stories that I’ve heard over the years and then compare them along a number of dimensions. They are listed in roughly the order in which they first came to my attention. (Suggestions welcome for better names for any of these scenarios, as well as additional success stories and additional dimensions along which they can be compared.)

Sovereign Singleton

AKA Friendly AI, an autonomous, superhumanly intelligent AGI that takes over the world and optimizes it according to some (perhaps indirect) specification of human values.

Pivotal Tool

An oracle or task AGI that can be used to perform a pivotal but limited act and then stops to wait for further instructions.

Corrigible Contender

A semi-autonomous AGI that does not have long-term preferences of its own but acts according to (its understanding of) the short-term preferences of some human or group of humans. It competes effectively with comparable AGIs corrigible to other users, as well as unaligned AGIs (if any exist), for resources and ultimately for influence on the future of the universe.

Interim Quality-of-Life Improver

AI risk can be minimized if world powers coordinate to limit AI capabilities development or deployment, in order to give AI safety researchers more time to figure out how to build a very safe and highly capable AGI. While that is proceeding, it may be a good idea (e.g., politically advisable and/or morally correct) to deploy relatively safe, limited AIs that can improve people’s quality of life but are not necessarily state of the art in terms of capability or efficiency. Such improvements can include, for example, curing diseases and solving pressing scientific and technological problems.

(I want to credit Rohin Shah as the person that I got this success story from, but can’t find the post or comment where he talked about it. Was it someone else?)

Research Assistant

If an AGI project gains a lead over its competitors, it may be able to grow that into a larger lead by building AIs to help with (either safety or capability) research. This can be in the form of an oracle, or human imitation, or even narrow AIs useful for making money (which can be used to buy more compute, hire more human researchers, etc.). Such Research Assistant AIs can help pave the way to one of the other, more definitive success stories. Examples: 1, 2.