人工智能的生存风险：一个简洁的论证

Published on September 30, 2025 4:04 PM GMT

A lot of the writing making the case for AI doom is by Eliezer Yudkowsky, interspersed with the expected number of parables, tendentious philosophical asides, and complex metaphors. I think this can obscure the fact that the argument for AI doom is pretty straightforward and plausible—it requires just a few steps and none of them are obviously wrong. You don’t need to think humans are just fancy meat computers or that AI would buy into functional decision theories and acausally trade to buy the argument.

For this reason, I thought I’d try to concisely and briefly lay out the basic argument for AI doom.

The basic argument has a few steps:

We’re going to build superintelligent AI.It will be agent-like, in the sense of having long-term goals it tries to pursue.We won’t be able to align it, in the sense of getting its goals to be what we want them to be.An unaligned agentic AI will kill everyone/do something similarly bad.

Now, before I go on, just think to yourself: do any of these steps seem ridiculous? Don’t think about the inconvenient practical implications of believing them all in conjunction—just think about whether, if someone proposed any specific premise, you would think “that’s obviously false.” If you think each one has a 50% probability, then the odds AI kills everyone is 1/16, or about 6%. None of these premises strike me as ridiculous, and there isn’t anything approaching a knockdown argument against any them.

As for the first premise, there are reasons to think we might build superintelligent AI very soon. AI 2027, a sophisticated AI forecasting report, thinks it’s quite likely that we’ll have it within a decade. Given the meteoric rise in AI capabilities, with research capabilities going up about 25x per year, barring contrary direct divine revelation, it’s hard to see how one would be confident that we won’t get superintelligent AI soon. Bridging the gap between GPT2—which was wholly unusable—and GPT5 which knows more than anyone on the planet took only a few years. What licenses extreme confidence that over the course of decades, we won’t get anything superintelligent—anything that is to GPT5 what GPT5 is to GPT2?

The second premise claims that AI will be agent-like. This premise seems pretty plausible. There’s every incentive to make AI with “goals,” in the minimal sense of the ability to plan long-term for some aim (deploying something very intelligent that aims for X is often a good way to get X.) Fenwick and Qureshi write:

AI companies already create systems that make and carry out plans and tasks, and might be said to be pursuing goals, including:
Deep research tools, which can set about a plan for conducting research on the internet and then carry it outSelf-driving cars, which can plan a route, follow it, adjust the plan as they go along, and respond to obstaclesGame-playing systems, like AlphaStar for Starcraft, CICERO for Diplomacy, and MuZero for a range of games
All of these systems are limited in some ways, and they only work for specific use cases.
…
Some companies are developing even more broadly capable AI systems, which would have greater planning abilities and the capacity to pursue a wider range of goals.³ OpenAI, for example, has been open about its plan to create systems that can “join the workforce.”

AIs have gradually been performing longer and longer tasks. But if there’s a superintelligence that’s aware of the world and can perform very long tasks, then it would be a superintelligent agent. Thus, it seems we’re pretty likely to get superintelligent agents.

A brief note: there’s a philosophical question about whether it really has goals in some deep sense. Maybe you need to be conscious to have goals. But this isn’t super relevant to the risk question—what matters isn’t the definition of the word goal but whether the AI will have capabilities that will be dangerous. If the AI tries to pursue long-term tasks with superhuman efficiency, then whether or not you technically label that a goal, it’s pretty dangerous.

The third premise is that we won’t be able to align AI to be safe. The core problem is that it’s pretty hard to get something to follow your will if it has goals and is much smarter than you. We don’t really know how to do that yet. And even if an AI has only slightly skewed goals, that could be catastrophic. If you take most goals to the limit, you get doom. Only a tiny portion of the things one could aim at would involve keeping humans around if taken to their limit.

There are some proposals for keeping AI safe, and there’s some chance that the current method would work for making AI safe (just discourage it when it does things we don’t like). At the very least, however, none of this seems obvious. In light of there being nothing that can definitely keep AI from becoming misaligned, we should not be very confident that AI will be aligned.

The last step says that if the AI was misaligned it would kill us all or do something similarly terrible. Being misaligned means it has goals that aren’t in line with our goals. Perhaps a misaligned AI would optimize for racking up some internal reward function that existed in its training data, which would involve generating a maximally powerful computer to store the biggest number it could.

If the AI has misaligned goals then it will be aiming for things that aren’t in accordance with human values. Most of the goals one could have, taken to the limit, entail our annihilation (to, for instance, prevent us from stopping it from building a super powerful computer). This is because of something called instrumental convergence—some actions are valuable on a wide range of goals. Most goals a person could have make it good for them to get lots of money; no matter what you want, it will be easier if you’re super rich. Similarly, most goals the AI has will make it valuable to stop the people who could plausibly stop them.

So then the only remaining question is: will it be able to?

Now, as it happens, I do not feel entirely comfortable gambling the fate of the world on a superintelligent AI not being able to kill everyone. Nor should you. Superintelligence gives one extraordinary capacities. The best human chess players cannot even come close to the chess playing of AI—we have already passed the date when the best human might ever, in the course of 1,000 years, beat the best AI.

In light of this, if the AI wanted to kill us, it seems reasonably likely that it would. Perhaps the AI could develop some highly lethal virus that eviscerates all human life. Perhaps the AI could develop some super duper nanotechnology that would destroy the oxygen in the air and make it impossible for us to breathe. But while we should be fairly skeptical about any specific scenario, there is nothing that licenses extreme confidence in the proposition that a being a thousand times smarter than us that thinks thousands of times faster wouldn’t be able to find a way to kill us.

Now, I’m not as much of a doomer as some people. I do not think we are guaranteed to all be annihilated by AI. Were I to bet on an outcome, I would bet on the AI not killing us (and this is not merely because, were the AIs to kill us all, I wouldn’t be able to collect my check). To my mind, while every premise is plausible, the premises are generally not obviously true. I feel considerable doubt about each of them. Perhaps I’d give the first one 50% odds in the next decade, the next 60% odds, the third 30% odds, and the last 70% odds. This overall leaves me with about a 6% chance of doom. And while you shouldn’t take such numbers too literally, they give a rough, order-of-magnitude feel for the probabilities.

I think the extreme, Yudkowsky-style doomers, and those who are blazingly unconcerned about existential risks from AI are, ironically, making rather similar errors. Both take as obvious some extremely non-obvious premises in a chain of reasoning, and have an unreasonably high confidence that some event will turn out a specific way. I cannot, for the life of me, see what could possibly compel a person to be astronomically certain in the falsity of any of the steps I described, other than the fact that saying that AI might kill everyone soon gets you weird looks, and people don’t like those.

Thus, I think the following conclusion is pretty clear: there is a non-trivial chance that AI will kill everyone in the next few decades. It’s not guaranteed, but neither is it guaranteed that if you license your five-year-old to drive your vehicle on the freeway, with you as the passenger, you will die. Nonetheless, I wouldn’t recommend it. If you are interested in doing something with your career about this enormous risk, I recommend this piece about promising careers in AI safety.

Discuss

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签