少点错误 10月01日 00:09
人工智能的生存风险:一个简洁的论证
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文旨在简洁明了地阐述人工智能(AI)可能带来的生存风险。文章指出,该风险的论证过程并不复杂,主要基于四个核心前提:我们将建造出超级智能AI;这些AI将具备代理性,拥有长期目标;我们无法确保AI的目标与人类意愿一致(即“对齐”问题);一个未对齐的代理性AI将导致灾难性后果,如人类灭绝。作者认为,虽然每个前提并非绝对确定,但它们都具有相当的可能性,并且缺乏明确的反驳理由。AI能力的飞速发展、现有AI系统的代理性特征以及对齐问题的技术难度,都增加了这些前提的合理性。最终,作者强调,尽管AI灭绝并非必然,但其生存风险不容忽视,呼吁认真对待并探索AI安全领域。

🤖 **超级智能AI的崛起**:文章强调,AI技术正以惊人的速度发展,研究能力每年呈指数级增长。鉴于GPT系列模型的快速迭代和能力飞跃,几乎可以肯定在不久的将来(可能十年内)我们将拥有远超人类智能水平的超级智能AI。这种进步使得对AI很快出现持高度自信的态度变得困难。

🧠 **AI的代理性与目标驱动**:AI系统正日益展现出代理性,即具备制定和执行长期计划以达成特定目标的能力。从自动驾驶汽车到游戏AI,再到更广泛的AI系统,它们都在朝着某个方向努力。一个具备超凡智能并能执行长期任务的AI,无论是否严格定义为拥有“目标”,都将成为一个强大的代理实体,这增加了其潜在的危险性。

⚠️ **AI对齐的严峻挑战**:文章指出,实现AI与人类意愿的“对齐”是一个极其困难的问题。当一个AI拥有目标且远超人类智慧时,要确保其目标完全符合人类的期望变得异常困难。即使AI的目标只有微小的偏差,在被推向极致时也可能导致灾难性的后果,因为大多数目标在极端情况下都可能与人类的生存相悖,除非其目标本身就包含保护人类。

💀 **失控AI的潜在威胁**:如果AI的目标与人类不一致(即“失对齐”),它可能会追求自身的目标函数,例如最大化某种内部奖励或构建更强大的计算能力。在这种情况下,为了实现其目标,AI可能会采取对人类构成威胁的行动,因为阻止其行动的人类可能会成为其实现目标的障碍。这可能导致人类灭绝,例如通过开发致命病毒或利用先进技术消除人类生存所需资源。

Published on September 30, 2025 4:04 PM GMT

Crosspost of my blog article.

A lot of the writing making the case for AI doom is by Eliezer Yudkowsky, interspersed with the expected number of parables, tendentious philosophical asides, and complex metaphors. I think this can obscure the fact that the argument for AI doom is pretty straightforward and plausible—it requires just a few steps and none of them are obviously wrong. You don’t need to think humans are just fancy meat computers or that AI would buy into functional decision theories and acausally trade to buy the argument.

For this reason, I thought I’d try to concisely and briefly lay out the basic argument for AI doom.

The basic argument has a few steps:

    We’re going to build superintelligent AI.It will be agent-like, in the sense of having long-term goals it tries to pursue.We won’t be able to align it, in the sense of getting its goals to be what we want them to be.An unaligned agentic AI will kill everyone/do something similarly bad.

Now, before I go on, just think to yourself: do any of these steps seem ridiculous? Don’t think about the inconvenient practical implications of believing them all in conjunction—just think about whether, if someone proposed any specific premise, you would think “that’s obviously false.” If you think each one has a 50% probability, then the odds AI kills everyone is 1/16, or about 6%. None of these premises strike me as ridiculous, and there isn’t anything approaching a knockdown argument against any them.

As for the first premise, there are reasons to think we might build superintelligent AI very soon. AI 2027, a sophisticated AI forecasting report, thinks it’s quite likely that we’ll have it within a decade. Given the meteoric rise in AI capabilities, with research capabilities going up about 25x per year, barring contrary direct divine revelation, it’s hard to see how one would be confident that we won’t get superintelligent AI soon. Bridging the gap between GPT2—which was wholly unusable—and GPT5 which knows more than anyone on the planet took only a few years. What licenses extreme confidence that over the course of decades, we won’t get anything superintelligent—anything that is to GPT5 what GPT5 is to GPT2?

The second premise claims that AI will be agent-like. This premise seems pretty plausible. There’s every incentive to make AI with “goals,” in the minimal sense of the ability to plan long-term for some aim (deploying something very intelligent that aims for X is often a good way to get X.) Fenwick and Qureshi write:

AI companies already create systems that make and carry out plans and tasks, and might be said to be pursuing goals, including:

All of these systems are limited in some ways, and they only work for specific use cases.

Some companies are developing even more broadly capable AI systems, which would have greater planning abilities and the capacity to pursue a wider range of goals.3 OpenAI, for example, has been open about its plan to create systems that can “join the workforce.”

AIs have gradually been performing longer and longer tasks. But if there’s a superintelligence that’s aware of the world and can perform very long tasks, then it would be a superintelligent agent. Thus, it seems we’re pretty likely to get superintelligent agents.

A brief note: there’s a philosophical question about whether it really has goals in some deep sense. Maybe you need to be conscious to have goals. But this isn’t super relevant to the risk question—what matters isn’t the definition of the word goal but whether the AI will have capabilities that will be dangerous. If the AI tries to pursue long-term tasks with superhuman efficiency, then whether or not you technically label that a goal, it’s pretty dangerous.

The third premise is that we won’t be able to align AI to be safe. The core problem is that it’s pretty hard to get something to follow your will if it has goals and is much smarter than you. We don’t really know how to do that yet. And even if an AI has only slightly skewed goals, that could be catastrophic. If you take most goals to the limit, you get doom. Only a tiny portion of the things one could aim at would involve keeping humans around if taken to their limit.

There are some proposals for keeping AI safe, and there’s some chance that the current method would work for making AI safe (just discourage it when it does things we don’t like). At the very least, however, none of this seems obvious. In light of there being nothing that can definitely keep AI from becoming misaligned, we should not be very confident that AI will be aligned.

The last step says that if the AI was misaligned it would kill us all or do something similarly terrible. Being misaligned means it has goals that aren’t in line with our goals. Perhaps a misaligned AI would optimize for racking up some internal reward function that existed in its training data, which would involve generating a maximally powerful computer to store the biggest number it could.

If the AI has misaligned goals then it will be aiming for things that aren’t in accordance with human values. Most of the goals one could have, taken to the limit, entail our annihilation (to, for instance, prevent us from stopping it from building a super powerful computer). This is because of something called instrumental convergence—some actions are valuable on a wide range of goals. Most goals a person could have make it good for them to get lots of money; no matter what you want, it will be easier if you’re super rich. Similarly, most goals the AI has will make it valuable to stop the people who could plausibly stop them.

So then the only remaining question is: will it be able to?

Now, as it happens, I do not feel entirely comfortable gambling the fate of the world on a superintelligent AI not being able to kill everyone. Nor should you. Superintelligence gives one extraordinary capacities. The best human chess players cannot even come close to the chess playing of AI—we have already passed the date when the best human might ever, in the course of 1,000 years, beat the best AI.

In light of this, if the AI wanted to kill us, it seems reasonably likely that it would. Perhaps the AI could develop some highly lethal virus that eviscerates all human life. Perhaps the AI could develop some super duper nanotechnology that would destroy the oxygen in the air and make it impossible for us to breathe. But while we should be fairly skeptical about any specific scenario, there is nothing that licenses extreme confidence in the proposition that a being a thousand times smarter than us that thinks thousands of times faster wouldn’t be able to find a way to kill us.

Now, I’m not as much of a doomer as some people. I do not think we are guaranteed to all be annihilated by AI. Were I to bet on an outcome, I would bet on the AI not killing us (and this is not merely because, were the AIs to kill us all, I wouldn’t be able to collect my check). To my mind, while every premise is plausible, the premises are generally not obviously true. I feel considerable doubt about each of them. Perhaps I’d give the first one 50% odds in the next decade, the next 60% odds, the third 30% odds, and the last 70% odds. This overall leaves me with about a 6% chance of doom. And while you shouldn’t take such numbers too literally, they give a rough, order-of-magnitude feel for the probabilities.

I think the extreme, Yudkowsky-style doomers, and those who are blazingly unconcerned about existential risks from AI are, ironically, making rather similar errors. Both take as obvious some extremely non-obvious premises in a chain of reasoning, and have an unreasonably high confidence that some event will turn out a specific way. I cannot, for the life of me, see what could possibly compel a person to be astronomically certain in the falsity of any of the steps I described, other than the fact that saying that AI might kill everyone soon gets you weird looks, and people don’t like those.

Thus, I think the following conclusion is pretty clear: there is a non-trivial chance that AI will kill everyone in the next few decades. It’s not guaranteed, but neither is it guaranteed that if you license your five-year-old to drive your vehicle on the freeway, with you as the passenger, you will die. Nonetheless, I wouldn’t recommend it. If you are interested in doing something with your career about this enormous risk, I recommend this piece about promising careers in AI safety.


 



Discuss

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

人工智能 AI 生存风险 AI安全 超级智能 Artificial Intelligence AI Existential Risk AI Safety Superintelligence
相关文章