A Discussion of Continuity in AI Development and the Alignment Problem

 

Recently, the debate over whether AI development will be gradual or involve a "cliff-like" leap has attracted attention. Many researchers hold that even if AI capabilities improve incrementally, the path from weaker AI to AGI and on to superintelligence (ASI) may involve a critical "phase change" that renders existing alignment experience useless. The article quotes Buck Shlegeris, Will MacAskill, Scott Alexander, and Clara Collier, and examines the plausibility of gradual capability gains, the importance of alignment at different stages of development, and the continuity between current research and future superintelligence. The core question is whether lessons learned from current AI research will be enough to meet the challenge of aligning future superintelligence, and how predictions about the development trajectory shape alignment strategy.

🧐 **The "phase change" argument in AI development**: Although AI capabilities are widely expected to improve step by step, some argue that the path from weaker AI to AGI and on to ASI may involve a critical "discontinuity" or "phase change," so that lessons learned while AI is too weak to take over cannot be applied directly to more advanced systems. Such a change could arise because an AI becomes able to improve itself, or because its intelligence crosses a qualitative threshold.

🚀 **Gradual development and the value of alignment learning**: Some researchers (such as Will MacAskill) argue that AI capability gains are more likely to be gradual than a sudden overnight "awakening." In that case, humans would have the chance, as capabilities grow, to harness existing AI labour to help align the next generation of models and to gain better insight into how future superintelligent systems will behave. Early and continuous experimentation could thus build up valuable experience for the superintelligence alignment challenge, rather than facing the pressure of having to succeed in a single attempt.

🤔 **The disputed continuity between current research and future superintelligence**: The value of today's AI research for aligning future superintelligence is contested. Some hold that existing AI differs so fundamentally from future superintelligence that current work matters little; others believe development is continuous enough that present research and experimentation can provide useful insights and methods, laying the groundwork for future challenges. The disagreement is rooted in specific empirical beliefs about how AI capabilities will develop and what it will take to control them.

🔑 **The "first try or die" alignment dilemma**: The article examines the "first try or die" framing of alignment. If humanity must align a superintelligence successfully on its first and only chance, with no opportunity to learn from mistakes, the probability of success is very low. Whether this framing applies depends on underlying beliefs about the capability trajectory: if AI development is slow and continuous, there will be more opportunities to learn and adjust.

Published on September 23, 2025 7:30 PM GMT

A number of reviewers have noticed the same problem with IABIED: an assumption that lessons learnt from AGI cannot be applied to ASI -- that there is a "discontinuity" or "phase change" -- even under the assumption of gradualism. The only explanation so far is Eliezer's "Dragon story" ... but I find it makes the same assumptions, and Buck seems to find it unsatisfactory, too. Quotes below.


Buck Shirgellis: ""I’m not trying to talk about what will happen in the future, I’m trying to talk about what would happen if everything happened gradually, like in your dragon story!

You argued that we’d have huge problems even if things progress arbitrarily gradually, because there’s a crucial phase change between the problems that occur when the AIs can’t take over and the problems that occur when they can. To assess that, we need to talk about what would happen if things did progress gradually. So it’s relevant whether wacky phenomena would’ve been observed on weaker models if we’d looked harder; IIUC your thesis is that there are crucial phenomena that wouldn’t have been observed on weaker models.

In general, my interlocutors here seem to constantly vacillate between “X is true” and “Even if AI capabilities increased gradually, X would be true”. I have mostly been trying to talk about the latter in all the comments under the dragon metaphor."

Will McAskill: "Sudden, sharp, large leaps in intelligence now look unlikely. Things might go very fast: we might well go from AI that can automate AI R&D to true superintelligence in months or years (see Davidson and Houlden, “How quick and big would a software intelligence explosion be?"). But this is still much slowerthan, for example, the “days or seconds” that EY entertained in “Intelligence Explosion Microeconomics”. And I don’t see any good arguments for expecting highly discontinuous progress, rather than models getting progressively and iteratively better.

In Part I of IABIED, it feels like one moment we’re talking about current models, the next we’re talking about strong superintelligence. We skip over what I see as the crucial period, where we move from the human-ish range to strong superintelligence[1]. This is crucial because it’s both the period where we can harness potentially vast quantities of AI labour to help us with the alignment of the next generation of models, and because it’s the point at which we’ll get a much better insight into what the first superintelligent systems will be like. The right picture to have is not “can humans align strong superintelligence”, it’s “can humans align or control AGI-”, then “can {humans and AGI-} align or control AGI” then “can {humans and AGI- and AGI} align AGI+” and so on.

Elsewhere, EY argues that the discontinuity question doesn’t matter, because preventing AI takeover is still a ‘first try or die’ dynamic, so having a gradual ramp-up to superintelligence is of little or no value. I think that’s misguided. Paul Christiano puts it well: “Eliezer often equivocates between “you have to get alignment right on the first ‘critical’ try” and “you can’t learn anything about alignment from experimentation and failures before the critical try.” This distinction is very important, and I agree with the former but disagree with the latter.”

Scott Alexander: "But I think they really do imagine something where a single AI “wakes up” and goes from zero to scary too fast for anyone to notice. I don’t really understand why they think this, I’ve argued with them about it before, and the best I can do as a reviewer is to point to their Sharp Left Turn essay and the associated commentary and and see whether my readers understand it better than I do. "

Clara Collier: "Humanity only gets one shot at the real test." That is, we will have one opportunity to align our superintelligence. That's why we'll fail. It's almost impossible to succeed at a difficult technical challenge when we have no opportunity to learn from our mistakes. But this rests on another implicit claim: Currently existing AIs are so dissimilar to the thing on the other side of FOOM that any work we do now is irrelevant.

Most people working on this problem today think that AIs will get smarter, but still retain enough fundamental continuity with existing systems that we can do useful work now, while taking on an acceptably low risk of disaster. That's why they bother. Yudkowsky and Soares dismiss these (relative) optimists by stating that "these are not what engineers sound like when they respect the problem, when they know exactly what they're doing. These are what the alchemists of old sounded like when they were proclaiming their grand philosophical principles about how to turn lead into gold." I would argue that the disagreement here has less to do with fundamental respect for the problem than specific empirical beliefs about how AI capabilities will progress and what it will take to control them. If one believes that AI progress will be slow and continuous, or even relatively fast and continuous, it follows that we’ll have more than one shot at the goal."



Discuss
