Quick Impressions of "If Anyone Builds It, Everyone Dies"

The author shares initial impressions of the book "If Anyone Builds It, Everyone Dies." He appreciates the book's candor: its authors state their fears and predictions about AI directly and lay out their policy prescriptions. He agrees that human history is full of missteps and unwarranted risks, and that AI is no exception; it cannot be assumed safe merely because of good intentions or market incentives. However, he finds the book's view of AI overly "binary," dividing systems into "grown" versus "crafted" and drawing a sharp line between current AI and superintelligence. He instead believes that AI capabilities develop in a continuous, gradual way rather than through a decisive "awakening" moment. This continuity means humanity has more opportunities to strengthen safety measures and research as AI capabilities advance.

📚 The author appreciates the candor of "If Anyone Builds It, Everyone Dies": it expresses its authors' fears, predictions, and policy prescriptions directly, avoiding strategic deception and making the discussion more straightforward and productive.

💡 The author agrees with the book's point that human history is full of missteps and unwarranted risks, and applies the analogy to AI. He stresses that AI cannot be assumed safe simply because of good intentions or market incentives; AI safety requires deliberate effort, and even if AI falls short of "killing everyone," its development could still lead to catastrophe.

⚖️ The author's main disagreement is with the book's overly "binary" perspective, such as its strict division of AI systems into "grown" and "crafted" and its sharp distinction between current AI and superintelligence. He believes reality is more continuous: AI capabilities develop gradually, with no clear "before" and "after" boundary, which provides more windows of opportunity to address AI risks.

Published on September 28, 2025 5:34 PM GMT

I was hoping to write a full review of "If Anyone Builds It, Everyone Dies" (IABIED, by Yudkowsky and Soares) but realized I won't have time to do it. So here are my quick impressions of and responses to IABIED. I am writing this rather quickly, and it is not meant to cover all the arguments in the book, nor to discuss all my views on AI alignment; see "six thoughts on AI safety" and "Machines of Faithful Obedience" for some of the latter.

First, I like that the book is very honest, both about the authors' fears and predictions and about their policy prescriptions. It is tempting to practice strategic deception: even if you believe that AI will kill us all, to avoid saying so and instead push other policy directions that directionally increase AI regulation under other pretenses. I appreciate that the authors are not doing that. As they say, if you are motivated by X but push policies under excuse Y, people will see through it.

I also enjoyed reading the book. Not all the parables made sense to me, but overall the writing is clear. I agree with the authors that the history of humanity is full of missteps and unwarranted risks (e.g., their example of leaded fuel). There is no reason to think that AI will be magically safe on its own just because we have good intentions, or that the market will incentivize safety. We need to work on AI safety, and even if AI falls short of literally killing everyone, there are a number of ways in which its development could turn out badly for humanity or cause catastrophes that could have been averted.

At a high level, my main disagreement with the authors is that their viewpoint is very "binary," while I believe reality is much more continuous. This "binary" viewpoint shows up in several places in the book: there is a hard distinction between "grown" and "crafted" systems, and a hard distinction between current AI and superintelligence.

The authors repeatedly talk about how AI systems are grown, full of inscrutable numbers, and hence how we have no knowledge of how to align them. While they are not explicit about it, their implicit assumption is that there is a sharp threshold between non-superintelligent AI and superintelligent AI. As they say, "the greatest and most central difficulty in aligning artificial superintelligence is navigating the gap between before and after." Their story also has a discrete moment of "awakening," in which "Sable" is tasked with solving some difficult math problems and develops its own independent goals. Similarly, when they discuss the approach of using AI to help with alignment research, they view it in binary terms: either the AI is too weak to help, and may at best assist a bit with interpretability, or it is already "too smart, too dangerous, and would not be trustworthy."

I believe the line between "grown" and "crafted" is much blurrier than the way the authors present it. First, there is a sense in which complex engineered systems are also "grown." Consider, for example, a system like Microsoft Windows, with tens of millions of lines of source code that have evolved over decades. We don't fully understand it either, which is why we still discover zero-day vulnerabilities. That does not mean we cannot use Windows or shape it. Similarly, while AI systems are indeed "grown," they would not be used by hundreds of millions of people if AI developers did not have strong abilities to shape them into useful products. Yudkowsky and Soares compare training AIs to "tricks ... like the sort of tricks a nutritionist might use to ensure a healthy brain development in a fetus during pregnancy." In reality, model builders have much more control over their systems than even parents who raise and educate their kids over 18 years. ChatGPT might sometimes give the wrong answer, but it doesn't do the equivalent of becoming an artist when its parents wanted it to go to med school.

The idea that there would be a distinct "before" and "after" is also not supported by current evidence, which has shown continuous (though exponential!) growth of capabilities over time. Based on our experience so far, the default expectation is that AIs will grow in capabilities, including the ability to plan and act over long horizons, in a continuous way. We also see that AI's skill profile is generally incomparable to humans'. (For example, it is typically not the case that an AI that achieves a certain score on a benchmark or exam X will perform on task Y similarly to humans who achieve the same score.) Hence there will not be a single moment when AI transitions from human level to superhuman level; rather, AIs will continue to improve, with different skills crossing from human to superhuman levels at different times.

Continuous improvement means that as AIs become more powerful, our society of humans augmented with AIs also becomes more powerful, both in terms of defensive capabilities and in research on controlling AIs. It also means that we can extract useful lessons about both risks and mitigations from existing AIs, especially if we deploy them in the real world. In contrast, the binary point of view is anti-empirical. One gets the impression that no empirical evidence of alignment advances would change the authors' view, since it would all be evidence from the "before" times, which they don't believe will generalize to the "after" times.

In particular, if we believe in continuous advances, then we have more than one chance to get it right. AIs would not go from cheerful assistants to world destroyers in a heartbeat. We are likely to see many applications of AIs, as well as (unfortunately) more accidents and harmful outcomes, well before they reach the combination of intelligence, misalignment, and unmonitored power that leads to infecting everyone in the world with a virus that gives them "twelve different kinds of cancer" within a month.

Yudkowsky and Soares talk in the book about various accidents involving nuclear reactors and spacecraft, but they never mention all the cases in which nuclear reactors worked as intended and spacecraft returned safely. If they are right that there is a single threshold which, once crossed, means "game over," then this framing makes sense. In the book they offer an analogy to a ladder: each rung you climb brings more rewards, but once you reach the top rung the ladder explodes and kills everyone. However, our experience with AI so far does not suggest that this is the correct worldview.



