AI Survival Guide: Beware the Potential Risks of Superintelligence

 

This book review focuses on Eliezer Yudkowsky and Nate Soares's If Anyone Builds It, Everyone Dies. The book examines the existential risks that superintelligent AI could pose and calls on the public to stay alert to AI's rapid development. The reviewer stresses that, although experts in the field disagree about the degree of risk and the timeline, the possibility that AI will eventually surpass human intelligence, and the danger that would follow, is real. The review analyzes how AI company leaders view the risks, considers potential policy responses including a pause in AI development, and notes how hard such policies would be to enforce today. Readers are encouraged to look past the reflex that technology is always overhyped, take the warnings of AI pioneers seriously, and keep an open mind about a diverse range of safety strategies.

📚 **AI's potential existential risks should not be ignored**: The reviewer stresses that even experts with differing views on AI development broadly acknowledge that AI could surpass human intelligence and pose severe challenges. Although If Anyone Builds It, Everyone Dies may be overconfident about the ineffectiveness of certain safety strategies, its warning about the enormous risks AI could bring deserves serious attention.

⏳ **The pace of AI progress and the urgency of "soon"**: The reviewer argues that AI may be advancing faster than many expect, citing graphs and informal estimates suggesting that AI could surpass human capabilities in the early 2030s. Although the book is cautious about exactly when AI will become powerful enough to pose a threat, the reviewer regards AI's rapid progress as "something very important that is happening" and worth taking seriously.

🤔 **AI company leaders' complicated attitudes and the policy challenge**: AI company leaders broadly acknowledge AI's potential risks, and believing that "the risks are too distant" has come to signal that a researcher is working on a failed approach to AI. These leaders have admitted that, in pursuit of utopia, their strategies resemble "playing Russian roulette". The review examines the policy option of pausing AI development while noting the enforcement difficulties, such as how hard software innovation is to restrict and how effective regulation would actually be.

⚖️ **Balancing innovation and safety with a diverse set of countermeasures**: The reviewer is reserved about the necessity of a full shutdown of AI development, suggesting that a decision around 2027 may be better informed, since AI progress by then could provide better grounds for the call. A more realistic strategy may be to distinguish AI "agentiness" from "prediction ability", restricting the former while allowing the latter to advance, and to rely on more fine-grained regulation rather than a blanket ban, preserving some beneficial capability progress while ensuring safety.

Published on September 28, 2025 9:36 PM GMT

Book review: If Anyone Builds It, Everyone Dies: Why Superhuman AI Would
Kill Us All, by Eliezer Yudkowsky and Nate Soares.

[This review is written (more than my usual posts) with a Goodreads
audience in mind. I will write a more LessWrong-oriented post with a
more detailed description of the ways in which the book looks
overconfident.]

If you're not at least mildly worried about AI, Part 1 of this book is
essential reading.

Please read If Anyone Builds It, Everyone Dies (IABIED) with Clarke's
First Law in mind ("When a distinguished but elderly scientist states
that something is possible, he is almost certainly right. When he states
that something is impossible, he is very probably wrong."). The authors
are overconfident in dismissing certain safety strategies. But their
warnings about what is possible ought to worry us.

I encourage you to (partly) judge the book by its cover: dark,
implausibly certain of doom, and endorsed by a surprising set of
national security professionals who had previously been very quiet about
this topic. But only one Nobel Prize winner.

### Will AI Be Powerful Soon?

The first part of IABIED focuses on what seems to be the most widespread
source of disagreement: will AI soon become powerful enough to conquer
us?

There are no clear obstacles to AIs becoming broadly capable of
outsmarting us.

AI developers only know how to instill values that roughly approximate
the values that they intend to instill.

Maybe the AIs will keep us as pets for a while, but they'll have
significant abilities to design entities that better satisfy what the
AIs want from their pets. So unless we train the AIs such that we're
their perfect match for a pet, they may discard us for better models.

For much of Part 1, IABIED is taking dangers that experts mostly agree
are real, and concluding that the dangers are much worse than most
experts believe. IABIED's arguments seem relatively weak when they're
most strongly disagreeing with more mainstream experts. But the book's
value doesn't depend very much on the correctness of those weaker
arguments, since merely reporting the beliefs of experts at AI companies
would be enough for the book to qualify as alarmist.

I'm pretty sure that over half the reason why people are skeptical of
claims like the ones IABIED makes is that people expect technology to be
consistently overhyped.

It's pretty understandable that a person who has not focused much
attention on AI assumes it will work out like a typical technology.

An important lesson for becoming a superforecaster is to start from the
assumption that nothing ever happens. I.e. that the future will mostly be
like the past, and that a large fraction of claims that excite the news
media turn out not to matter for forecasting, yet the media are trying to
get your attention by persuading you that they do matter.

The heuristic that nothing ever happens has improved my ability to make
money off the stock market, but the exceptions to that heuristic are
still painful.

The most obvious example is COVID. I was led into complacency by a
century of pandemics that caused less harm to the US than alarmists had
led us to expect.

Another example involves hurricane warnings. The news media exaggerate the
dangers of typical storms enough that when a storm such as Katrina comes
along, viewers and newscasters alike find it hard to take accurate
predictions seriously.

So while you should start with a pretty strong presumption that
apocalyptic warnings are hype, it's important to be able to change your
mind about them.

What evidence is there that AI is exceptional enough that you should
evaluate it carefully?

The easiest piece of news to understand is that Geoffrey Hinton, who
won a Nobel Prize for helping AI get to where it is today, worries that his
life's work was a mistake.

There's lots of other evidence. IABIED points to many ways in which AI
has exceeded human abilities as fairly good evidence of what might be
possible for AI. Alas, there's no simple analysis that tells us what's
likely.

If I were just starting to learn about AI, I'd feel pretty confused as
to how urgent the topic is. But I've been following it for a long time.
E.g. I wrote my master's thesis in 1993 on neural nets, correctly
predicting that they would form the foundation for AI. So you should
consider my advice on this topic to be better than random. I'm telling
you that something very important is happening.

### How Soon?

I'm concerned that IABIED isn't forceful enough about the "soon"
part.

I've been convinced that AI will soon be powerful by a wide variety of
measures of AI progress (e.g. these graphs, but also my informal estimates
of how wide a variety of tasks it can handle). There are many trend lines
that suggest AI will surpass humans in the early 2030s.
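
To make the trend-line reasoning concrete, here is a minimal sketch of how an "early 2030s" estimate can fall out of a doubling-time extrapolation. The starting level, the threshold, and the doubling time are all illustrative assumptions of mine, not numbers taken from the linked graphs or from IABIED.

```python
# Minimal sketch of a doubling-time extrapolation. All numbers are
# illustrative assumptions, not data from the linked graphs or from IABIED.
import math

current_level = 1.0        # assumed capability index today (arbitrary units)
human_level = 64.0         # assumed index at broadly human-level performance
doubling_time_years = 1.0  # assumed doubling time of the capability index

doublings_needed = math.log2(human_level / current_level)  # 6 doublings
crossing_year = 2025 + doublings_needed * doubling_time_years

print(f"Doublings needed: {doublings_needed:.1f}")
print(f"Estimated crossing year: {crossing_year:.0f}")     # ~2031
```

The point of the sketch is only that modest-sounding doubling times compound quickly; shifting any of the assumed numbers moves the crossing year by a few years, not a few decades.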

Others have tried the general approach of using such graphs to convince
people, with unclear results. But this is one area where IABIED
carefully avoids overconfidence.

Part 2 describes a detailed, somewhat plausible scenario of how an AI
might defeat humanity. This part of the book shouldn't be important,
but probably some readers will get there and be surprised to realize
that the authors really meant it when they said that AI will be
powerful.

A few details of the scenario sound implausible. I agree with the basic
idea that it would be unusually hard to defend against an AI attack. Yet
it seems hard to describe a really convincing scenario.

A more realistic scenario would likely sound a good deal more mundane.
I'd expect persuasion, blackmail, getting control of drone swarms, and
a few other things like that. The ASI would combine them in ways that
rely on evidence which is too complex to fit in a human mind. Including
it in the book would have been futile, because skeptics wouldn't come
close to understanding why the strategy would work.

### AI Company Beliefs

What parts of this book do leaders of AI companies disagree with? I'm
fairly sure that they mostly agree that Part 1 of IABIED points to real
risks. Yet they mostly reject the conclusion of the book's title.

Eight years ago I wrote some speculations on roughly this topic. The main
point that has changed since then is that believing "the risks are too
distant" has become evidence that the researcher is working on a failed
approach to AI.

This time I'll focus mainly on the leaders of the four or so labs that
have produced important AIs. They all seem to have admitted at some
point that their strategies are a lot like playing Russian Roulette, for
a decent shot at creating utopia.

What kind of person is able to become such a leader? It clearly requires
both unusual competence and some recklessness.

I feel fairly confused as to whether they'll become more cautious as
their AIs become more powerful. I see a modest chance that they are
accurately predicting which of their AIs will be too weak to cause a
catastrophe, and that they will pivot before it's too late. The stated
plans of AI companies are not at all reassuring. Yet they likely
understand the risks better than does anyone who might end up regulating
AI.

### Policies

I want to prepare for a possible shutdown of AI development circa 2027.
That's when my estimate of its political feasibility gets up to about
30%.

I don't want a definite decision on a shutdown right now. I expect that
AIs of 2027 will give us better advice than we have today as to whether
a shutdown is wise, and how draconian it needs to be. (IABIED would likely
claim that we can't trust those AIs. That seems to reflect an important
disagreement about how AI will work as it approaches human levels.)

Advantages of waiting a bit:

-   better AIs to help enforce the shutdown; in particular, better
   ability to reliably evaluate whether something violates the shutdown
-   better AIs to help decide how long the shutdown needs to last

I think I'm a bit more optimistic than IABIED about AI companies'
ability to judge whether their next version will be dangerously
powerful.

I'm nervous about labeling IABIED's proposal as a shutdown, when
current enforcement abilities are rather questionable. It seems easier
for AI research to evade restrictions than is the case with nuclear
weapons. Developers who evade the law are likely to take less thoughtful
risks than what we're currently on track for.

I'm hoping that with AI support in 2027 it will be possible to regulate
the most dangerous aspects of AI progress, while leaving some capability
progress intact: for example, restricting research that increases AI
agentiness, but not research that advances prediction ability. I see
current trends as on track to produce superhuman prediction before
superhuman steering abilities. AI companies could do more, if they wanted
to, to increase the difference between those two categories (see Drexler's
CAIS for hints). And most of what we need for safety is superhuman
predictions of which strategies have which risks (IABIED clearly
disagrees with that claim).
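
As a rough illustration of what separating those two categories could look like in software, here is a minimal sketch of a prediction-only wrapper that answers queries but refuses to execute actions. The interfaces and names are hypothetical, not drawn from any real company's API or from Drexler's actual CAIS design.

```python
# Minimal sketch (hypothetical interfaces) of the distinction between
# prediction ability and agentiness: the wrapper answers queries but
# refuses to execute actions on its own.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Query:
    question: str


@dataclass
class Action:
    name: str
    args: dict


class PredictionOnlyWrapper:
    """Exposes a model's forecasts while blocking autonomous execution."""

    def __init__(self, predict_fn: Callable[[str], str]):
        self._predict = predict_fn  # assumed underlying forecasting model

    def forecast(self, query: Query) -> str:
        # Pure prediction: returns an answer, changes nothing in the world.
        return self._predict(query.question)

    def execute(self, action: Action) -> None:
        # Agentic use is refused; any recommended step has to be carried
        # out by a human or a separate approval process.
        raise PermissionError(f"Agentic execution of '{action.name}' is disabled.")


if __name__ == "__main__":
    oracle = PredictionOnlyWrapper(lambda q: f"[model's estimate for: {q}]")
    print(oracle.forecast(Query("Which deployment plan carries less risk?")))
    try:
        oracle.execute(Action("deploy_model", {"version": "v2"}))
    except PermissionError as error:
        print(error)
```

The hard part, of course, is not writing such a wrapper but ensuring that the research pipeline behind it keeps improving prediction without quietly improving steering.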

IABIED thinks that the regulations they propose would delay ASI by
decades. I'm unclear how confident they are about that prediction. It
seems important to have doubts about how much of a delay is feasible.

A key component of their plan involves outlawing some AI research
publications. That is a tricky topic, and their strategy is less clearly
explained than I had hoped.

I'm reminded of a time in the late 20th century, when cryptography was
regulated in a way that led to t-shirts describing the RSA algorithm
being classified as a munition that could not be exported. Needless to
say, that regulation was not very effective. This helps illustrate why
restricting software innovation is harder than a casual reader would
expect.

IABIED wants to outlaw the publication of papers such as the famous
Attention Is All You Need paper that introduced the transformer
architecture. But that leaves me confused as to how broad a ban they hope
for.

Possibly none of the ideas that need to be banned are quite simple
enough to be readily described on a t-shirt, but I'm hesitant to bet on
that. I will bet that some of them would be hard for a regulator to
recognize as relevant to AI. Matrix multiplication improvements are an
example of a borderline case.

Low-level optimizations such as that could significantly influence how
much compute is needed to create a dangerous AI.
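
A back-of-the-envelope calculation shows why such optimizations matter for regulation. Every number below is an illustrative assumption of mine, not an estimate of any real regulatory threshold or cluster.

```python
# Back-of-the-envelope sketch: a constant-factor kernel speedup acts like
# extra hardware. All numbers are illustrative assumptions.
threshold_flop = 1e26      # assumed compute threshold for a dangerous training run
cluster_flop_per_s = 1e18  # assumed sustained throughput of an available cluster
kernel_speedup = 2.0       # assumed speedup from a better matrix-multiply kernel

seconds_per_year = 365 * 24 * 3600
baseline_years = threshold_flop / (cluster_flop_per_s * seconds_per_year)
optimized_years = baseline_years / kernel_speedup

print(f"Years to reach the threshold without the optimization: {baseline_years:.1f}")
print(f"Years with the optimization: {optimized_years:.1f}")
```

Under these assumptions a 2x kernel speedup halves the time (or the hardware) needed to reach a given effective training compute, which is exactly the kind of shift a compute-based rule would struggle to track.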

In addition, smaller innovations, especially those that just became
important recently, are somewhat likely to be reinvented by multiple
people. So I expect that there is a nontrivial set of advances for
which a ban on publication would delay progress for less than a year.

In sum, a decades-long shutdown might require more drastic measures than
IABIED indicates.

The restriction on GPU access also needs some clarification. It's
currently fairly easy to figure out which chips matter. But with
draconian long-term restrictions on anything that's classified as a
GPU, someone is likely to get creative about building powerful chips
that don't fit the GPU classification. It doesn't seem too urgent to
solve this problem, but it's important not to forget it.

IABIED often sounds like it's saying that a long shutdown is our only
hope. I doubt they'd explicitly endorse that claim. But I can imagine
that the book will nudge readers into that conclusion.

I'm more optimistic than IABIED about other strategies. I don't expect
we'll need a genius to propose good solutions. I'm fairly convinced
that the hardest part is distinguishing good, but still risky, solutions
from bad ones when we see them.

There are more ideas than I have time to evaluate for making AI
development safer. Don't let IABIED talk you into giving up on all of
them.

### Conclusion

Will IABIED be good enough to save us? It doesn't seem persuasive
enough to directly change the minds of a large fraction of voters. But
it's apparently good enough that important national security people
have treated it as a reason to go public with their concerns. IABIED may
prove to be highly valuable by persuading a large set of people that
they can express their existing concerns without being branded as weird.

We are not living in normal times. Ask your favorite AI what AI company
leaders think of the book's arguments. Look at relevant prediction
markets, e.g.:

-   Will we get ASI before 2034?
-   Does an AI disaster kill at least 100 people before 2040?
-   Will there be a highly risky or catastrophic AI agent proliferation
    event before 2035?
-   At the beginning of 2035, will Eliezer Yudkowsky still believe that
    AI doom is coming soon with high probability?

 


