The two camps of the AI safety community, and better plans of action

The author argues that Max Tegmark's division of the AI safety community into two camps, "race to superintelligence safely" and "don't race to superintelligence", is a misleading framing. The author instead proposes two better plans of action: first, seek international coordination to stop the race to superintelligence, creating the conditions to safely build aligned superintelligence later; second, even while locked in a race, try to make the transition to superintelligence go well. The article notes that although the two plans are not opposed, far more resources currently go into the second than into the first. The author also examines the differing motivations behind the two views, worries that "camp" framing may worsen polarization, and calls on community members to keep the discussion rational and work together on AI safety.

📊 **The existing division of the AI safety community and its limitations:** The article notes that Max Tegmark splits the AI safety community into "race to superintelligence safely" (Camp A) and "don't race to superintelligence" (Camp B). The former holds that superintelligence is inevitable and should therefore be built first by one's own side; the latter holds that the race itself is dangerous and should be avoided. The author argues that this binary framing is limited and may hinder progress on the problem.

💡 **Two better plans of action:** The author proposes two more pragmatic plans. Plan 1: seek international coordination to stop the AI race, buying time to build superintelligence safely and putting humanity on a better trajectory. Plan 2: if the race cannot be avoided, work to make the transition to superintelligence go well anyway. The two plans are not mutually exclusive and can be pursued in parallel.

💰 **Imbalanced resources and underlying motivations:** The article observes that far more funding currently goes to Plan 2 (seeking safety within the race) than to Plan 1 (stopping the race). The author analyzes the motivations of different participants, distinguishing "Type 2" people who are earnestly motivated by helping humanity from "Type 3" people who may be driven by status incentives, and worries that the latter may block support for Plan 1. These differences in motivation and resource allocation may drive polarization.

🤝 **A call for rational discussion and cooperation:** Despite the disagreements, the author stresses that everyone in AI safety is working toward the same goal. The article urges community members to avoid the polarization and in-group bias that camp framing encourages, and to engage in productive, rational discussion in order to understand one another, correct their beliefs, and advance AI safety together. The author is personally willing to do "model syncing" with people who hold different views.

❓ **A closer look at "stopping the race":** The author further explains the case for Plan 1 (stopping the race) and challenges some objections. Even if Plan 1 is unlikely to succeed, its potential payoff makes it worth pursuing. The author is also puzzled by people who reject Plan 1 entirely because they fear one particular bad implementation of it, and is willing to discuss specific objections, provided the bar for the argument is that "internationally stopping the race seems worse in expectation than the default outcome".

Published on October 24, 2025 8:18 AM GMT

Max Tegmark recently published a post "Which side of the AI safety community are you in?", where he carves the AI safety community into two camps:

Camp A) “Race to superintelligence safely”: People in this group typically argue that “superintelligence is inevitable because of X”, and that it's therefore better that their in-group (their company or country) build it first. X is typically some combination of “Capitalism”, “Moloch”, “lack of regulation” and “China”.

Camp B) “Don’t race to superintelligence”: People in this group typically argue that “racing to superintelligence is bad because of Y”. Here Y is typically some combination of “uncontrollable”, “1984”, “disempowerment” and “extinction”.

I think this framing is counterproductive. Instead, here's the oversimplified framing that I prefer:

Plan 1: Try to get international coordination to stop the race to superintelligence, and then try to put humanity on a better trajectory so we can safely build aligned superintelligence later.[1]

Plan 2: Try to make the transition to superintelligence go well even if we're locked in a race.

If someone thinks ASI will likely go catastrophically poorly if we develop it in something like current race dynamics, they are more likely to work on Plan 1.

If someone thinks we are likely to make ASI go well if we just put in a little safety effort, or thinks that doing so is at least easier than getting a strong international slowdown, they are more likely to work on Plan 2.

Then there are also dynamics where people are more connected to others who hold similar views to their own, which leads to further clustering of opinions.

But in the end, those plans are not opposed to each other.

By default one may expect many Plan 1 people to be like: "Well yeah, it's good that some effort is also going into Plan 2 in case Plan 1 fails, but I don't have much hope in Plan 2." And Plan 2 people might be like: "Well yeah, Plan 1 would be even safer, but it seems very unrealistic to me, so I'm focusing on Plan 2."

And indeed, there may be many people with such views.

However, the amount of funding that goes into Plan 2 seems much, much larger than the amount that goes into Plan 1[2], to the extent that investing in Plan 1 seems more impactful even for people who place more hope on Plan 2.

Even if you think Plan 1 is extremely unlikely to succeed, that doesn't mean it would be bad if it did succeed. And yet it seems to me that many people are reluctant to take a public stance that Plan 1 would be good, e.g. by signing something like the superintelligence statement.[3]

I think at least part of the problem is that there are also some people, call them Type 3 people, who are motivated by status incentives and have rationalized that they need to get to ASI first because they will do it most safely. If you ask them to do something to support Plan 1, they will rationalize some reason not to.

People who are earnestly motivated by helping humanity and ended up working on Plan 2, call them Type 2, are often in communities together with Type 3 people. So, because of conformity and belief osmosis, they too become more reluctant to support Plan 1.

This is of course oversimplified: there's actually a spectrum between Type 2 (less muddled) and Type 3 (more muddled), and that's partly why there seems to be one rough Type 2+3 cluster and one rough "doomer" cluster.

I worry the "Which side are you on?" question will cause Type 2 people to cluster even more closely with Type 3 people instead of with doomerish people, and that the question of whether to support Plan 1 will become polarized, even though it seems to me that it usually makes sense to also support Plan 1 even given reasonable, comparatively optimistic beliefs. (I guess that may already have happened to a significant extent anyway, but let's try to make it better rather than worse.)

And even if you think the other side is doing useless or worse-than-useless work, let's remember that we - doomers and Type 2 people - are all fighting for the same thing here and merely have different beliefs, and that we all want as many people as possible to have beliefs that are as accurate as possible. Let's try to engage in productive, rational discussion with people who disagree with us, so we may destroy falsehoods within ourselves and become better able to achieve our goals. OK, maybe that's too idealistic, and I guess many people think it's more effective to work on their own plans than to engage with the other side; but then let's at least keep that ideal in the back of our minds, and let's not polarize or be moved by outgroup bias.

Personally, I'm a doomer and I'm interested in doing some model syncing[4] with Type 2 people over the next few weeks. So DM me if you're potentially interested, especially if you're a decently recognized researcher and are willing to spend multiple hours on this.[5]

Also, if you do think that succeeding at Plan 1 would be net negative, I would be interested in why. (To be clear, the bar should be "internationally stopping the race seems worse in expectation than what happens by default without international coordination", not "it seems worse than cooperating internationally later, once we have smarter AIs that could help with safety research", nor "it seems worse than the optimistic alternative case where we effectively burn the full lead of the US on safety"[6], nor "a bad implementation of Plan 1 where we still get secret government races seems worse".)

  1. ^

    Yes, Plan 1 is underspecified here. It could mean banning AI research with an AI safety project as an exit plan, or without one. If you think some versions would be good and some worse than nothing, then simply taking a rough support/oppose position here is not for you.

  2. ^

    To be clear, something being AI policy does not imply that it counts as Plan 1. There's a lot of AI policy effort going toward policies that essentially don't help with x-risk at all. I'm not sure whether there is so little effective funding here because Openphil doesn't want Plan 1, or because they just felt intuitively uncomfortable going past the Overton window, or something else.

  3. ^

    I sympathize with multiple reasons for not wanting to sign. I think in the end it usually makes more sense to sign anyway, but whatever.

  4. ^

    Model syncing = aiming to more fully understand each other's personal world models and to identify cruxes. Not aimed at persuasion.

  5. ^

    Info about me, in case you're interested: I have relatively good knowledge of e.g. Paul Christiano's arguments from when I studied them deeply three years ago, but still not a good understanding of how they add up to such low probabilities of doom, given that we're in a race where there won't be much slack for safety. I'm most interested in better understanding that part, but I'm also open to sharing my models with people who are interested. I'm entirely willing to study more of the most important readings to better understand the optimists' side. I mostly did non-prosaic AI alignment research for the last 3.5 years, although I'm now pivoting to working toward Plan 1.

  6. ^

    Although that one already seems like a hard sell to me, and I'd be fine with discussing it too.



