The two camps of the AI safety community, and better plans of action

The author argues that Max Tegmark's division of the AI safety community into two camps, "race to superintelligence safely" and "don't race to superintelligence", is a misleading framing. The author instead proposes two better plans of action: first, seek international coordination to stop the race to superintelligence, creating the conditions to safely build aligned superintelligence later; second, even while locked in a race, try to make the transition to superintelligence go well. The article notes that although the two plans are not opposed, far more resources currently go into the second than into the first. The author also examines the differing motivations behind the two views, worries that "camp" framing may worsen polarization, and calls on community members to keep the discussion rational and work together on AI safety.

📊 **The existing division of the AI safety community and its limitations:** The article notes that Max Tegmark splits the AI safety community into "race to superintelligence safely" (Camp A) and "don't race to superintelligence" (Camp B). The former holds that superintelligence is inevitable and should therefore be built first by one's own side; the latter holds that the race itself is dangerous and should be avoided. The author argues that this binary framing is limited and may hinder progress on the problem.

💡 **Two better plans of action:** The author proposes two more pragmatic plans. Plan 1: seek international coordination to stop the AI race, buying time to build superintelligence safely and putting humanity on a better trajectory. Plan 2: if the race cannot be avoided, work to make the transition to superintelligence go well anyway. The two plans are not mutually exclusive and can be pursued in parallel.

💰 **Imbalanced resources and underlying motivations:** The article observes that far more funding currently goes to Plan 2 (seeking safety within the race) than to Plan 1 (stopping the race). The author analyzes the motivations of different participants, distinguishing "Type 2" people who are earnestly motivated by helping humanity from "Type 3" people who may be driven by status incentives, and worries that the latter may block support for Plan 1. These differences in motivation and resource allocation may drive polarization.

🤝 **A call for rational discussion and cooperation:** Despite the disagreements, the author stresses that everyone in AI safety is working toward the same goal. The article urges community members to avoid the polarization and in-group bias that camp framing encourages, and to engage in productive, rational discussion in order to understand one another, correct their beliefs, and advance AI safety together. The author is personally willing to do "model syncing" with people who hold different views.

❓ **A closer look at "stopping the race":** The author further explains the case for Plan 1 (stopping the race) and challenges some objections. Even if Plan 1 is unlikely to succeed, its potential payoff makes it worth pursuing. The author is also puzzled by people who reject Plan 1 entirely because they fear one particular bad implementation of it, and is willing to discuss specific objections, provided the bar for the argument is that "internationally stopping the race seems worse in expectation than the default outcome".

Published on October 24, 2025 8:18 AM GMT

Max Tegmark recently published a post "Which side of the AI safety community are you in?", where he carves the AI safety community into two camps:

Camp A) “Race to superintelligence safely”: People in this group typically argue that “superintelligence is inevitable because of X”, and that it's therefore better that their in-group (their company or country) build it first. X is typically some combination of “Capitalism”, “Moloch”, “lack of regulation” and “China”.

Camp B) “Don’t race to superintelligence”: People in this group typically argue that “racing to superintelligence is bad because of Y”. Here Y is typically some combination of “uncontrollable”, “1984”, “disempowerment” and “extinction”.

I think this framing is counterproductive. Instead, here's the oversimplified framing that I prefer:

Plan 1: Try to get international coordination to stop the race to superintelligence, and then try to put humanity on a better trajectory so we can safely build aligned superintelligence later.[1]

Plan 2: Try to make the transition to superintelligence go well even if we're locked in a race.

If someone thinks ASI will likely go catastrophically poorly if we develop it in something like current race dynamics, they are more likely to work on Plan 1.

If someone thinks we are likely to make ASI go well if we just put in a little safety effort, or thinks that doing so is at least easier than getting a strong international slowdown, they are more likely to work on Plan 2.

Then there are also dynamics where people are more connected to others who hold similar views to their own, which leads to further clustering of opinions.

But in the end, those plans are not opposed to each other.

By default one may expect many Plan 1 people to be like: "Well yeah, it's good that some effort is also going into Plan 2 in case Plan 1 fails, but I don't have much hope in Plan 2." And Plan 2 people might be like: "Well yeah, Plan 1 would be even safer, but it seems very unrealistic to me, so I'm focusing on Plan 2."

And indeed, there may be many people with such views.

However, the amount of funding that goes into Plan 2 seems much, much larger than the amount that goes into Plan 1[2], to the extent that investing in Plan 1 seems more impactful even for people who place more hope on Plan 2.

Even if you think Plan 1 is extremely unlikely to succeed, that doesn't mean it would be bad if it did succeed. And yet it seems to me that many people are reluctant to take a public stance that Plan 1 would be good, e.g. by signing something like the superintelligence statement.[3]

I think at least part of the problem is that there are also some people, call them Type 3 people, who are motivated by status incentives and have rationalized that they need to get to ASI first because they will do it most safely. If you ask them to do something to support Plan 1, they will rationalize some reason not to.

People who are earnestly motivated by helping humanity and ended up working on Plan 2, call them Type 2, are often in communities together with Type 3 people. So, because of conformity and belief osmosis, they too become more reluctant to support Plan 1.

This is of course oversimplified: there's actually a spectrum between Type 2 (less muddled) and Type 3 (more muddled), and that's partly why there seems to be one rough Type 2+3 cluster and one rough "doomer" cluster.

I worry the "Which side are you on?" question will cause Type 2 people to cluster even more closely with Type 3 people instead of with doomerish people, and that the question of whether to support Plan 1 will become polarized, even though it seems to me that it usually makes sense to also support Plan 1 even given reasonable, comparatively optimistic beliefs. (I guess that may already have happened to a significant extent anyway, but let's try to make it better rather than worse.)

And even if you think the other side is doing useless or worse-than-useless work, let's remember that we - doomers and Type 2 people - are all fighting for the same thing here and merely have different beliefs, and that we all want as many people as possible to have beliefs that are as accurate as possible. Let's try to engage in productive, rational discussion with people who disagree with us, so we may destroy falsehoods within ourselves and become better able to achieve our goals. OK, maybe that's too idealistic, and I guess many people think it's more effective to work on their own plans than to engage with the other side; but then let's at least keep that ideal in the back of our minds, and let's not polarize or be moved by outgroup bias.

Personally, I'm a doomer and I'm interested in doing some model syncing[4] with Type 2 people over the next few weeks. So DM me if you're potentially interested, especially if you're a decently recognized researcher and are willing to spend multiple hours on this.[5]

Also, if you do think that succeeding at Plan 1 would be net negative, I would be interested in why. (To be clear, the bar should be "internationally stopping the race seems worse in expectation than what happens by default without international coordination", not "it seems worse than cooperating internationally later, once we have smarter AIs that could help with safety research", nor "it seems worse than the optimistic alternative case where we effectively burn the full lead of the US on safety"[6], nor "a bad implementation of Plan 1 where we still get secret government races seems worse".)

  1. ^

    Yes, Plan 1 is underspecified here. It could mean banning AI research with an AI safety project as an exit plan, or without one. If you think some versions would be good and some worse than nothing, then simply taking a rough support/oppose position here is not for you.

  2. ^

    To be clear, something being AI policy does not imply that it counts as Plan 1. There's a lot of AI policy effort going toward policies that essentially don't help with x-risk at all. I'm not sure whether there is so little effective funding here because Openphil doesn't want Plan 1, or because they just felt intuitively uncomfortable going past the Overton window, or something else.

  3. ^

    I sympathize with multiple reasons for not wanting to sign. I think in the end it usually makes more sense to sign anyway, but whatever.

  4. ^

    Model syncing = aiming to more fully understand each other's personal world models and to identify cruxes. Not aimed at persuasion.

  5. ^

    Info about me, in case you're interested: I have relatively good knowledge of e.g. Paul Christiano's arguments from when I studied them deeply three years ago, but still not a good understanding of how they add up to such low probabilities of doom, given that we're in a race where there won't be much slack for safety. I'm most interested in better understanding that part, but I'm also open to sharing my models with people who are interested. I'm entirely willing to study more of the most important readings to better understand the optimists' side. I mostly did non-prosaic AI alignment research for the last 3.5 years, although I'm now pivoting to working toward Plan 1.

  6. ^

    Although that one already seems like a hard sell to me, and I'd be fine with discussing it too.



