On the multi-stage motte-and-bailey strategy and its application in AI discussions

This post examines the multi-stage motte-and-bailey dynamic in AI safety discussions, particularly around the claim that "once artificial general intelligence resembling current techniques appears, everyone on Earth will die" and the regulatory calls that follow from it, and the misreadings and worries this can produce. The author notes that an initial, cautious position (the motte) can be read as a radical policy agenda (the bailey), such as establishing a monopolistic global governance institution. The post also draws on discussions of concepts like "privilege" and "intellectual freedom" to illustrate how a core idea can be pushed toward extremes as it spreads. The author urges vigilance when discussing complex, politically sensitive topics, so as not to inadvertently deepen divides, and suggests that communities like LessWrong make their political prerequisites explicit in order to foster more rigorous and responsible discussion.

⚠️ Motte-and-bailey risk: The post's core concern is the "motte-and-bailey" pattern appearing in AI safety discussions: an easily defended position (the motte) is stated first, and the discourse then gradually slides toward a more contentious but more striking one (the bailey). Especially as many different people interpret and retransmit a claim, the original qualifiers and hedged tone are easily lost, and the position gets pushed to extremes.

🗣️ The focus of the worry: The author specifically flags concerns about the claim that "once artificial general intelligence resembling current techniques appears, everyone on Earth will die": it may be leveraged by some toward building a sprawling regulatory apparatus that stifles technological progress while failing to solve the problems it is supposedly meant to solve, or, as Dean W. Ball puts it, serve as a prelude to global governance institutions holding a monopoly over the technology.

🔄 Concept drift and politicization: The post also uses the examples of "privilege" and "intellectual freedom" to show how core concepts become politicized as they spread. "Privilege" originally pointed out that social structures systematically advantage some groups, but can evolve into blaming the privileged and demanding they pay costs; likewise, "intellectual freedom" was meant to oppose the suppression of ideas, but can also be read as endorsing positions that provoke social controversy.

🛡️ LessWrong's response: In response, the author endorses LessWrong's practice of asking members to understand and follow its "political prerequisites", which supports more nuanced and less politicized discussion. These norms aim to avoid unnecessarily political examples and encourage participants not to treat policy debates as one-sided.

🤔 The remaining challenge: Even so, the author acknowledges that when the topic itself is political, and its consequences extend beyond the LessWrong community, effectively countering a multi-stage motte-and-bailey remains hard. The post closes with the author admitting they have no perfect solution, while stressing the importance of staying alert to the phenomenon and working against its negative effects.

Published on October 28, 2025 5:50 PM GMT

This post kinda necessarily needs to touch multiple political topics at once. Please, everyone, be careful. If it looks like you haven't read the LessWrong Political Prerequisites, I'm more likely than usual to delete your comments.


I think some people are (rightly) worried about a few flavors of motte-and-bailey-ing with the IABIED discourse, and more recently, with the Superintelligence Statement.

With IABIED:

"Sure, the motte is 'If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die.'". But I feel like I'm being set up for a bailies like:

"and... Eliezer's frame on how to think about the problem is exactly right" [1]

Or, longer term, not necessarily from the MIRI folk:

"Let's build a giant regulatory apparatus that not only stifles technological progress but also is too opaque and bureaucratic to even really solve the problems it's purportedly supposed to solve."

I started writing this post during the IABIED discourse, and didn't get around to publishing it before the discourse mostly moved on and it felt slightly dated. But, today, I read Dean W. Ball's tweet about the Superintelligence Statement:

right, but since you and I both know that this statement is a prelude to “and we should have global governance institutions hold the exclusive monopoly over the technology,” and definitely there is not consensus about that, it is “counterproductive,” as I put it, to create a false appearance of consensus toward what is in fact a radical and dangerous policy objective

These seem like totally reasonable concerns to me. Seems good to talk about.

Problem: Multi-Stage Motte and Bailey

Hedge drift and advanced motte-and-bailey is a pretty good reference for some of the concerns:

Motte and bailey is a technique by which one protects an interesting but hard-to-defend view by making it similar to a less interesting but more defensible position. Whenever the more interesting position - the bailey - is attacked - one retreats to the more defensible one - the motte -, but when the attackers are gone, one expands again to the bailey. 

In that case, one and the same person switches between two interpretations of the original claim. Here, I rather want to focus on situations where different people make different interpretations of the original claim. The originator of the claim adds a number of caveats and hedges to their claim, which makes it more defensible, but less striking and sometimes also less interesting.* When others refer to the same claim, the caveats and hedges gradually disappear, however, making it more and more bailey-like.

A salient example of this is that scientific claims (particularly in messy fields like psychology and economics) often come with a number of caveats and hedges, which tend to get lost when re-told. This is especially so when media writes about these claims, but even other scientists often fail to properly transmit all the hedges and caveats that come with them.

Since this happens over and over again, people probably do expect their hedges to drift to some extent. Indeed, it would not surprise me if some people actually want hedge drift to occur. Such a strategy effectively amounts to a more effective, because less observable, version of the motte-and-bailey-strategy. Rather than switching back and forth between the motte and the bailey - something which is at least moderately observable, and also usually relies on some amount of vagueness, which is undesirable - you let others spread the bailey version of your claim, whilst you sit safe in the motte. This way, you get what you want - the spread of the bailey version - in a much safer way.

Even when people don't use this strategy intentionally, you could argue that they should expect hedge drift, and that omitting to take action against it is, if not outright intellectually dishonest, then at least approaching that. This argument would rest on the consequentialist notion that if you have strong reasons to believe that some negative event will occur, and you could prevent it from happening by fairly simple means, then you have an obligation to do so. I certainly do think that scientists should do more to prevent their views from being garbled via hedge drift. 

I think, on a good day, MIRI et al are pretty careful with their phrasings. But, not every day is a good day. Folk get rushed or triggered.

Also, there is a whole swath of doomer-ish people who aren't being remotely careful most of the time. And, on top of that, there is some kind of emergent toxoplasmic egregore, built out of the ecosystem of thoughtful doomers, thoughtful optimists, unthoughtful doomers, unthoughtful optimists, and twitter people hanging out nearby who don't even really have a particular position on the topic.

I don't actually have a specific ask for anyone. But, I know I myself am not sufficiently careful much of the time when I'm arguing on the internet, and I endorse people reminding me of this when it seems like I'm contributing to the sort of bad dynamics described in this post.

Some background examples

EA, "Out to Get You", and "Giving Your All"

Quite a while ago, Zvi wrote Out to Get You, noting systems and memeplexes that don't have your best interests at heart and want to exploit you as much as they can.

An important example is politics. Political causes want every spare minute and dollar. They want to choose your friends, words and thoughts. If given power, they seize the resources of state and nation for their purposes. Then they take those purposes further. One cannot simply give any political movement what it wants. That way lies ruin and madness.

Yes, that means your cause, too.

In the comments, I brought up a pattern I knew Zvi was worried about, which is the way Effective Altruism as a memeplex seems to want to encourage people to unhealthily "give their all." 

I noted: the problem is, it's true that the world is on fire and needs all the help it can get. And, there is a way to engage with that where you accept that truth into yourself, and also track your various other goals and needs and values. Aim to be a coherent person who enthusiastically helps where you can, but also genuinely pursues other interests so you don't throw your mind away, maintains slack even if something feels like "an emergency", and just genuinely holds onto the other things.

I think this is what much of EA leadership explicitly believes. And, I think it's reasonable and basically correct.

Nonetheless, there are problems:

1) Getting to the point where you're on board with Point A [i.e. a sane, healthy integration of your goals, including "the world is on fire"] often requires going through some awkward and unhealthy stages where you haven't fully integrated everything. Which may mean you are believing some false things and perhaps doing harm to yourself.

Even if you read a series of lengthy posts before taking any actions, even if the Giving What We Can Pledge began with "we really think you should read some detailed blogposts about the psychology of this before you commit" (this may be a good idea), reading the blogposts wouldn't actually be enough to really understand everything.

So, people who are still in the process of grappling with everything end up on EA forum and EA Facebook and EA Tumblr saying things like "if you live off more than $20k a year that's basically murder". (And also, you have people on Dank EA Memes saying all of this ironically except maybe not except maybe it's fine who knows?)

And stopping all this from happening would be pretty time consuming.

2) The world is in fact on fire, and people disagree on what the priorities should be, and on what are acceptable things to do in order for that to be less the case. And while the Official Party Line is something like Point A, there's still a fair number of prominent people hanging around who do earnestly lean towards "it's okay to make costs hidden, it's okay to not be as dedicated to truth as Zvi or Ben Hoffman or Sarah Constantin would like, because it is Worth It."

And, if EA is growing, then you should expect that, at any given point, most of the people around are in the awkward transition phase (or not even realizing they should be going through an awkward transition phase), so they are most of the people you hear.

And that means they're a pretty dominant force in the resulting ecosystem, even if the leadership was 100% perfectly nuanced.

This sort of concern is part of why I think LessWrong should specifically not try to be evangelical about rationality. This is a community with nuanced group epistemics, and flooding it with people who don't get that would ruin it. I think EA sort of intrinsically needs to be at least somewhat evangelical, but they should heavily prioritize the fidelity of the message and not try to grow faster than they can stably do.

IABIED doesn't intrinsically need to be evangelical – it's hypothetically enough for the right people to read the book and be persuaded to take the arguments seriously. But, people, including/especially politicians, have trouble believing things that society thinks are crazy.

So, it's the sort of memeplex that does actively want to be evangelical enough to get to a point where it's a common, reasonable sounding position that "If any company or group, anywhere on the planet, builds an artificial superintelligence using anything remotely like current techniques, based on anything remotely like the present understanding of AI, then everyone, everywhere on Earth, will die."

"Privilege" and "Intellectual Freedom."

Two more examples:

Privilege

In the early 00s, people would talk about "privilege", noting that society is set up in ways that happen to benefit some people more than others. Some classes of people are benefited systematically, or harmed systematically. 

It's useful to notice a) individual people/situations where someone is being harmed in a way that you have a hard time noticing or empathizing with, and b) that there is a second-order effect of "society constantly denying that your problems are real problems" that sucks in a particular way that's even less obvious if you haven't experienced it.

That's the motte of privilege. The bailey, which at least a significant chunk of people seem to explicitly, reflectively endorse (let alone implicitly, accidentally), is "and the privileged people should feel guilty and take a bunch of costly actions and maybe get fired if they say the wrong thing."

Intellectual Freedom

Sort of on the flip side: as social justice gained a lot of power in universities, a number of conservative-leaning intellectuals became frustrated and argued a bunch for free speech: that you should be able to talk about racial differences in IQ or sex differences in typical aptitudes.

This is partly because we just want to model the world at the object level and understand how biology and sociology etc. work. But also, intellectual taboos are sort of contagious: once you make one topic taboo, it becomes harder to talk about IQ at all, if you risk incidentally saying the wrong specific taboo'd word. And there is general scope-creep of what is taboo'd.

That's the motte. The bailey, which at least a significant chunk of people seem to explicitly, reflectively endorse, is "and, black people / women should be second-class citizens in some ways."

There isn't an easy answer

I've watched the liberal and conservative memeplexes evolve over the past two decades. I have been pretty disappointed in people I would have hoped were being principled, who seemed to turn out to mostly just want to change the rules to benefit themselves, and to exploit the rules once society turned in their favor.

I think privilege is a pretty real, useful concept, in my personal relationships and in my community organizing. I think it's an important, true fact that trying to make some intellectual ideas taboo has bad consequences.

But it seems pretty difficult, at humanity's current collective wisdom level, to take those concepts at face value without them frequently spinning into broader political agendas. And, whether you think any of those agendas are good or bad, the main point here is that it's a fact of the matter that they do not currently exist only in isolation.

What do we do about that? Ugh, I don't know. It's particularly hard at the broader societal level, where everything is very porous, doing anything about it is nigh impossible, and so many people's epistemics suck.

On LessWrong, we can at least say "Look, you need to have internalized the Political Prerequisites sequences. You are expected to make attempts to decouple, and to track Policy Debates Should Not Appear One-Sided, and Avoid Unnecessarily Political Examples, and such." (And, try to hold each other accountable when we're fucking it up).

But this doesn't quite answer the question of what to do when the topic is necessarily politically charged, with consequences outside of LessWrong discussion fora.

Okay, back to "The Problem" as I conceive it

From my perspective, here's what things look like:

This all does not justify any given desperate action. But it does mean I am most excited about plans that route through "actually try to move the Overton Window in a very substantial way", if those plans are actively good.

Within my worldview, an important aspect of the "Change Overton Window" plan is that humanity will need to do some pretty nuanced things. It's not enough to get humanity to do one discrete action that you burn through the epistemic commons to achieve. We need an epistemic-clarity win that's stable at the level of a few dozen world/company leaders.

I think MIRI folk agree roughly with the above. I am less sure whether the Future of Life Institute people agree with that last bit. Regardless, the standard here is very challenging to meet, and even well-meaning people won't meet it on a bad day. And there are all kinds of other people who are not trying to meet that standard at all, who will take your ideas and run with them.

I don't have a specific solution in mind, but I think:

I think what I want is mostly "people pushing for agreement on simple statements should acknowledge this as a thing to watch out for, and put at least some effort into pumping against it." And, people who agree with the simple statements, but are wary of them spiraling politically into something bad that they disagree with, should cut the first group a bit of slack about it (but, like, not infinite slack).

  1. ^

    When Eliezer, Nate, or MIRI et al are at their best, I think they avoid actually doing this (largely by pretty overtly stating the bailey as also important to them). But I don't know that they're always at their best, and whether they are or not, it seems reasonable from the outside to be worried about it.

  2. ^

    I have a post brewing that argues this point in more detail.

  3. ^

    If you think I'm explicitly wrong about this, btw, I'm interested in hearing more about the nuts and bolts of this disagreement. Seems like one of the most important disagreements.

  4. ^

    There's an annoying problem where even writing out my concerns sort of reifies "there is a tribal conflict going on", instead of trying to be-the-change-I-wanna-see of not looking at it through the lens of tribal conflict. But, the tribal conflict seems real and not gonna go away and we gotta figure out a better equilibrium and I don't know how to do it without talking about it explicitly.

    In addition to the dicey move of "bring up political examples", I also need to psychologize a bit about people on multiple sides of multiple issues. Which is all the more so because there's not really a single person I'm psychologizing.


