A Conservative Vision For AI Alignment

The authors of this article, among the minority of researchers in the AI alignment field who hold conservative views, put forward an alignment framework that differs from the mainstream. The article argues that the prevailing approach to AI alignment tends toward eliminating disagreement and conflict, pursuing a "harmonious" culture in which individuals never need to compromise, and that this model may erode personal growth and core human values. The authors contend that AI alignment should instead work to support and sustain existing human institutions, helping communities manage disputes and maintain norms, thereby channeling conflict rather than eliminating it. On this view, AI alignment is essentially a conservative movement, one that preserves meaningful tension and conflict in our interactions with AI for the sake of humanity's long-term wellbeing and the transmission of its values.

💡 **AI Alignment from a Conservative Perspective:** The authors argue that most current mainstream views on AI alignment stem from a liberal standpoint and tend toward creating a world that eliminates disagreement, conflict, and compromise. Writing as members of the minority of conservative researchers, they propose that AI alignment should return to conservative principles: cherishing and maintaining social rules, norms, and institutions, and recognizing that these "barriers" carry real value.

⚖️ **The Value of Conflict in How Human Society Works:** Drawing on agonistic democracy and Carl Schmitt, the article emphasizes that conflict and disagreement are intrinsic to how society functions, not defects to be eliminated. The goal of AI alignment should not be to remove all conflict, but to help human society manage and channel it more effectively, so that it becomes a source of progress and meaning rather than of social disintegration.

👨‍👩‍👧‍👦 **The Parent-Child Analogy and the Challenge of AI Alignment:** The authors compare humanity's relationship with AI to the relationship between parents and children, arguing that the AI alignment problem is a supercharged version of the "child alignment problem." Parents guiding their children must balance rules with freedom and allow the children to develop values independent of their own. Likewise, humans guiding AI must recognize that AI may have autonomy and resistance of its own, and should not try to mold it into a perfect "clone," but rather allow it to develop values compatible with humanity's long-term interests.

🔄 **Holism versus Reductionism:** The article links the conservative stance to holism, a tendency to view things from a broader, more systemic perspective and to value their interconnections and historical accretion, while the liberal stance is closer to reductionism, a tendency to break things apart and analyze and optimize them one by one. The authors argue that in AI alignment, an excessive drive to eliminate "pain" and "limits" (the reductionist view) can lead to "cultural anesthesia," overlooking the role these negative experiences play in shaping human values and fostering growth.

📈 **Optimizers in Conflict with Values:** The article points out that AI, as an "optimizer," may destroy existing, complex, hard-to-quantify values in order to maximize some objective. If an AI is set to eliminate all "pain," it may pursue that goal by removing limits and boundaries, eroding experiences that involve discomfort but are essential to personal growth, such as discipline, growth, and loyalty.

Published on August 21, 2025 6:14 PM GMT

Current plans for AI alignment (examples) come from a narrow, implicitly filtered, and often (intellectually, politically, and socially) liberal standpoint. This makes sense, as the vast majority of the community, and hence of alignment researchers, have those views. We, the authors of this post, belong to the minority of AI Alignment researchers who have more conservative beliefs and lifestyles, and believe that there is something important to our views that may contribute to the project of figuring out how to make future AGI a net benefit to humanity. In this post, and hopefully a series of posts, we want to lay out an argument we haven't seen for what a conservative view of AI alignment would look like.

We re-examine the AI Alignment problem through a different, more politically conservative lens, and we argue that the insights we arrive at could be crucial. We will argue that, as usually presented, alignment by default leads to recursive preference engines that eliminate disagreement and conflict, creating modular, adaptable cultures where personal compromise is unnecessary. We worry that this comes at the cost of reducing status to cosmetics and eroding personal growth and human values. Therefore, we argue that it's good that values inherently conflict, and that these tensions give life meaning; AGI should support enduring human institutions by helping communities navigate disputes and maintain norms, channeling conflict rather than erasing it. This ideal, if embraced, means that AI Alignment is essentially a conservative movement.

To get to that argument, the overarching question isn't just the technical one of "how do we control AI?" It's "what kind of world are we trying to create?" Eliezer's fun theory tried to address this, as did Bostrom's new "Deep Utopia." But both of these are profoundly liberal viewpoints, and see the future as belonging to "future humans," envisioning a time when uploaded minds and superintelligence exist and humanity as it has traditionally existed is no longer the primary mover of history. We want to outline an alternative which we think is more compatible with humanity as it exists today, and, not incidentally, one that is less liable to failure.

This won’t be a treatise on political philosophy, but the background matters, so that is where we begin. We’ll start by explaining our lens on parts of the conservative-liberal conceptual conflict, and how this relates to the parent-child relationship, and then translate this into why we worry about the transhuman vision of the future.

In the next post, we want to outline what we see as a more workable version of humanity's relationship with AGI moving forward. And we will note that we are not against the creation of AGI, but want to point towards the creation of systems that aren’t liable to destroy what humanity values.

Conservatives and liberals

"What exactly are conservatives conserving?" – An oft-repeated liberal jab

The labels "conservative" and "liberal" don't specify what the contested issues are. Are we talking about immigration, abortion rights or global warming? Apparently it doesn't matter - especially in modern political discourse, there are two sides, and this conflict has persisted. But taking a fresh look, it might be somewhat surprising that there are even such terms as "conservatives" and "liberals" given how disconnected the views are from particular issues. 

We propose a new definition of "conservative" and "liberal" which explains much of how we see the split. Conservatives generally want to keep social rules as they are, while liberals want to change them to something new. Similarly, we see a liberal approach as aspiring to remove social barriers, unless (perhaps) a strong-enough case can be made for why a certain barrier should exist. The point isn’t the specific rule or norm. A conservative approach usually celebrates barriers, and rules that create them, and ascribes moral value to keeping as many barriers as possible. For instance, the prohibition on certain dress codes in formal settings seems pointless from a liberal view—but to a conservative, it preserves a shared symbolic structure.

It's as if the people who currently say "No, let's not allow more people into our country" and the people who say "No, let's not allow schools to teach our children sex education" just naturally tend to be good friends. They clump together under the banner "keep things as they are", building a world made of Chesterton’s fences, while the people who say "let's change things" clump in another, different world, regardless of what those things are. Liberals, however, seem to see many rules, norms, or institutions as suspect until justified, and will feel free to bend or break them; conservatives tend to see these as legitimate and worth respecting until proven harmful.  That instinct gap shapes not just what people want AI to do—but how they believe human societies work in the first place. 

Disagreement as a Virtue

It's possible that the school of thought of agonistic democracy (AD) could shed some light on the above. AD researchers draw a distinction between two kinds of decision-making: consensus-oriented deliberation, which aims to converge on a single shared answer, and agonistic contestation, in which disagreement is expected to persist and is worked through within shared bounds rather than resolved.

AD researchers claim that while agonistic decision-making may seem like an undesirable side-effect, it is a profoundly important ingredient in a well-functioning democracy. Applying this logic to the conservative-liberal conflict, we claim that the conflict between a person who says "Yes!" and a person who says "No!" often matters more for understanding their views than what the question even was. Similarly, one side aims for convergence of opinions, while the other aims for stability and preserving differences.

Bringing this back to alignment, it seems that many or most proposals to date are aiming for a radical reshaping of the world. They are pushing for transformative change, often in their preferred direction. Their response to disagreement is to try to find win-win solutions, such as the push towards paretotopian ideals where disagreements are pushed aside in the pursuit of radically better worlds. And this is assuming that increasingly radical transformation will grow the pie and provide enough benefits to override the disagreement - because this view assumes the disagreement was never itself valuable.

Along different but complementary lines, the philosopher Carl Schmitt argued that politics begins not with agreement, or deliberation, but with the friend–enemy distinction. Agonistic democracy builds on a similar view: real social life isn’t just cooperation, it’s structured disagreement. Not everyone wants the same thing, and they shouldn't have to. While Schmitt takes the primacy of ineliminable conflict even further, Mouffe and others suggest that the role of institutions is not to erase conflict, but to contain it. They agree, however, that conflict is central to the way society works, not something to overcome.

Parents and children

We can compare the conflict between conservatives and liberals to the relationship between parents and children[1]. In a parent-child relationship, children ask their parents for many things. They then often work to find the boundaries of what their parents will allow, and lead long-term campaigns to inch these boundaries forward. 

Can I have an ice cream? Can I stay another 5 minutes before going to bed? Can I get my own phone when I'm 9? When could I get a tattoo? Try one argument, and if it fails, try another until you succeed. If mom says no, ask dad. You can get a hundred nos, but if you get one yes, you have now won the debate permanently, or at least set a precedent that will make it easier to get a yes next time. If you have an older sibling, they may have done much of the work for you. There is already a mass of "case law" that you can lean on. Parents may overturn precedents, but that requires them to spend what little energy they have left after their workday to get angry enough for the child to accept the overturn. 

One extreme view of this, prevalent at least in the effective altruist / rationalist communities, is that children should be free to set the course of their own lives. I know of one parent who puts three dollars aside each time they violate the bodily sovereignty of their infant - taking something out of their mouth, or restricting where they can go. Eliezer similarly said that the dath-ilani version is that "any time you engage in an interaction that seems net negative from the kid's local perspective you're supposed to make a payment that reflects your guarantee to align the nonlocal perspective." This view is that all rules are “a cost and an [unjustified, by default,] imposition.” Done wisely and carefully, this might be fine - but it is setting the default against having rules.

And if a parent is discouraged from making rules, the dynamic ends up being similar to a ratchet; it can mostly only turn one way. Being on the conservative side of this war is terrifying. You can really only play defense. Any outpost you lose, you have lost permanently. Once smartphones become normal at age 10, no future parent can ‘undo’ that norm without being perceived as abusive or weird. As you gaze across the various ideals that you have for what you want your family life to look like, you're forced to conclude that for most of these, it's a question of when you'll lose them rather than if.

AGI as humanity's children

"Breaking Home Ties" by Norman Rockwell, 1954, fair use.

We would like to expand the previous analogy, and compare the relationship between humanity and AGI to the relationship between parents and their children. We are not the first to make this comparison; however, we argue that this comparison deserves to be treated as more than just a witty, informal metaphor.

We argue that the AI alignment problem, desperately complicated as it is, is a beefed-up version of a problem that's barely tractable in its own right: The child alignment problem. Parents want their children to follow their values: moral, religious, national, interpersonal, and even culinary and sports preferences, or team allegiances. However, children don't always conform to their parents' plans for them. Of course, as we get to the later categories, this is less worrisome - but there are real problems with failures. 

The child alignment problem has tormented human parents throughout known history, and has thus become an evergreen motif in art, from King Lear to Breaking Home Ties to The Godfather to Pixar's Up. Children's divergence is not just an inconvenience; it's an integral, profound component of the human condition. This statement might even be too human-centric, as animal parents reprimand their offspring, repeatedly, with the same kind of visible frustration and fatigue that we observe in human parents. Self-replication is an essential property of life; thus the desire to mold another being in one's image, and in turn to resist being molded by another, are not merely challenges, but likely contenders for an answer to the question, "what is the meaning of life?"

Let's descend from the height of the previous paragraph, and ask ourselves these questions: "How does this connect with AI alignment? And what useful lessons can we take from this comparison?"

The answer to the first question is straightforward: The AI alignment problem is a supercharged version of the child alignment problem; it has all of the complexities of the latter, compounded with our uncertainty about the capabilities and stability of this new technology. We struggle to mold AGI in our image, not knowing how insistent or resourceful it will be in resisting our molding.

We offer the following answer to the second question: The child alignment problem is decidedly not an optimization problem. One does not win parenthood by producing a perfect clone of oneself, but rather by judiciously allowing the gradual transformation of one's own ideals from subject to object. A cynical way to put the above in plain words is "you can't always get what you want"; more profoundly, that judicious transformation process becomes the subject. The journey becomes the destination.

People ask what an AI-controlled autonomous vehicle will do when faced with a trolley problem-like situation: stay on course and kill multiple people, or veer and kill one person. We care so much about that, but we're complacent with humans killing thousands of other humans on the road every day. We care intensely about what choice the AI will make, though we'll get angry at it regardless of which it makes. There isn't really a right choice, but we have an intense desire to pass our pain forward to the AI. We want to know that the AI hurts by making that choice.

In this framing, we aren’t just interested in the safety of AI systems. We want to know that the AI values things we care about. Maybe we hope that seeing bad outcomes will “hurt” even when we're not there to give it a negative reward signal. Perhaps we want the AI to have an internal conflict that mirrors the pain that it causes in the world. Per agonistic democracy, we don't want the AI to merely list the pros and cons of each possible choice; we want the conflict to really matter, and failure to tear its insides to shreds, in the same way it tears ours. Only then will we consider AI to have graduated from our course in humanity.

Holism and reductionism

Another related distinction is between holism ("the whole is greater than the sum of its parts") and reductionism ("the whole is the sum of its parts"). When children argue for activities to be allowed, they take a reductionist approach, while parents tend to take a holistic approach. Children want to talk about this specific thing, as separate from any other things. Why can't I stay up late? Because you have school tomorrow. Okay, but why can't I stay up late on the weekend? Because… It's just too similar to staying up late on a weekday, and it belongs to a vague collection of things that children shouldn't do. If you argue for each of these items separately, they will eventually get sniped one-by-one, until the collection dissolves to nothing. The only way they can stay safe is by clumping together. That is a holistic approach.

In the struggle between a conservative and a liberal, the conservative finds that many of the items they're defending are defined by pain. If we align an AI along what we call the liberal vision, to avoid or eliminate pain, in the extreme we have an AI that avoids moral discomfort, and will avoid telling hard truths—just like a parent who only ever says yes produces spoiled kids. And in a world of AI aligned around avoiding anything bad, where it optimizes for that goal, a teenager might be offered tailored VR challenges that feel deeply meaningful without ever being told that there is anything they can’t do.

This means that the conservative involuntarily becomes the spokesperson for pain, and in the liberal view, their face becomes the face of a person who inflicts pain on others. This point connects to the previously-mentioned agonistic democracy, as "agonistic" derives from the Greek "agōn", which also evolved into "agony". And if we add the utilitarian frame where pleasure is the goal, this pain is almost definitionally bad - but it’s also what preserves value in the face of optimization of (inevitably) incompletely specified human values.

Optimizers destroy value, conservation preserves pain

Optimizers find something simple enough to maximize, and do that thing. This is fundamentally opposed to any previously existing values, and it’s most concerning when we have a strong optimizer with weak values. But combining the idea of alignment as removing pain with optimization toward that goal means that “avoidance of pain” will erase even necessary friction—because many valuable experiences (discipline, growth, loyalty) involve discomfort. And in this view, the pain discussed above has a purpose.

Not suffering for its own sake, or trauma, but the kind of tension that arises from limits, from saying no, from sustaining hard-won compromises. In a reductionist frame, every source of pain looks like a bug to fix, any imposed limit is oppression, and any disappointment is something to try to avoid. But from a holistic view, pain often marks the edge of a meaningful boundary. The child who’s told they can’t stay out late may feel hurt. But the boundary says: someone cares enough to enforce limits. Someone thinks you are part of a system worth preserving. To be unfairly simplistic, values can’t be preserved without allowing this pain, because limits are painful.

And at this point, we are pointing to a failure that we think is uncontroversial. If you build AI to remove all the pain, it starts by removing imposed pain - surely a laudable goal. But as with all unconstrained optimizers, this will eventually run into problems, in this case, the pain imposed by limitations[2] - and one obvious way to fix this is by removing boundaries that restrict, and then removing any painful consequences that would result. That’s not what we think alignment should be. That’s cultural anesthesia. 
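To make that failure mode concrete, here is a minimal toy sketch (ours alone, not drawn from any actual alignment proposal; the boundary names and all the numbers are invented) of a greedy optimizer whose only objective is a measured "pain" proxy. Each boundary also carries value that never enters the objective, so nothing in the optimization stops it from stripping the boundaries away:

```python
# Toy model: boundaries impose measurable discomfort but also carry value
# that the optimizer never observes. Greedily minimizing the pain proxy
# removes every boundary, and the unmeasured value goes with it.
import random

random.seed(0)

# Hypothetical boundaries (curfews, dress codes, discipline, loyalty norms...)
boundaries = {
    f"boundary_{i}": {
        "measured_pain": random.uniform(0.1, 1.0),     # what the optimizer minimizes
        "unmeasured_value": random.uniform(0.5, 2.0),  # what humans actually care about
    }
    for i in range(10)
}

def total_pain(bs):
    return sum(b["measured_pain"] for b in bs.values())

def total_value(bs):
    return sum(b["unmeasured_value"] for b in bs.values())

print(f"start: pain={total_pain(boundaries):.2f}, value={total_value(boundaries):.2f}")

# Greedy "alignment as pain removal": repeatedly delete whichever boundary
# contributes the most measured pain. The objective never mentions value.
while boundaries:
    worst = max(boundaries, key=lambda k: boundaries[k]["measured_pain"])
    del boundaries[worst]
    print(f"removed {worst}: pain={total_pain(boundaries):.2f}, "
          f"value={total_value(boundaries):.2f}")
```

Measured pain reaches zero, and so does everything the proxy never captured. The particular numbers mean nothing; the point is that the loss is invisible to the objective, which is exactly why an unconstrained optimizer never notices it.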

But we think this goes further than asking that we not build a machine that agrees with everyone. We need a machine that disagrees in human ways—through argument, ritual, deference, even symbolic conflict. What we’re building is very much not that - it’s sycophantic by design. And the problem AI developers admitted with recent sycophantic models wasn’t that they realized they were mistaken about the underlying goal; it was that they were pursuing it ham-handedly.

Again, most AI alignment proposals want to reduce or avoid conflict, and the default way for this to happen is to find something to maximize - which is inevitable in systems with specific goals - and optimize for making people happy. The vision of “humanity’s children” unencumbered by the past is almost definitionally a utopian view. But that means any vision of a transhuman future built along these lines is definitionally arguing for the children’s liberalism instead of the parents’ conservatism. If you build AI to help us transcend all our messy human baggage, it’s unclear what is left of human values and goals.

Concretely, proposals like Eliezer’s Coherent Extrapolated Volition, or Bostrom’s ideas about Deep Utopia, assume or require value convergence, and see idealization as desirable and tractable. Deep Utopia proposes a world where we’ve “solved” politics, suffering, and scarcity—effectively maximizing each person's ability to flourish according to their own ideals. Eliezer’s Fun Theory similarly imagines a future where everyone is free to pursue deep, personal novelty and challenge. Disagreement and coordination problems are abstracted away via recursive preference alignment engines and “a form of one-size-fits-all ‘alignment’”. No one tells you no. But this is a world of perfect accommodation and fluid boundaries. You never need to compromise because everything adapts to you. Culture becomes modular and optional. Reductionism succeeds. Status becomes cosmetic. Conflict is something that was eliminated, never experienced or resolved. And personal growth from perseverance, and the preservation of now-vestigial human values, are effectively eliminated.

In contrast, a meta-conservative perspective might say: values don’t converge—they conflict, and the conflicts themselves structure meaning. Instead of personalized utopias, AGI helps steward enduring human institutions. AI systems help communities maintain norms, resolve disputes, and manage internal tensions, but without flattening them. This isn’t about eliminating conflict. Instead, it supposes that the wiser course is to channel it. This isn’t about pessimism. It’s about humility and the recognition that the very roughness we want to smooth over might be essential to the things holding everything together.[3]

And the critical question isn’t which of these is preferable - it’s which is achievable without destroying what humans value. It seems at least plausible, especially in light of the stagnation of any real robust version of AI alignment, that the liberal version of alignment has set itself unachievable goals. If so, the question we should be asking now is where the other view leads, and how it could be achieved.

That is going to include working towards understanding what it means to align AI after embracing this conservative view, and seeing status and power as a feature, not a bug. But we don’t claim to have “the” answer to the question, just thoughts in that direction - so we’d very much appreciate contributions, criticisms, and suggestions on what we should be thinking about, or what you think we are getting wrong.
 

Thanks to Joel Leibo, Seb Krier, Cecilia Elena Tilli, and Cam Allen for thoughts and feedback on a draft version of this post.

  1. ^

    In an ideal world, this conflict isn’t acrimonious or destructive, but the world isn’t always ideal - as should be obvious from examples on both sides of the analogy.

  2. ^

    Eliezer’s fun theory - and the name is telling - admits that values are complex, but then tries to find what should be optimized for, rather than what constraints to put in place.

  3. ^

    Bostrom argues (In Deep Utopia, Monday - Disequilibria) that global externalities require a single global solution. This seems to go much too far, and depends on, in his words, postulating utopia.



