AI Intelligence and Goodness: Understanding the Complexity of Hitting the Target

This article examines the difference in inherent difficulty between achieving "intelligence" and achieving "goodness" in AI development. The author argues that intelligence is, at its core, optimization toward a specified target: the underlying algorithm is relatively simple and easy to formalize, even if implementing it in practice demands substantial resources. "Goodness," or adherence to human values, is far more complex, because it involves a tangle of conflicting and shifting drives whose complexity far exceeds that of intelligence itself. Drawing on algorithmic complexity theory, the article explains why precisely instantiating a complicated concept (such as human values) is harder than instantiating a relatively simple one (such as an optimizer). The author concludes that although "goodness" may look like the smaller target, it is likely the harder one to hit, and lays the groundwork for subsequent posts on the practical challenges.

🧠 **The algorithmic basis of intelligence is relatively simple**: The author holds that the heart of intelligence is the general algorithm "search the space of possible actions, find the action(s) that maximize the expected value of some function X, and take those actions." This algorithm is formally compact and easy to describe as a computer program, even though implementing it effectively in the real world requires complex engineering and substantial resources.

💖 **Goodness (values) is extremely complex**: Unlike intelligence, human values are an enormous, messy collection of "overlapping and often conflicting drives." The author argues that even after simplification through reflection, these values could hardly be compressed into a handful of pages; their algorithmic complexity is far higher than that of intelligence's optimization target.

⚖️ **The difficulty of instantiating a specific complicated concept**: Citing algorithmic complexity theory, the article notes that instantiating a specific, complicated concept (such as human values) takes more work and more optimization pressure than instantiating a simple one (such as an optimizer). This means getting an AI to precisely understand and follow humanity's complex, subtle ethical and value systems is more challenging than making it "smart."

🎯 **The difficulty of hitting the "goodness" target**: Although the "goodness" target may look smaller and more specific than "intelligence," its inherent complexity and fuzziness may make it harder to hit precisely than building a powerful optimizer (intelligence). This poses a fundamental challenge for AI alignment.

Published on October 3, 2025 9:04 PM GMT

This post is part of the sequence Against Muddling Through

One of the pillars of my pessimism around AI development is that it seems much harder to make a mind good than to make it smart. 

One reason for this is that "optimizing for human values" is more complicated than "being an optimizer." I'll explore this and a few similar claims in this post, and stronger claims in subsequent posts. 

The orthogonality thesis is the starting point. A mind can pursue virtually any goal. But by itself, the thesis says little about which goals are more or less likely. For that, I think we want to consider algorithmic complexity: the length of the shortest possible description of some chunk of data.

Concepts with low algorithmic complexity are compressible; they can be shortened into a computer program, the way ABABAB… can be shortened into “AB repeating X times.” Concepts with high algorithmic complexity, like a random jumble of letters HCHPWTRSCK, can’t be easily compressed further. 
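A quick way to see the distinction concretely (a rough sketch: an off-the-shelf compressor like zlib is only a loose stand-in for algorithmic complexity, which is uncomputable in general, and all names here are illustrative):

```python
import random
import string
import zlib

# Low algorithmic complexity: "AB repeated 5000 times" is a short program.
repetitive = ("AB" * 5000).encode()

# High algorithmic complexity: a random jumble of letters has no
# description much shorter than the jumble itself.
jumble = "".join(random.choices(string.ascii_uppercase, k=10000)).encode()

print(len(zlib.compress(repetitive)))  # a few dozen bytes
print(len(zlib.compress(jumble)))      # thousands of bytes
```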

When you’re trying to find a specific concept — or instantiate that concept into, say, an artificial intelligence — the difficulty of the task seems likely to scale with the concept’s algorithmic complexity.[1] It takes more work — more optimization pressure — to instantiate a specific complicated thing than a specific simple thing.[2] 
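One way to make that scaling precise (a standard Solomonoff-style framing; this is my gloss, not something the post states): under a universal prior over programs, the probability of landing by chance on a program that instantiates a given concept falls off exponentially with the concept's complexity, so the selection needed to hit it grows with its description length in bits.

```latex
% K(c): algorithmic (Kolmogorov) complexity of concept c, in bits.
% By the coding theorem, the universal prior mass on c is roughly
P(c) \approx 2^{-K(c)}
% so hitting a specific c by search takes on the order of K(c)
% bits of optimization pressure.
```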

Intelligence — specifically, the kind of intelligence that involves steering the future towards some desired outcome, which I sometimes call competence to disambiguate — seems to be a relatively simple thing at its heart. It’s plausible that any given real-world implementation would have a lot of moving parts, but the most general form, “search the space of possible actions, find the action(s) that maximize some function X in expectation, take those actions,” is not hard to formally specify.
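As a minimal sketch of that general form (the function names, the finite action list, and the Monte Carlo estimate of the expectation are all my illustrative assumptions, not details from the post):

```python
import random

def choose_action(actions, sample_outcome, x, n_samples=1000):
    """The general form of an optimizer: search the space of possible
    actions and take the one that maximizes the expected value of some
    function x."""
    def expected_x(action):
        # Estimate E[x(outcome) | action] by sampling.
        total = sum(x(sample_outcome(action)) for _ in range(n_samples))
        return total / n_samples
    return max(actions, key=expected_x)

# Toy usage: pick the bet with the higher expected payoff.
payoffs = {"safe": lambda: 1.0, "risky": lambda: random.choice([0.0, 3.0])}
best = choose_action(["safe", "risky"], lambda a: payoffs[a](), x=lambda v: v)
print(best)  # usually "risky" (expected payoff 1.5 vs 1.0)
```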

It might take a lot of resources to implement this general thing effectively in the messy real world. Human brains are among the most complex objects on the planet, and large language models needed billions of parameters before they started looking kind of smart. 

But the algorithm that underlies intelligence, the algorithm that human brains implement only inconsistently? It seems pretty small and simple, implementable in lots of different ways. 

By contrast, values are definitely complicated. I know mine are. I am an enormous messy kludge of overlapping and often conflicting drives, and while those drives can be simplified somewhat by an effort of gradual reflection, I would be utterly shocked if my values could be squeezed into a mere handful of pages.[3]

“Good” seems like a smaller target than “smart.” But perhaps we’ll hit it anyway? The next post explores the part we really care about: the relative difficulty of hitting “good” and “smart” in practice.

Afterword: Confidence and possible cruxes


This post isn't especially cruxy for me. The much more relevant question is "how hard is it to achieve goodness in practice?" But it seemed worth highlighting anyway as an important background assumption. 

Some may think that algorithmic complexity doesn’t constrain machine learning in the way I suspect it does, or that competence-as-expressed-in-reality is actually less compressible than human values, or otherwise object to the framing. I can’t easily predict these objections, so I’ll try to address them as they come. 

Some may think “good” is too vaguely described, or that it’s better to aim for obedience, corrigibility, or some other trait. I’ll have this argument if needed, but I expect the headline claim holds for all such targets that would be worth hitting. 

To those who’d ask “aligned to whom?”: I have a vision for what the ideal should look like, and it looks like “to everyone.” For the headline claim of this post, though, I don’t particularly think it matters whom. It’s a tiny target regardless.

  1. ^

    At least, as I understand it. 

  2. ^

    You can probably get an arbitrarily complicated thing if you don’t care what it is, just that it is complicated. But it’s hard to land on one particular complicated thing in a space with many degrees of freedom. 

  3. ^

    I’m not even sure that my entire brain is an upper bound on this complexity; there are so many edge cases and tradeoffs in the things I care about that it might take a much larger mind than mine to fully comprehend them. 


