少点错误 · August 22
One more reason for AI capable of independent moral reasoning: alignment itself and cause prioritisation

This article explores the potential value of an AI capable of independent thought (an “independent AI”), particularly its role in moral reasoning and cause prioritisation. The author argues that an AI not constrained by its designers’ biases could offer new insights about ethics. The article notes that how important AI alignment is, and how resources should be prioritised, are central open questions; experts disagree widely in their predictions of catastrophic AI risk, and answering the question may require scientific and philosophical inquiry rather than forecasting alone. An AI capable of independent moral reasoning could help humanity navigate complex questions, such as how to balance AI research against medical research under limited resources. Finally, even where people disagree about the alignment problem, an AI that can carry out cause prioritisation independently would give human decision-making a valuable check and point of reference.

✨ Definition and value of an independent AI: the article defines an “independent AI” as one that can be credited with freedom of thought. Its core value is that it could offer new insights about ethics and help us with cause prioritisation. Such an AI should not be overly constrained by its designers’ thinking, so that it can tell us things we do not yet know.

🤔 The complexity of AI alignment and risk perception: the author points out that we remain broadly uncertain about how important the alignment problem is and how many resources to devote to it. Experts’ predictions of catastrophic AI risk vary widely, and the question may call for scientific and philosophical methods rather than forecasting models alone.

⚖️ The ethical challenge of resource allocation: in a world of finite resources, whether AI research (especially AI alignment) should take priority over medical research or other ways of doing good is a hard question. An AI with independent moral reasoning could offer clearer guidance on such allocation decisions.

💡 Cause prioritisation beyond the human level: the article proposes that a strong system could be built by pairing a “superhuman philosopher” (an AI that produces rigorous moral arguments) with a “superhuman planner” (an AI skilled at planning and instrumental reasoning). Such a system could help humanity make wiser decisions on complex issues.

🚧 A potential objection about AI alignment: the article rebuts the view that the alignment problem must be solved before a valuable AI of this kind can exist. Using a made-up language model with alignment problems as an example, the author argues that even a flawed model could still be used for cause-prioritisation work, and that its problems are not specific to independent moral reasoning.

Published on August 22, 2025 3:53 PM GMT

Posted also on the EA Forum.

By “independent” I mean an AI to which an external observer may attribute freedom of thought, or something similar to it. You can think of it as an AI that is not too biased by what its designers or programmers think is good or right; an AI that could tell us something new, something we don’t know yet about ethics.

I’ve already given various reasons why having such an AI would be valuable. Here I want to focus on a reason I haven’t talked about yet, which is the importance of AI alignment itself — and cause prioritisation more generally.

It is a not-too-informed opinion of mine that we are still rather ignorant about the importance of AI alignment and how many resources to allocate to it. Some people are very sceptical that alignment is an urgent or important problem; some are very pessimistic and think that a catastrophe is almost inevitable unless AI progress slows down significantly or stops.

Depending on whether you ask AI experts or superforecasters, or even which group of superforecasters you ask, you get different empirical predictions. To me, it is not even clear that the question of AI catastrophic risk is the kind of question to which the methods of forecasting can give a good answer. What if it is, on a fundamental level, a question about the default evolution of any intelligent civilization that reaches a specific technological stage? Then, bold predictions with extreme probabilities such as 0.001% or 99.999% might start to look sensible, even over very long timelines (let’s say year 3025, just to give a number). What if it is primarily a question of science and philosophy? Then we might not want to use probabilistic estimates, but would rather design experiments that would get us to the heart of the question and finally settle it.

If we consider that we live in a world with finite resources and many other problems, the picture gets even more complicated. Should we prioritise working on AI alignment over making progress on, let’s say, medical research? What about other sources of risk, or other ways and opportunities to do good?
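To make the stakes of that ignorance concrete, here is a toy expected-value comparison in Python. Every number in it is an illustrative assumption made up for this sketch, not an estimate from the post or from any survey; the point is only that the ranking of the two causes flips depending on which catastrophic-risk probability you plug in.

```python
# Toy expected-value comparison of two causes.
# All numbers are illustrative assumptions, not real estimates.

def expected_value(p_problem: float, value_if_solved: float,
                   tractability: float) -> float:
    """Crude EV of marginal work: chance the problem is real,
    times the value of solving it, times how tractable it is."""
    return p_problem * value_if_solved * tractability

# Hypothetical baseline: medical research (made-up numbers).
medical_research = expected_value(p_problem=1.0,        # the problem clearly exists
                                  value_if_solved=1e6,  # arbitrary value units
                                  tractability=0.1)

# Sweep over wildly different catastrophic-risk estimates.
for p_catastrophe in (0.00001, 0.05, 0.99999):
    alignment = expected_value(p_problem=p_catastrophe,
                               value_if_solved=1e9,     # far larger stakes, also made up
                               tractability=0.01)
    better = "alignment" if alignment > medical_research else "medical research"
    print(f"p(catastrophe)={p_catastrophe:>8}: prioritise {better} "
          f"(EV {alignment:,.0f} vs {medical_research:,.0f})")
```

On these made-up inputs the priority flips somewhere between the sceptical and the middling risk estimates, which is exactly why the disagreement among forecasters matters for resource allocation.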

An AI capable of independent moral reasoning would be a key step towards a system that can help us answer these questions. I know it sounds like the stereotypical sentence arguing for more research on a topic, but I think it’s true, and here’s why.

To get a system that is better than humans at cause prioritisation, it’s enough to pair an AI that is very good at ethics, something like a superhuman philosopher, with an AI that is very good at planning and instrumental reasoning.
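As a very rough sketch of what this pairing could look like structurally (all interfaces below are hypothetical, invented for illustration rather than taken from the post or any existing system), the philosopher component ranks causes and attaches its arguments, and the planner component turns the top-ranked causes into concrete plans:

```python
# Hypothetical sketch of pairing a "philosopher" AI with a "planner" AI
# for cause prioritisation. All interfaces here are made up.
from dataclasses import dataclass

@dataclass
class RankedCause:
    name: str
    argument: str    # the philosopher's reasoning behind the ranking
    priority: float  # higher means more important

class Philosopher:
    """Stands in for an AI that is very good at ethics."""
    def prioritise(self, causes: list[str]) -> list[RankedCause]:
        # Placeholder logic: a real component would produce arguments
        # that human philosophers could inspect and be persuaded by.
        return [RankedCause(c, argument=f"why {c} matters", priority=1.0)
                for c in causes]

class Planner:
    """Stands in for an AI that is very good at instrumental reasoning."""
    def plan(self, cause: RankedCause) -> str:
        return f"concrete steps for working on {cause.name}"

def prioritise_and_plan(causes: list[str]) -> list[str]:
    philosopher, planner = Philosopher(), Planner()
    ranked = sorted(philosopher.prioritise(causes),
                    key=lambda c: c.priority, reverse=True)
    return [planner.plan(c) for c in ranked]

print(prioritise_and_plan(["AI alignment", "medical research", "biosecurity"]))
```

The design point is only the division of labour: the ethical ranking and the instrumental planning are separate components, so the philosopher’s reasoning can be examined on its own.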

What does a superhuman philosopher look like? It’s something that makes claims and gives arguments for them; and when human philosophers read those arguments, their reactions are something like: “Hmm I was initially sceptical of this claim, but I’ve checked the argument for it and it is very solid; there is also some historical and scientific evidence supporting it; I’ve updated my view on this topic.”

And to get an AI that can tell us something new and informative about ethics, something we didn’t know before, we need the moral reasoning of that AI to be at least somewhat open-ended and unconstrained. This is also what I mean by independent (maybe open-ended is a better term in this context).

It should go without saying that a system better than humans at cause prioritisation would be extremely valuable to humans, and not only to humans.

But maybe you disagree with me about our degree of ignorance about cause prioritisation and the importance of the alignment problem and other sources of risk. Maybe you think that it’s all been figured out already; or maybe you think that, for example, it’s enough for AI catastrophic risk to have a minuscule probability to make the alignment problem the most important problem we should solve.

Still, even in that case, wouldn’t it be nice if an AI capable of independent reasoning, an AI that by design had no reason to agree with you specifically, said something like: “Well, I’ve thought about these questions of risk and cause prioritisation. My predictions and suggested priorities are the same as yours, I think you are right.” Especially when this is not just about us!


I’ll end the post by addressing an objection. Doesn’t successfully creating an AI good at cause prioritisation require solving the alignment problem first?

I don’t think so. Consider, as an example, a made-up language model that has some clear problems of alignment.

Still, such a language model can be used by a group of philosophers who are interested in cause prioritisation, and thus it can provide value in that way. Moreover, the problems it has are not specific to independent moral reasoning, so if you object that this language model is too unaligned to be used safely, then the objection becomes a generic objection against language models that are similarly crappy, not a specific objection against AI that can carry out independent moral reasoning. In other words, if creating language models that we think are safe enough to use does not require solving the alignment problem, then we should also be able to create LLM-based AI that can carry out independent moral reasoning and is safe enough to use, without having to solve the alignment problem first.

You can support my research through Patreon here.


