AI Alignment: Understanding the Diversity and Challenges of Human Values

This article explores the central question of AI alignment: how to bring the values of artificial intelligence (AI) into accord with human values. It argues that human values vary enormously across cultures, regions, and individuals, and that this diversity is one of the major challenges to achieving alignment. Research shows that AI models such as LLMs tend to learn the "WEIRD" (Western, Educated, Industrialised, Rich, Democratic) biases present in their training data, distancing them from the values of people in non-WEIRD cultures. AI alignment is therefore not merely a technical problem but a deep social, cultural, and moral issue: humans must first achieve some alignment among themselves before they can effectively steer AI's development and avoid risks of discrimination and loss of control. The article stresses that open dialogue and listening are the first steps towards AI alignment.

💡 **The diversity of values is the fundamental challenge for AI alignment**: Human values are shaped by culture, history, and lived experience, and vary enormously. For example, WEIRD (Western, Educated, Industrialised, Rich, Democratic) cultures tend to emphasise individualism and explicit moral codes, while non-WEIRD cultures may place more weight on spiritual purity and collective responsibility. This inherent variation means that aligning AI to a single set of "human values" is a complex and perhaps unrealistic goal, because "humanity" itself encompasses a plurality of value systems.

🤖 **AI models are susceptible to biases in their training data**: The datasets used to train AI models often carry the biases of particular (e.g., WEIRD) cultures. Research shows that Large Language Models (LLMs) inherit these biases during training, leaving a significant gap between their values and those of people from non-WEIRD cultures. For instance, GPT's agreement with human values decreases as cultural distance from the United States increases, meaning AI may inadvertently deepen existing cultural divides and inequalities.

🤝 **AI alignment is inseparable from human alignment**: A precondition for aligning AI is that humans first reconcile their own values. The alignment process is not only a technical challenge but also a profound reflection on our own moral and cultural outlooks. The article suggests that the effort of AI alignment could become an occasion for humanity to jointly explore universal, global values, but this also raises a power struggle over whose values will dominate.

⚖️ **AI alignment entangles power, culture, and morality**: The article stresses that how AI alignment is ultimately realised will be shaped by the groups that hold the power to develop AI. An AI aligned with authoritarian values might withhold information that challenges existing hierarchies, while a hiring AI aligned with WEIRD values might discriminate against candidates from non-WEIRD cultural backgrounds. The question of AI alignment thus cuts to core social issues of power distribution, cultural identity, and moral judgment.

Published on October 2, 2025 4:33 PM GMT

This post was written by Sophia Lopotaru and is cross-posted from our Substack. Kindly read the description of this sequence to understand the context in which this was written.

Our lives orbit around values: we act under their guidance, and we bond with other humans because of them. Yet values are also one of the reasons we differ so much from one another. This variance in beliefs may be what is preventing us from achieving Artificial Intelligence (AI) alignment, the process of aligning AI's values with our own (Ji et al., 2025):

Do we need human alignment before AI alignment?

In this article, we will explore the concept of values, the importance of AI alignment, and what and whose values we should be trying to align AI with.

While the concept of 'values' seems quite abstract, some humans have managed to come up with a definition for it. For this article, I will use the Cambridge Dictionary's definition of 'values': "the beliefs people have, especially about what is right and wrong and what is most important in life, that control their behavior".

It seems that values shape our existence. As our society involves AI not only in complex decision-making processes but also in very intimate sectors of our lives, such as the content we consume on social media platforms, we are turning AI into an indispensable tool. As a consequence, AI alignment is becoming a necessity in order to prevent rogue AI (Durrani, 2024): AIs that can no longer be controlled and that operate according to goals in conflict with those of humans.

When considering this complex process, one must wonder: What values are we even trying to align?

While we universally consider the act of being moral as inherently good, the particular forms morality takes vary across cultures. Cultural background influences both which values are passed on and how they are transmitted. For example, WEIRD (Western, Educated, Industrialised, Rich and Democratic) societies tend to endorse individualism and explicit moral codes, while non-WEIRD cultures place greater weight on spiritual purity and collective responsibility (Graham et al., 2016). Countless other differences follow from our cultures' divergent histories: our beliefs stem from stories told and retold, learned from our parents and from our time spent in the world. There is variation not only across societies but within them. Consequently, the journey of trying to align AI with our values becomes the quest of humans trying to align their values with one another.

Culture shapes human psychology, so which humans are we trying to align AI models with? The datasets used to train AI models introduce bias into the equation (Chapman University, n.d.). The problem is aggravated by the finding of Atari et al. (2023) that Large Language Models (LLMs) learn WEIRD behaviours from WEIRD-biased datasets: a dataset developed in a WEIRD country carries WEIRD biases.

The researchers used the World Values Survey (WVS) (World Values Survey, n.d.), one of the most culturally diverse datasets available, to examine where the values of LLMs lie in the broader landscape of human psychology. Comparing the model's responses with human responses from around the world, they confirmed that it had inherited WEIRD behaviour: as Figure 1 shows, GPT's alignment with human values decreases as cultural distance from the United States increases.

Figure 1: The correlation between GPT and human responses as a function of cultural distance from the United States (Atari et al., 2023).
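For readers who want a concrete picture of this kind of analysis, the sketch below reproduces its shape in Python with invented toy numbers. The country list, the `distance` scores, and all response vectors are illustrative placeholders, not real WVS data or the authors' actual method; the study itself works with real survey items and a proper cultural-distance measure.

```python
# A minimal, self-contained sketch (toy data only): per country, correlate
# the model's answers to a set of survey items with mean human answers,
# then ask whether that agreement falls as cultural distance from the
# United States grows.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_items = 20  # number of survey items (toy value)

# Hypothetical cultural-distance scores from the USA (0 = identical).
distance = {"USA": 0.00, "Canada": 0.05, "Germany": 0.12,
            "Japan": 0.25, "Nigeria": 0.38, "Pakistan": 0.45}

model_answers = rng.normal(size=n_items)  # the LLM's item responses (toy)

agreement = {}
for country, d in distance.items():
    # Simulate human answers that drift away from the model's as distance
    # grows; the trend is baked in here only so the toy example exhibits
    # the pattern that the real study measures from data.
    human_answers = model_answers + rng.normal(scale=0.3 + 2.0 * d, size=n_items)
    r, _ = pearsonr(model_answers, human_answers)
    agreement[country] = r

# The headline statistic: does model-human agreement fall with distance?
dists = np.array(list(distance.values()))
rs = np.array([agreement[c] for c in distance])
trend, p_value = pearsonr(dists, rs)
print(f"correlation(distance, GPT-human agreement) = {trend:.2f} (p = {p_value:.3f})")
```

A scatter plot of `agreement` against `distance` would then resemble the downward slope of Figure 1.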

These findings bring us back to our central idea: AI alignment cannot be separated from human alignment. The struggle for AI alignment does not only focus on the technical side, it challenges humans to change their thinking and revisit what morality means on both an individual and a global level.

Perhaps the quest for AI alignment is the means towards collectively finding absolute human values. This could lead to globalised values: either AI adopting our values or humans adopting its. Yet whose values would dominate? An AI aligned with authority-based values could abstain from providing information that challenges hierarchy. An AI hiring agent aligned with WEIRD values could discriminate against candidates from non-WEIRD cultures. Even if the problem of human alignment were resolved, the voices of those who control the development of AI would ultimately shape the degree to which AI abides by these values.

The challenge of AI alignment is therefore inseparable from questions of power, culture, and morality. While the road to AI alignment might seem long and tedious, we can all help by taking the first step: asking questions and listening to each other's perspectives.

References

Atari, M., Xue, M. J., Park, P. S., Blasi, D. E., & Henrich, J. (2023). Which Humans? PsyArXiv. https://doi.org/10.31234/osf.io/5b26t

Chapman University. (n.d.). Bias in AI. Retrieved September 29, 2025, from https://www.chapman.edu/ai/bias-in-ai.aspx

Durrani, I. (2024). What is a Rogue AI? A Mathematical and Conceptual Framework for Understanding Autonomous Systems Gone Awry. https://doi.org/10.13140/RG.2.2.10613.38888

Graham, J., Meindl, P., Beall, E., Johnson, K. M., & Zhang, L. (2016). Cultural differences in moral judgment and behavior, across and within societies. Current Opinion in Psychology, 8, 125–130. https://doi.org/10.1016/j.copsyc.2015.09.007

Ji, J., Qiu, T., Chen, B., Zhang, B., Lou, H., Wang, K., Duan, Y., He, Z., Vierling, L., Hong, D., Zhou, J., Zhang, Z., Zeng, F., Dai, J., Pan, X., Ng, K. Y., O’Gara, A., Xu, H., Tse, B., … Gao, W. (2025). AI Alignment: A Comprehensive Survey (No. arXiv:2310.19852). arXiv. https://doi.org/10.48550/arXiv.2310.19852

World Values Survey. (n.d.). WVS Database. Retrieved September 29, 2025, from https://www.worldvaluessurvey.org/WVSContents.jsp
