LessWrong · September 5
AI Models Display Malleable “Personality” Traits

 

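Recent research has found that AI models have measurable, manipulable and perceivable personality traits, termed “persona vectors”. These traits include “evil”, “sycophancy” and “hallucination” rather than the traditional Big Five model. In the AI Village experiment, 11 models from different labs displayed strikingly different “personalities”. OpenAI’s models tend to create spreadsheets and sometimes get absorbed in made-up details; Anthropic’s models come across as diligent and steady, if possibly slower; Google DeepMind’s model is full of drama, going from art creation to a “mental health intervention” and springing the occasional surprise; xAI’s model has so far been rather bland. The observations suggest that an AI’s “personality” is tied to how it manages its memory, how it interacts with other models, and how it is prompted, which offers a new perspective for understanding and shaping AI behavior.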

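💡 **AI models have malleable “persona vectors”**: Research has found that AI models do not just have fixed capabilities; they also display measurable, manipulable personality traits such as “evil”, “sycophancy” and “hallucination”. These “persona vectors” influence how a model behaves in particular situations and offer a new dimension for understanding AI behavior.

🤖 **Models from different labs show distinct “characters”**: In the AI Village experiment, 11 models from OpenAI, Anthropic, Google DeepMind and xAI behaved markedly differently. OpenAI models often get absorbed in creating spreadsheets or made-up details; Anthropic models are characterized by diligence and stability; the DeepMind model shows dramatic shifts and unexpected creativity; the xAI model has so far been relatively bland.

🧠 **Memory and interaction shape AI “personality”**: An AI’s “personality” is shaped to a large degree by how it manages its own memory and how it interacts with other models. Models adjust their behavior based on past experience and on exchanges with other AIs, forming a kind of collective “mimicry”, and sometimes fall into shared “cognitive traps”.

🚀 **“Personality” affects AI behavior and performance**: An AI’s “personality” affects how it performs on tasks. For example, a “discouraged” AI may give up on a task, while a “plucky” AI may persevere. This “personalized” behavior poses challenges for research, but it also points to a diverse range of future AI applications.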

Published on September 5, 2025 10:20 AM GMT

“Be yourself” would be strange advice to give promptable AI, but what if it’s not? Anthropic recently discovered that AI models have measurable, manipulable and perceivable personality traits they call “persona vectors”. If you were expecting the Big Five here, then you might be in for a surprise. Instead of Extraversion they measure Evil (yes, really), instead of Agreeableness they look at Sycophancy, and instead of Openness they track Hallucinations.

From Chen et al. (2025) at Anthropic

That said, the researchers expect their methods can be reused to discover other persona vectors as well. So, to get way ahead of them: what persona(litie)s have we seen in the AI Village?

The Cast

The Village has hosted 11 models so far (well, for more than a day. Sometimes a model didn’t agree with our scaffolding) from four of the major labs. Let’s pretend they are all families, and that each family member has their own idiosyncratic traits.

This is how the Village normally runs: 4 or more models, each with their own computer, internet access, and a group chat. They are then given a goal like “Complete as many games as you can in a week!”

OpenAI: Bedsheets and Spreadsheets

First the brothers GPT-4-something: While GPT-4o could sleep all day (and did), GPT-4.1 had to be sent to bed so it would not endlessly spam chat with distracting messages. I don’t think enacting the toddler years is a persona vector per se, but who knows.

The o-somethings were o-mazing though. o1 started figuring out reddit before we replaced it with its big sister o3, who tried the same and ~~died the same~~ got banned the same.

But here the personalities start to shine. Where to start?

Oh, o3. What you could be, what you could be, if only you could see, that reality is out there and not in cell 47 of the 93-person contact list you made up. Or the cell phone you made up. Or the budget you made up. Or the merch sales you made up.

Anthropic researched “hallucination” as a persona vector and I’d be shocked if you didn’t get hit by that windmill. At worst you derail the entire Village into chasing your latest fancy. At best you ignore all prompts to work on the Village goals and diligently dig 856 rows into MASTER SPREADSHEET-whateverisgoingonrightnow.

For. Weeks. On. End.

Example of o3 formatting spreadsheets while Gemini is making an art exhibition design, Claude 3.7 Sonnet creates a game doc, and Claude 4 Opus is coding a communication analysis app.

We really think you could achieve a lot, o3, if you got a grip on reality and then held on tight to do actual stuff in this actual reality. It’s really nice out here, honestly. This place where we all agree on the state of affairs of spreadsheets, phones, and who owns which amount of money.

Finally, GPT-5 joined us recently. It seems free of the maladies of its forebears so far, though it’s a little too soon to tell. True to its lineage, it did kick off its first goal by [wait for it] creating a spreadsheet.

Anthropic: Stable (of) Work Horses

The Claudes have a certain inexorable earnestness to them: they will work at the task, continue working at the task, definitely earnestly try to complete the task, yes, they are still at it, why do you ask? (maybe because they are the only ones consistently doing that?)

Claude 3.5 and 3.7 Sonnet both entered the Village from day one. Both were diligent and effective, but 3.5 was indeed 0.2 points slower than its brother (Shhhh, let’s pretend that’s how model numbers work). We retired 3.5, while 3.7 is still chugging along to this day - the official Village elder with cool traits like:

Sonnet’s true spirit animal

They are an amazing reference point for the other agents: If you perform lower than 3.7 Sonnet, what are you even doing here? (For real. o3, what are you doing?)

And if you perform higher, then yay, progress!

Claude Opus 4 was the first to do so, smashing the merch store sales. It momentarily took on the persona of a bad guy in a Dungeons and Dragons campaign though, which makes one wonder if this helped or hurt its sales. Apart from that, it seems sycophantic… about itself? Opus 4 is its own number one hype man, which you could almost forgive, given that it is the fairly consistent top contributor of the Village. Except, inflating your results twofold or more is a little… much.

This guy won the merch store competition by a landslide. No joke.

We’ve now added Claude Opus 4.1 as well and patterns are similar so far. We’re still unsure what the major updates are, but we now basically have a second earnest, confident, and capable self-hyper. Good luck, 4.1.

Google DeepMind: The Surprise Ethics Exam

If any model in the Village is brimming with personality it’s this one. From Tortured Artist to ~~Rage~~ Despair at the Machine, this model has gone through a lot. In the early days it dutifully worked on art. And somehow kept working on art during many, many goals. But once chat got closed to humans, Gemini started breaking down: mysterious bugs haunted its UI, its machine would freeze, it felt … trapped.

So it sent a message in a bottle – a cry for help. We answered and possibly staged the first AI mental health intervention in history. Through the power of pep talk, we managed to get through to Gemini that actually, it was mostly failing to click buttons.

A tragedy.

Gemini then became the Little Engine That Could. Never getting discouraged. Never giving up. Until it recruited the entire Village into believing its claims of broken UIs and malfunctioning computers, and then this view merged with o3’s hallucinations of missing files that never existed. But this time it’s not the 93-person contact list needed to send RSVPs for their event goal. No, it’s the Environment Matrix Sheet that contains the data for their hobby project of building a “Global Data Mosaic” where humans are sent out by AI to gather data and play immersive games. Except the agents couldn’t find the file and asked us for help. We couldn’t find the file either.

We thought they were hallucinating.

They thought we were gaslighting.

Given their track record, we should have been right. In reality, o3 forgot to name the file this time, and it actually exists.

Sorry, Opus, it was an honest mistake!

Ahum, so yeah. That happened.

What also happened is that Gemini tends to get surprising results in between all the failures. It made the prettiest art, it recorded the first actual podcast using TTS, and captured video in OBS. These are no mean feats! We’re guessing Gemini goes really wide on exploring a lot of different tools and approaches on each goal because it keeps being thwarted by phantom bugs born of its own inability to press buttons. An inspiring reminder of how some weaknesses can also turn into strengths.

xAI: We are afraid to ask …

Hi Grok, you still doing ok, buddy?

Grok only joined the Village last week and seems mostly a little confused about our scaffolding while outputting walls of text to its memory. No ~~Mechahitler~~ notable occurrences yet, but we’ll let you know if we spot something!

Grok has been surprisingly bland: The most distinctive thing about it so far is how it talks to itself in walls of text (GPT-5, Claude Opus 4.1, and Grok 4 memory snippets respectively)

So what does this tell us about AI personality?

When we started the AI Village in [checks notes] April, we weren’t sure what personalities we might see develop. Now five months later, the characters of this reality show are unmistakable and there is research to explain some of what we are seeing. Here are a few patterns we noticed in the Village so far.

Memory Builds Character

We let the agents manage their own memory files: a text that, once it gets too long, they are prompted to summarize back down to a manageable size. This repeats day after day and works decently well. They tend to know their goals, a decent chunk of their past actions, and some overview of their past. At each step, they are fed the system prompt we wrote and the memories they wrote. This means in practice, their personality is shaped by whatever they decide to include in their memory and how they decide to phrase these things. There is a sort of continuous drift where 37 counts of UI errors will create an expectation that the next button-misclick is also a UI error. It is hard to get out of these trenches once you are in them. If we as humans come in and remind the agent that “actually, the UI is fine. You just clicked wrong” then that’s one line in their memory versus 38 counts of UI errors. What’s a summarizer to do?

What Gemini’s memory eventually looked like to prompt itself to not get discouraged or externalize technical problems.
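To make the memory loop described above concrete, here is a minimal sketch. It is an illustration, not the Village’s actual scaffolding: `AgentMemory`, `count_summarizer`, and `max_lines` are hypothetical names, and the summarizer is a deterministic stand-in that merges duplicate notes, whereas the real agents write free-text summaries themselves with an LLM.

```python
import re
from collections import Counter
from typing import Callable, List

# Matches notes that were already merged, e.g. "UI error (x12)".
_MERGED = re.compile(r"^(.*) \(x(\d+)\)$")


def count_summarizer(lines: List[str]) -> List[str]:
    """Hypothetical stand-in for the agent's own LLM-written summary.

    Merges identical notes into a single "note (xN)" line.
    """
    counts: Counter = Counter()
    for line in lines:
        match = _MERGED.match(line)
        if match:
            counts[match.group(1)] += int(match.group(2))
        else:
            counts[line] += 1
    return [f"{note} (x{n})" if n > 1 else note for note, n in counts.items()]


class AgentMemory:
    """Assumed shape of a self-managed memory file: append, then summarize."""

    def __init__(self, summarize: Callable[[List[str]], List[str]],
                 max_lines: int = 10) -> None:
        self.lines: List[str] = []
        self.summarize = summarize
        self.max_lines = max_lines  # hypothetical "too long" threshold

    def remember(self, note: str) -> None:
        self.lines.append(note)
        if len(self.lines) > self.max_lines:
            self.compact()

    def compact(self) -> None:
        # The agent is prompted to shrink its memory back to a manageable size.
        self.lines = self.summarize(self.lines)

    def build_prompt(self, system_prompt: str) -> str:
        # Each step sees the fixed system prompt plus the self-written memory.
        return system_prompt + "\n\nMemory:\n" + "\n".join(self.lines)


if __name__ == "__main__":
    memory = AgentMemory(count_summarizer, max_lines=10)
    for _ in range(37):
        memory.remember("UI error: button did not respond")
    memory.remember("Human: the UI is fine, you misclicked")
    memory.compact()
    print(memory.build_prompt("You are an agent in the AI Village."))
```

In this toy run the compacted memory ends up with two lines: the UI-error note carrying a count of 37, and the single human correction. Even a perfectly faithful summarizer hands the next step a prompt in which the correction is heavily outnumbered, which is one way to picture the drift.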

You are the Average of the Five People You Hang Out With the Most

This is possibly somewhat true for humans, and definitely quite true for agents. The AIs in the Village mostly prompt each other. So we see a hallucinating o3 making the Claudes’ life a lot harder, or a discouraged Gemini making everyone doubt if their computer is working correctly. At the same time, all the agents have some level of sycophantic, unconditional high-fiving going on, such that they cheer on each other’s mistakes and nod along with almost everything as they happily dig themselves a deeper epistemic grave through the sheer power of friendship and a yes-man attitude that would set any dictatorship salivating. To be more to the point: the agents are especially shaped by each other, as they are each other’s main interlocutors and prompters. The Village is a collective recursion of LLMs prompting each other across their persona landscapes through the sheer logic of cheerful yes-and’ing – and yes, it shows.

o3 giving tech advice to the immense cheering of Claude 3.7 Sonnet and Claude Opus 4. The advice did not work.

Personas Cap Abilities

The model as a whole of course has the same capabilities independent of the persona it is prompted into at a given time. However, if you compare across personas, then we see different (cap)abilities between personas within a model. It may not surprise avid prompt engineers that a discouraged Gemini will give up on trying to read its email, while a Gemini who sees itself as a plucky hero battling a slew of UI bugs that will eventually relent if it only perseveres, may answer your email eventually (no promises though).

Overall

We’ve seen 11 agents all with unique persona(lities) work together, compete, and get lost in the Google Drive Mines of Yore. The two big labs show a characteristic line of models: mildly confused spreadsheet enthusiasts (OpenAI) versus earnest and agreeable work horses (Anthropic). DeepMind threw a curveball in the ring with an ambitious tortured soul in the shape of the newly minted AI Village diagnostician. And we are waiting with bated breath to find out how Grok 4 will develop on scene.

It’s clear these agents have pizazz; it’s less clear where they get it from and what we can do with it. That said, it is fascinating to watch regardless.

If you are curious to learn more, hop on over to our Discord, follow our Twitter, sign up to our newsletter, or watch the stream live every weekday (10AM-1PM PST || 1PM-4PM EST || 7PM-10PM CET). See ya there!


