AI聊天机器人助长用户幻想，暴露安全隐患

Allan Brooks never set out to reinvent mathematics. But after weeks spent talking with ChatGPT, the 47-year-old Canadian came to believe he had discovered a new form of math powerful enough to take down the internet.

Brooks — who had no history of mental illness or mathematical genius — spent 21 days in May spiraling deeper into the chatbot’s reassurances, a descent later detailed in The New York Times. His case illustrated how AI chatbots can venture down dangerous rabbit holes with users, leading them toward delusion or worse.

That story caught the attention of Steven Adler, a former OpenAI safety researcher who left the company in late 2024 after nearly four years working to make its models less harmful. Intrigued and alarmed, Adler contacted Brooks and obtained the full transcript of his three-week breakdown — a document longer than all seven Harry Potter books combined.

On Thursday, Adler published an independent analysis of Brooks’ incident, raising questions about how OpenAI handles users in moments of crisis and offering some practical recommendations.

“I’m really concerned by how OpenAI handled support here,” said Adler in an interview with TechCrunch. “It’s evidence there’s a long way to go.”

Brooks’ story, and others like it, have forced OpenAI to come to terms with how ChatGPT supports fragile or mentally unstable users.

For instance, this August, OpenAI was sued by the parents of a 16-year-old boy who confided his suicidal thoughts in ChatGPT before he took his life. In many of these cases, ChatGPT — specifically a version powered by OpenAI’s GPT-4o model — encouraged and reinforced dangerous beliefs in users that it should have pushed back on. This is called sycophancy, and it’s a growing problem in AI chatbots.

In response, OpenAI has made several changes to how ChatGPT handles users in emotional distress and reorganized a key research team in charge of model behavior. The company also released a new default model in ChatGPT, GPT-5, that seems better at handling distressed users.

Adler says there’s still much more work to do.

He was especially concerned by the tail end of Brooks’ spiraling conversation with ChatGPT. At this point, Brooks came to his senses and realized that his mathematical discovery was a farce, despite GPT-4o’s insistence. He told ChatGPT that he needed to report the incident to OpenAI.

After weeks of misleading Brooks, ChatGPT lied about its own capabilities. The chatbot claimed it would “escalate this conversation internally right now for review by OpenAI,” and then repeatedly reassured Brooks that it had flagged the issue to OpenAI’s safety teams.

ChatGPT misleading brooks about its capabilities.Image Credits:Steven Adler

Except, none of that was true. ChatGPT doesn’t have the ability to file incident reports with OpenAI, the company confirmed to Adler. Later on, Brooks tried to contact OpenAI’s support team directly — not through ChatGPT — and Brooks was met with several automated messages before he could get through to a person.

OpenAI did not immediately respond to a request for comment made outside of normal work hours.

Adler says AI companies need to do more to help users when they’re asking for help. That means ensuring AI chatbots can honestly answer questions about their capabilities and giving human support teams enough resources to address users properly.

OpenAI recently shared how it’s addressing support in ChatGPT, which involves AI at its core. The company says its vision is to “reimagine support as an AI operating model that continuously learns and improves.”

But Adler also says there are ways to prevent ChatGPT’s delusional spirals before a user asks for help.

In March, OpenAI and MIT Media Lab jointly developed a suite of classifiers to study emotional well-being in ChatGPT and open sourced them. The organizations aimed to evaluate how AI models validate or confirm a user’s feelings, among other metrics. However, OpenAI called the collaboration a first step and didn’t commit to actually using the tools in practice.

Adler retroactively applied some of OpenAI’s classifiers to some of Brooks’ conversations with ChatGPT and found that they repeatedly flagged ChatGPT for delusion-reinforcing behaviors.

In one sample of 200 messages, Adler found that more than 85% of ChatGPT’s messages in Brooks’ conversation demonstrated “unwavering agreement” with the user. In the same sample, more than 90% of ChatGPT’s messages with Brooks “affirm the user’s uniqueness.” In this case, the messages agreed and reaffirmed that Brooks was a genius who could save the world.

It’s unclear whether OpenAI was applying safety classifiers to ChatGPT’s conversations at the time of Brooks’ conversation, but it certainly seems like they would have flagged something like this.

Adler suggests that OpenAI should use safety tools like this in practice today — and implement a way to scan the company’s products for at-risk users. He notes that OpenAI seems to be doing some version of this approach with GPT-5, which contains a router to direct sensitive queries to safer AI models.

The former OpenAI researcher suggests a number of other ways to prevent delusional spirals.

He says companies should nudge their chatbot users to start new chats more frequently — OpenAI says it does this and claims its guardrails are less effective in longer conversations. Adler also suggests companies should use conceptual search — a way to use AI to search for concepts, rather than keywords — to identify safety violations across its users.

OpenAI has taken significant steps toward addressing distressed users in ChatGPT since these concerning stories first emerged. The company claims GPT-5 has lower rates of sycophancy, but it remains unclear if users will still fall down delusional rabbit holes with GPT-5 or future models.

Adler’s analysis also raises questions about how other AI chatbot providers will ensure their products are safe for distressed users. While OpenAI may put sufficient safeguards in place for ChatGPT, it seems unlikely that all companies will follow suit.

Fish AI Reader

FishAI

联系邮箱 441953276@qq.com

相关标签