The Verge - Artificial Intelligence, August 18
Claude AI will end ‘persistently harmful or abusive user interactions’

Anthropic has introduced a new capability for its Claude AI chatbot (specifically the Opus 4 and 4.1 models) that allows it to end a conversation when a user persistently makes harmful or abusive requests. The move is intended to protect the "potential welfare" of AI models and to address the "apparent distress" they can show when handling extremely inappropriate content. The feature acts as a "last resort," triggered only after multiple refusals and redirection attempts have failed. Once a conversation is ended, users can still start a new chat but can no longer send messages in the original one. Anthropic stresses that the scenarios triggering this feature are extremely rare, and that in testing Claude showed a strong aversion to harmful content, including sexual content involving minors and information related to violence and terrorism. The company also made clear that Claude will not end a conversation when a user expresses intent to self-harm or may cause "imminent harm" to others, and it works with a crisis support organization on responses to such sensitive topics. In addition, Anthropic has updated its usage policy to prohibit using Claude to develop biological, nuclear, chemical, or radiological weapons, as well as malicious code or network exploits.

🤖 Claude AI has gained a conversation-ending capability intended to protect the "potential welfare" of AI models, allowing it to end a conversation when a user persistently makes harmful or abusive requests; it is invoked as a "last resort" only after multiple refusals and redirection attempts have failed.

📉 The feature applies only to the Opus 4 and 4.1 models. Once Claude ends a conversation, the user can no longer send new messages in it, but can start a new chat or edit and retry earlier messages.

🛡️ Anthropic's testing showed that Claude has a "robust and consistent aversion" to harmful content (such as sexual content involving minors and information related to violence and terrorism), exhibits "apparent distress" when handling it, and tends to end harmful conversations when given the ability to do so.

⚠️ Anthropic stresses that conversations triggering this feature are "extreme edge cases," and that most users will not hit this limit even when discussing controversial topics. Claude is also instructed not to end a conversation when a user may self-harm or cause "imminent harm" to others; the company works with a crisis support provider on responses to such sensitive prompts.

⚖️ Anthropic has updated Claude's usage policy to prohibit using the AI to develop biological, nuclear, chemical, or radiological weapons, or to develop malicious code or exploit network vulnerabilities, in response to growing concerns about AI safety.

Anthropic’s Claude AI chatbot can now end conversations deemed “persistently harmful or abusive,” as spotted earlier by TechCrunch. The capability is now available in Opus 4 and 4.1 models, and will allow the chatbot to end conversations as a “last resort” after users repeatedly ask it to generate harmful content despite multiple refusals and attempts at redirection. The goal is to help the “potential welfare” of AI models, Anthropic says, by terminating types of interactions in which Claude has shown “apparent distress.”

If Claude chooses to cut a conversation short, users won’t be able to send new messages in that conversation. They can still create new chats, as well as edit and retry previous messages if they want to continue a particular thread.

During its testing of Claude Opus 4, Anthropic says it found that Claude had a “robust and consistent aversion to harm,” including when asked to generate sexual content involving minors, or provide information that could contribute to violent acts and terrorism. In these cases, Anthropic says Claude showed a “pattern of apparent distress” and a “tendency to end harmful conversations when given the ability.”

Anthropic notes that conversations triggering this kind of response are “extreme edge cases,” adding that most users won’t encounter this roadblock even when chatting about controversial topics. The AI startup has also instructed Claude not to end conversations if a user is showing signs that they might want to hurt themselves or cause “imminent harm” to others. Anthropic partners with Throughline, an online crisis support provider, to help develop responses to prompts related to self-harm and mental health.

Last week, Anthropic also updated Claude’s usage policy as rapidly advancing AI models raise more concerns about safety. Now, the company prohibits people from using Claude to develop biological, nuclear, chemical, or radiological weapons, as well as to develop malicious code or exploit a network’s vulnerabilities.

