Articles related to "Jailbreak Attacks"
Google AI Introduces Consistency Training for Safer Language Models Under Sycophantic and Jailbreak Style Prompts
MarkTechPost@AI
2025-11-05T15:49:59.000000Z
The Better AI Gets at Thinking, the Easier It Is to Fool? "Chain-of-Thought Hijacking" Attack Success Rate Exceeds 90%
机器之心
2025-11-03T13:26:17.000000Z
The Better AI Gets at Thinking, the Easier It Is to Fool? "Chain-of-Thought Hijacking" Attack Success Rate Exceeds 90%
36kr-科技
2025-11-03T11:15:54.000000Z
Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models
cs.AI updates on arXiv.org
2025-10-30T04:12:40.000000Z
Works for Both Attack and Defense, with a Jailbreak Success Rate Near 90%! All Six Mainstream Models Fall for It | EMNLP'25
智源社区
2025-10-27T17:39:43.000000Z
Works for Both Attack and Defense, with a Jailbreak Success Rate Near 90%! All Six Mainstream Models Fall for It | EMNLP'25
新智元
2025-10-26T15:37:10.000000Z
AI Turns Dark as if Possessed by a Demon! LARGO's Three-Step Mind Attack Makes Subconscious Seeds Bloom Instantly | NeurIPS 2025
新智元
2025-10-26T15:31:54.000000Z
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
cs.AI updates on arXiv.org
2025-10-20T04:13:59.000000Z
When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Complex Ciphers
cs.AI updates on arXiv.org
2025-10-07T04:18:41.000000Z
Imperceptible Jailbreaking against Large Language Models
cs.AI updates on arXiv.org
2025-10-07T04:18:06.000000Z
SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models
cs.AI updates on arXiv.org
2025-10-01T05:59:20.000000Z
Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks
cs.AI updates on arXiv.org
2025-09-30T04:03:12.000000Z
Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning
cs.AI updates on arXiv.org
2025-09-30T04:01:45.000000Z
Understanding Large Model Security from 0 to 1: This One Article Is All You Need
财猫 AI
2025-09-25T10:02:38.000000Z
AdaptiveGuard: Towards Adaptive Runtime Safety for LLM-Powered Software
cs.AI updates on arXiv.org
2025-09-23T05:50:13.000000Z
Many-shot jailbreaking
Newsroom Anthropic
2025-09-13T01:28:18.000000Z
Expanding our model safety bug bounty program
Newsroom Anthropic
2025-09-13T01:26:11.000000Z
A Roundup of 28 LLM Jailbreak Attack Methods (2025.8)
安小圈
2025-09-12T03:41:02.000000Z
Research Highlight | [WWW'25] Do Safety Defenses Make Large Language Models "Dumber"?
复旦白泽战队
2025-09-11T20:12:14.000000Z