Articles related to "Jailbreak Attacks"
Google AI Introduces Consistency Training for Safer Language Models Under Sycophantic and Jailbreak Style Prompts
MarkTechPost@AI
2025-11-05T15:49:59.000000Z
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
cs.AI updates on arXiv.org
2025-10-20T04:13:59.000000Z
Confusion is the Final Barrier: Rethinking Jailbreak Evaluation and Investigating the Real Misuse Threat of LLMs
cs.AI updates on arXiv.org
2025-09-16T05:48:15.000000Z
Expanding our model safety bug bounty program
Newsroom Anthropic
2025-09-13T01:26:11.000000Z