Hot Topics
Articles related to "attack success rate"
Differentiated Directional Intervention: A Framework for Evading LLM Safety Alignment
cs.AI updates on arXiv.org 2025-11-12T05:21:36.000000Z
HarmNet: A Framework for Adaptive Multi-Turn Jailbreak Attacks on Large Language Models
cs.AI updates on arXiv.org 2025-10-22T04:25:01.000000Z
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
cs.AI updates on arXiv.org 2025-10-22T04:18:27.000000Z
Imperceptible Jailbreaking against Large Language Models
cs.AI updates on arXiv.org 2025-10-07T04:18:06.000000Z
NEXUS: Network Exploration for eXploiting Unsafe Sequences in Multi-Turn LLM Jailbreaks
cs.AI updates on arXiv.org 2025-10-07T04:14:46.000000Z
Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks
cs.AI updates on arXiv.org 2025-10-03T04:13:52.000000Z
Dagger Behind Smile: Fool LLMs with a Happy Ending Story
cs.AI updates on arXiv.org 2025-10-01T06:02:30.000000Z
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
cs.AI updates on arXiv.org 2025-09-30T04:06:32.000000Z
Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
cs.AI updates on arXiv.org 2025-09-29T04:16:05.000000Z
A Set of Generalized Components to Achieve Effective Poison-only Clean-label Backdoor Attacks with Collaborative Sample Selection and Triggers
cs.AI updates on arXiv.org 2025-09-25T05:54:39.000000Z
Defending LVLMs Against Vision Attacks through Partial-Perception Supervision
cs.AI updates on arXiv.org 2025-09-05T04:45:49.000000Z
The Cost of Thinking: Increased Jailbreak Risk in Large Language Models
cs.AI updates on arXiv.org 2025-08-15T04:18:35.000000Z
LLM Robustness Leaderboard v1 -- Technical Report
cs.AI updates on arXiv.org 2025-08-11T04:08:20.000000Z
PromptArmor: Simple yet Effective Prompt Injection Defenses
cs.AI updates on arXiv.org 2025-07-22T04:44:55.000000Z
Breaking the Illusion of Security via Interpretation: Interpretable Vision Transformer Systems under Attack
cs.AI updates on arXiv.org 2025-07-22T04:34:48.000000Z
AdvDGMs: Enhancing Adversarial Robustness in Tabular Machine Learning by Incorporating Constraint Repair Layers for Realistic and Domain-Specific Attack Generation
MarkTechPost@AI 2024-09-25T10:20:46.000000Z
Novel Noise Attack on Large AI Models Exposed, Capable of Bypassing State-of-the-Art Backdoor Detection
FreeBuf (internet security media platform) 2024-09-11T03:53:21.000000Z