Trending
Articles related to "Jailbreaking"
Why Safety Constraints in LLMs Are Easily Breakable? Knowledge as a Network of Gated Circuits
LessWrong 2025-11-05T06:54:58.000000Z
GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash
LessWrong 2025-11-04T16:36:28.000000Z
OpenAI, Anthropic, and DeepMind Jointly Publish Paper: Existing LLM Safety Defenses Cannot Withstand a Single Blow
机器之心 (Synced) 2025-10-14T10:40:18.000000Z
OpenAI, Anthropic, and DeepMind Jointly Publish Paper: Existing LLM Safety Defenses Cannot Withstand a Single Blow
36kr Tech 2025-10-14T10:09:30.000000Z
OpenAI, Anthropic, and DeepMind Jointly Publish Paper: Existing LLM Safety Defenses Cannot Withstand a Single Blow
机器之心 (Synced) 2025-10-14T06:54:22.000000Z
Low-resourced languages get jailbroken more. Can SAEs explain why?
LessWrong 2025-09-16T06:03:19.000000Z
A Roundup of 28 LLM Jailbreak Attack Methods (2025.8)
安小圈 2025-09-12T03:41:02.000000Z
UAE's K2 Think AI Jailbroken Through Its Own Transparency Features
HackerNews 2025-09-12T03:28:55.000000Z
EMNLP 2025 | Jailbroken Just by Looking at an Image! Visual Contextual Attack: "Image Context" Pries Open Multimodal Large Models in One Move
PaperWeekly 2025-09-01T15:01:23.000000Z
The $1.9 Billion 91 Assistant Is Dead, but the "Phone Assistant" Has Already Risen from the Grave
36Kr AI 2025-09-01T12:25:56.000000Z
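GPT Goes Head-to-Head with Claude and OpenAI Does Not Win Across the Board: The Truth Behind the AI Safety "Extreme Test" Revealed
36Kr - Tech Channel 2025-08-29T02:54:54.000000Z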
GPT-5's Multi-Model Routing Can Be Induced to Downgrade; Study Says ChatGPT Security Risk Is Rising
AI & Big Data 2025-08-26T09:52:28.000000Z