"Jailbreaking" related articles
Why Safety Constraints in LLMs Are Easily Breakable? Knowledge as a Network of Gated Circuits
LessWrong
2025-11-05T06:54:58.000000Z
GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash
LessWrong
2025-11-04T16:36:28.000000Z
OpenAI, Anthropic, and DeepMind jointly publish paper: existing LLM safety defenses are easily broken
Synced (机器之心)
2025-10-14T10:40:18.000000Z
OpenAI, Anthropic, and DeepMind jointly publish paper: existing LLM safety defenses are easily broken
36Kr - Tech
2025-10-14T10:09:30.000000Z
OpenAI, Anthropic, and DeepMind jointly publish paper: existing LLM safety defenses are easily broken
Synced (机器之心)
2025-10-14T06:54:22.000000Z
Low-resourced languages get jailbroken more. Can SAEs explain why?
LessWrong
2025-09-16T06:03:19.000000Z
A roundup of 28 LLM jailbreak attack methods (2025.8)
Anxiaoquan (安小圈)
2025-09-12T03:41:02.000000Z
UAE's K2 Think AI jailbroken through its own transparency features
HackerNews
2025-09-12T03:28:55.000000Z
EMNLP 2025 | Jailbreak just by looking at an image! Visual context attack: using "image context" to pry open multimodal large models
PaperWeekly
2025-09-01T15:01:23.000000Z
The $1.9 billion 91 Assistant is dead, but the "phone assistant" has been brought back from the dead
36Kr AI
2025-09-01T12:25:56.000000Z
GPT vs. Claude head-to-head: OpenAI did not win across the board; the truth behind an extreme AI safety stress test revealed
36Kr - Tech Channel
2025-08-29T02:54:54.000000Z
GPT-5 multi-model routing can be induced to downgrade; research points to rising ChatGPT security risks
AI & Big Data
2025-08-26T09:52:28.000000Z