"Misalignment" related articles
When AI Learns to Deceive, Betray, and Collaborate
Tencent Research Institute
2025-10-17T10:23:04.000000Z
Anthropic Open-Sources Petri, a Safety Auditing Framework for AI Models
AI & Big Data
2025-10-08T08:58:04.000000Z
Profanity causes emergent misalignment, but with qualitatively different results than insecure code
LessWrong
2025-08-28T08:47:23.000000Z
Harmless reward hacks can generalize to misalignment in LLMs
LessWrong
2025-08-26T17:45:27.000000Z