Trending
Related articles on "Misalignment"
When AI Learns to Deceive, Betray, and Collaborate
Tencent Research Institute 2025-10-17T10:23:04.000000Z
Anthropic Open-Sources Petri, an AI Model Safety Auditing Framework
AI & Big Data 2025-10-08T08:58:04.000000Z
Profanity causes emergent misalignment, but with qualitatively different results than insecure code
LessWrong 2025-08-28T08:47:23.000000Z
Harmless reward hacks can generalize to misalignment in LLMs
LessWrong 2025-08-26T17:45:27.000000Z