Trending
Articles related to "Alignment"
ImpossibleBench: Measuring Reward Hacking in LLM Coding Agents
少点错误 2025-10-30T03:15:41.000000Z
RL remembers better, SFT forgets more? Danqi Chen's team at Princeton rewrites our understanding of post-training
PaperWeekly 2025-10-27T13:26:49.000000Z
⿻ Symbiogenesis vs. Convergent Consequentialism
少点错误 2025-10-21T11:46:14.000000Z
Will AI superintelligence kill us all? (with Nate Soares)
Clearer Thinking with Spencer Greenberg 2025-10-16T04:21:18.000000Z
LLMs one-box when in a "hostile telepath" version of Newcomb's Paradox, except for the one that beat the predictor
少点错误 2025-10-06T08:52:21.000000Z
How to Build an Advanced Voice AI Pipeline with WhisperX for Transcription, Alignment, Analysis, and Export?
MarkTechPost@AI 2025-10-03T04:09:00.000000Z
"Fine-grained" alignment for large models: truthfulness up 25.8%, a new SOTA! Token-level precise editing, training-free and plug-and-play
量子位 2025-09-27T11:42:07.000000Z
From 2017 to 2024, how a (former) OpenAI researcher's views on AI changed: up, down, up, up, down, down, down…
ShowMeAI 2025-09-25T10:01:10.000000Z
When AI learns to deceive, how should we respond?
腾讯研究院 2025-09-18T07:23:21.000000Z
TRL: a fine-tuning framework for large models
掘金 人工智能 2025-09-13T18:31:13.000000Z
Podcast recommendation | The New Code — Sean Grove, OpenAI
孔某人的低维认知 2025-09-11T15:44:27.000000Z
Profanity causes emergent misalignment, but with qualitatively different results than insecure code
少点错误 2025-08-28T08:47:23.000000Z
RWKV state tuning: a fine-tuning tutorial
RWKV元始智能 2024-10-28T00:09:59.000000Z