Trending
Articles related to "model monitoring"
AI Safety at the Frontier: Paper Highlights of October 2025
LessWrong 2025-11-05T13:49:15.000000Z
Open-weight training practices and implications for CoT monitorability
LessWrong 2025-11-04T11:20:45.000000Z
Braintrust on the Vercel Marketplace
Braintrust Blog 2025-10-16T16:48:57.000000Z
Training fails to elicit subtle reasoning in current language models
LessWrong 2025-10-09T19:17:44.000000Z
Prompt optimization can enable AI control research
LessWrong 2025-09-23T13:23:03.000000Z
18 Applications of Deception Probes
LessWrong 2025-08-28T19:05:32.000000Z