Articles related to "Model Monitoring"
AI Safety at the Frontier: Paper Highlights of October 2025
LessWrong
2025-11-05T13:49:15.000000Z
Open-weight training practices and implications for CoT monitorability
LessWrong
2025-11-04T11:20:45.000000Z
Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity
cs.AI updates on arXiv.org
2025-11-03T05:19:34.000000Z
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability
cs.AI updates on arXiv.org
2025-10-24T04:19:55.000000Z
The Ends Justify the Thoughts: RL-Induced Motivated Reasoning in LLMs
cs.AI updates on arXiv.org
2025-10-21T04:27:45.000000Z
Braintrust on the Vercel Marketplace
Braintrust Blog
2025-10-16T16:48:57.000000Z
Training fails to elicit subtle reasoning in current language models
LessWrong
2025-10-09T19:17:44.000000Z
Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
cs.AI updates on arXiv.org
2025-10-03T04:03:56.000000Z
Prompt optimization can enable AI control research
LessWrong
2025-09-23T13:23:03.000000Z
18 Applications of Deception Probes
LessWrong
2025-08-28T19:05:32.000000Z
Anthropic launches Usage and Cost API
oschina.net
2025-08-20T10:10:08.000000Z
Trae visualization tools: monitoring the training process in real time
Juejin AI
2025-08-13T11:11:27.000000Z
A comprehensive path to building an MLOps system that integrates CI/CD with a model monitoring platform
Juejin AI
2025-07-28T03:23:22.000000Z
Trusted monitoring, but with deception probes.
LessWrong
2025-07-23T05:31:06.000000Z
Vulnerability in Trusted Monitoring and Mitigations
LessWrong
2025-06-11T21:17:32.000000Z
OpenAI's latest AI models have new safeguards to prevent biorisks
Cnbeta
2025-04-16T22:02:50.000000Z
OpenAI’s latest AI models have a new safeguard to prevent biorisks
TechCrunch News
2025-04-16T21:21:21.000000Z
Altman exclaims the singularity is near! AI will take 95% of human jobs, with a million AIs on the job by 2028
BAAI Community
2025-01-06T05:22:06.000000Z