Trending
Articles related to "AI Evaluation"
Sonnet 4.5's eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals
LessWrong 2025-10-30T15:51:30.000000Z
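AGI Gets an "Authoritative" New Definition! Proposed by Turing Award Winner Yoshua Bengio and Others, GPT-5 Reaches Only 57%
BAAI Community 2025-10-30T11:59:09.000000Z
AGI Gets an "Authoritative" New Definition: Proposed by Turing Award Winner Yoshua Bengio and Others, GPT-5 Reaches Only 57%
36Kr - Tech 2025-10-29T10:18:40.000000Z
From "Can Draw" to "Can Think": Kuaishou's Kling Team Proposes T2I-CoReBench, and Even the Strongest Models Hit a Reasoning Bottleneck
I Love Computer Vision 2025-10-25T08:56:32.000000Z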
Towards Trustworthy Enterprise Deep Research
Salesforce Blog AI Research 2025-10-24T23:04:26.000000Z
Ask What Your Country Can Do For You: Towards a Public Red Teaming Model
cs.AI updates on arXiv.org 2025-10-24T04:22:26.000000Z
Modeling Human Beliefs about AI Behavior for Scalable Oversight
cs.AI updates on arXiv.org 2025-10-22T04:26:32.000000Z
A Definition of AGI
cs.AI updates on arXiv.org 2025-10-22T04:12:56.000000Z
Instagram cofounder rips ‘AI FOMO’ that caused a rush to adopt and no metrics: ‘When it gets fuzzy, it’s very hard to then evaluate’
Fortune | FORTUNE 2025-10-21T17:20:48.000000Z
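Large Models Should Also Undergo Peer Review
Southern Weekly 2025-10-19T01:29:11.000000Z
Bengio Launches an AGI "Gaokao" (College Entrance Exam), GPT-5 Scores 0 on One Section
Xinzhiyuan 2025-10-17T16:17:19.000000Z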
Stop Measuring AI Like Software
Communications of the ACM - Artificial Intelligence 2025-10-17T14:49:19.000000Z
By the New AGI Definition from Bengio and Other Leading Researchers, GPT-5 Achieves Less Than 10%
Synced 2025-10-17T13:34:40.000000Z
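Teaching AI to Ask "Soul-Searching Questions": How We Teach Machines to Judge Generated Videos | ICCV 2025
AI Technology Review 2025-10-17T11:58:31.000000Z
By the New AGI Definition from Bengio and Other Leading Researchers, GPT-5 Achieves Less Than 10%
36Kr - AI Articles 2025-10-17T09:43:58.000000Z
By the New AGI Definition from Bengio and Other Leading Researchers, GPT-5 Achieves Less Than 10%
Synced 2025-10-17T05:40:05.000000Z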
MetaBench: A Multi-task Benchmark for Assessing LLMs in Metabolomics
cs.AI updates on arXiv.org 2025-10-17T04:19:12.000000Z
Braintrust on the Vercel Marketplace
Braintrust Blog 2025-10-16T16:48:57.000000Z