Trending
Articles related to "AI Evaluation"
Under the new AGI definition from Bengio and other leading researchers, GPT-5 achieves less than 10%
机器之心 2025-10-17T05:40:05.000000Z
How Dropbox automates evals for conversational AI
Braintrust Blog 2025-10-15T22:39:14.000000Z
Can AI automate computational reproducibility?
AI Snake Oil 2025-09-11T18:40:25.000000Z
A high schooler built a website that lets you challenge AI models to a Minecraft build-off
TechCrunch News 2025-03-20T20:18:14.000000Z
Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge
Latent 2024-10-22T02:56:29.000000Z