Articles related to "AI evaluation"
Under the new AGI definition from Bengio and other leading researchers, GPT-5 has reached less than 10%
机器之心
2025-10-17T05:40:05.000000Z
How Dropbox automates evals for conversational AI
Braintrust Blog
2025-10-15T22:39:14.000000Z
Can AI automate computational reproducibility?
AI Snake Oil
2025-09-11T18:40:25.000000Z
A high schooler built a website that lets you challenge AI models to a Minecraft build-off
TechCrunch News
2025-03-20T20:18:14.000000Z
Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge
Latent
2024-10-22T02:56:29.000000Z