Articles related to "Evaluation Standards"
Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices
cs.AI updates on arXiv.org 2025-10-29T04:28:32.000000Z
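The nation's first evaluation standard for AI agent applications is now openly soliciting drafting organizations and individuals!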
PaperAgent 2025-10-27T09:30:06.000000Z
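[Sharing Discoveries] Yesterday, in a popular thread about buying a car, I think I saw a gold standard for deciding whether to buy one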
V2EX 2025-10-21T02:43:08.000000Z
ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks
cs.AI updates on arXiv.org 2025-10-17T04:09:49.000000Z
Online Rubrics Elicitation from Pairwise Comparisons
cs.AI updates on arXiv.org 2025-10-09T04:13:41.000000Z
Walking ‘on Eggshells’: Corporate Boards Juggle Many Intangibles When Judging Performance
Knowledge at Wharton 2025-09-29T04:02:25.000000Z
Can AI really code? Study maps the roadblocks to autonomous software engineering
MIT News - Computer Science and Artificial Intelligence Laboratory 2025-09-25T10:00:59.000000Z
OpenAI researchers claim to have cracked the model "hallucination" problem: current evaluation methods encourage AI to "guess blindly"
IT之家 2025-09-06T08:11:44.000000Z
LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences
cs.AI updates on arXiv.org 2025-07-28T04:42:59.000000Z
FCC to eliminate gigabit speed goal and scrap analysis of broadband prices
Ars Technica - All content 2025-07-21T19:56:46.000000Z
State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]
LessWrong 2025-04-30T20:02:28.000000Z
Making progress bars for Alignment
LessWrong 2025-01-03T21:33:10.000000Z
OpenAI o3 reportedly has an IQ as high as 157, on par with Einstein, yet there is no way to prove it is smarter than humans
APPSO 2024-12-25T15:08:58.000000Z
How to make evals for the AISI evals bounty
LessWrong 2024-12-03T10:50:09.000000Z
A conversation between the Chief Product Officers of OpenAI and Anthropic: the core skill for product managers in the AI era is writing evals | Z Talk
真格基金 2024-11-20T12:33:18.000000Z
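Regulators release a digital maturity standard for securities firms, another test of IT infrastructure; the top ten by IT spending have the edge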
深度财经头条 2024-10-21T06:06:00.000000Z
This AI Paper from Centre for the Governance of AI Proposes a Grading Rubric for AI Safety Frameworks
MarkTechPost@AI 2024-09-19T10:05:33.000000Z