Evaluation Criteria_Fishai


Articles related to "Evaluation Criteria"

Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices

cs.AI updates on arXiv.org 2025-10-29T04:28:32.000000Z

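The country's first evaluation standard for AI agent applications is now openly soliciting drafting organizations and individuals!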

PaperAgent 2025-10-27T09:30:06.000000Z

[Share & Discover] Yesterday, in a popular car-buying thread, I think I saw a gold standard for deciding whether to buy a car

V2EX 2025-10-21T02:43:08.000000Z

ColorBench: Benchmarking Mobile Agents with Graph-Structured Framework for Complex Long-Horizon Tasks

cs.AI updates on arXiv.org 2025-10-17T04:09:49.000000Z

Online Rubrics Elicitation from Pairwise Comparisons

cs.AI updates on arXiv.org 2025-10-09T04:13:41.000000Z

Walking ‘on Eggshells’: Corporate Boards Juggle Many Intangibles When Judging Performance

Knowledge at Wharton 2025-09-29T04:02:25.000000Z

Can AI really code? Study maps the roadblocks to autonomous software engineering

MIT News - Computer Science and Artificial Intelligence Laboratory 2025-09-25T10:00:59.000000Z

OpenAI researchers claim to have cracked the model "hallucination" problem: current evaluation methods encourage AI to "guess blindly"

IT之家 2025-09-06T08:11:44.000000Z

LOTUS: A Leaderboard for Detailed Image Captioning from Quality to Societal Bias and User Preferences

cs.AI updates on arXiv.org 2025-07-28T04:42:59.000000Z

FCC to eliminate gigabit speed goal and scrap analysis of broadband prices

Ars Technica - All content 2025-07-21T19:56:46.000000Z

State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]

少点错误 (LessWrong) 2025-04-30T20:02:28.000000Z

Making progress bars for Alignment

少点错误 (LessWrong) 2025-01-03T21:33:10.000000Z

OpenAI o3 reportedly has an IQ as high as 157, on par with Einstein, yet it cannot be shown to be smarter than humans

APPSO 2024-12-25T15:08:58.000000Z

How to make evals for the AISI evals bounty

少点错误 (LessWrong) 2024-12-03T10:50:09.000000Z

OpenAI and Anthropic chief product officers in conversation: the core skill of a product manager in the AI era is writing evals | Z Talk

真格基金 2024-11-20T12:33:18.000000Z

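Regulators release a digital maturity standard for securities firms; IT infrastructure is being put to the test again, and firms in the top ten by IT spending hold the advantage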

深度财经头条 2024-10-21T06:06:00.000000Z

This AI Paper from Centre for the Governance of AI Proposes a Grading Rubric for AI Safety Frameworks

MarkTechPost@AI 2024-09-19T10:05:33.000000Z

Copyright © 2019 FISHAI. All Rights Reserved