AI evaluation_Fishai

Articles related to "AI evaluation"

Databricks research reveals that building better AI judges isn't just a technical concern, it's a people problem

VentureBeat 2025-11-04T20:12:23.000000Z

AGI gets a new "authoritative" definition! Proposed by Turing Award winner Yoshua Bengio and others; GPT-5 reaches only 57%

智源社区 2025-10-30T11:59:09.000000Z

AGI gets a new "authoritative" definition, proposed by Turing Award winner Yoshua Bengio and others; GPT-5 reaches only 57%

36kr-科技 2025-10-29T10:18:40.000000Z

From "can draw" to "can think": Kuaishou's Kling team proposes T2I-CoReBench; even the strongest models cannot escape the reasoning bottleneck

我爱计算机视觉 2025-10-25T08:56:32.000000Z

Instruction-following deviations in LLMs

掘金人工智能 2025-10-24T19:02:12.000000Z

Seedream 4.0 battles Nano Banana and GPT-4o? EdiVal-Agent settles image-editing evaluation

机器之心 2025-10-24T09:00:14.000000Z

Seedream 4.0 battles Nano Banana and GPT-4o? EdiVal-Agent settles image-editing evaluation

机器之心 2025-10-24T06:48:09.000000Z

Braintrust Java SDK: AI observability and evals for the JVM

Braintrust Blog 2025-10-24T05:16:48.000000Z

ICCV 2025 | Can AI understand movie plots? VRBench launches the first "big exam" for long-video reasoning

PaperWeekly 2025-10-22T15:13:53.000000Z

ICCV 2025 | Can AI understand movie plots? VRBench launches the first "big exam" for long-video reasoning

PaperWeekly 2025-10-22T14:32:56.000000Z

Instagram cofounder rips ‘AI FOMO’ that caused a rush to adopt and no metrics: ‘When it gets fuzzy, it’s very hard to then evaluate’

Fortune | FORTUNE 2025-10-21T17:20:48.000000Z

Asked to "watch a video and write a web page", GPT-5 scores only 36.35! Shanghai AI Lab jointly releases the first video2code benchmark

量子位 2025-10-20T12:34:13.000000Z

Bengio launches an AGI "gaokao"; GPT-5 scores 0 in one category

新智元 2025-10-17T16:17:19.000000Z

Stop Measuring AI Like Software

Communications of the ACM - Artificial Intelligence 2025-10-17T14:49:19.000000Z

Under the new AGI definition from Bengio and other leading figures, GPT-5 has achieved less than 10%

机器之心 2025-10-17T13:34:40.000000Z

Teaching AI to ask the hard questions: how we teach machines to judge generated videos | ICCV 2025

AI科技评论 2025-10-17T11:58:31.000000Z
