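Trending
Articles related to "Capability Assessment"
[Workplace Topics] Am I being too strict, or is this just the level of testers nowadays?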
V2EX 2025-11-06T05:14:14.000000Z
Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games
cs.AI updates on arXiv.org 2025-10-31T04:07:08.000000Z
Using AI makes us overestimate our cognitive abilities
oschina.net 2025-10-29T07:22:57.000000Z
ChessQA: Evaluating Large Language Models for Chess Understanding
cs.AI updates on arXiv.org 2025-10-29T04:24:43.000000Z
Introducing the Epoch Capabilities Index (ECI)
LessWrong 2025-10-28T19:41:21.000000Z
AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite
cs.AI updates on arXiv.org 2025-10-27T06:19:46.000000Z
Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants
cs.AI updates on arXiv.org 2025-10-24T04:54:20.000000Z
Lie flat in a fifth-tier county town, or strive in a first-tier big city?
Hupu - Hot Posts 2025-10-23T07:18:41.000000Z
A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist
cs.AI updates on arXiv.org 2025-10-23T04:10:14.000000Z
Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning
cs.AI updates on arXiv.org 2025-10-22T04:17:01.000000Z
Humans Are Spiky (In an LLM World)
LessWrong 2025-10-15T08:52:55.000000Z
Inferring Capabilities from Task Performance with Bayesian Triangulation
cs.AI updates on arXiv.org 2025-10-09T04:14:34.000000Z
On Explaining Proxy Discrimination and Unfairness in Individual Decisions Made by AI Systems
cs.AI updates on arXiv.org 2025-10-01T05:58:44.000000Z
How Much Speculation About LLMs' Limits Is Optimal?
LessWrong 2025-09-29T22:33:03.000000Z
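[2025 Edition] Eligibility Requirements for Registering for the Cybersecurity Classified Protection Assessor Capability Assessment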
安小圈 2025-09-23T02:33:24.000000Z
It's time for cybersecurity classified protection assessors to close the book on the past
安小圈 2025-09-22T02:33:58.000000Z
HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants
cs.AI updates on arXiv.org 2025-09-11T15:51:40.000000Z
CANDY: Benchmarking LLMs' Limitations and Assistive Potential in Chinese Misinformation Fact-Checking
cs.AI updates on arXiv.org 2025-09-05T04:45:53.000000Z