Capability Evaluation_Fishai

Articles related to "Capability Evaluation"

[Workplace Topics] Am I being too strict, or is this just the level of testers these days?

V2EX 2025-11-06T05:14:14.000000Z

Can Agent Conquer Web? Exploring the Frontiers of ChatGPT Atlas Agent in Web Games

cs.AI updates on arXiv.org 2025-10-31T04:07:08.000000Z

Using AI Makes Us Overestimate Our Cognitive Abilities

oschina.net 2025-10-29T07:22:57.000000Z

ChessQA: Evaluating Large Language Models for Chess Understanding

cs.AI updates on arXiv.org 2025-10-29T04:24:43.000000Z

Introducing the Epoch Capabilities Index (ECI)

LessWrong 2025-10-28T19:41:21.000000Z

AstaBench: Rigorous Benchmarking of AI Agents with a Scientific Research Suite

cs.AI updates on arXiv.org 2025-10-27T06:19:46.000000Z

Face-Human-Bench: A Comprehensive Benchmark of Face and Human Understanding for Multi-modal Assistants

cs.AI updates on arXiv.org 2025-10-24T04:54:20.000000Z

Lie flat in a fifth-tier small county town, or strive in a first-tier big city?

Hupu - Hot Posts 2025-10-23T07:18:41.000000Z

A Multi-faceted Analysis of Cognitive Abilities: Evaluating Prompt Methods with Large Language Models on the CONSORT Checklist

cs.AI updates on arXiv.org 2025-10-23T04:10:14.000000Z

Are LLMs Court-Ready? Evaluating Frontier Models on Indian Legal Reasoning

cs.AI updates on arXiv.org 2025-10-22T04:17:01.000000Z

Humans Are Spiky (In an LLM World)

LessWrong 2025-10-15T08:52:55.000000Z

Inferring Capabilities from Task Performance with Bayesian Triangulation

cs.AI updates on arXiv.org 2025-10-09T04:14:34.000000Z

On Explaining Proxy Discrimination and Unfairness in Individual Decisions Made by AI Systems

cs.AI updates on arXiv.org 2025-10-01T05:58:44.000000Z

How Much Speculation About LLMs' Limits Is Optimal?

LessWrong 2025-09-29T22:33:03.000000Z

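[2025 Edition] Registration Eligibility Requirements for the Cybersecurity Classified Protection Assessor Capability Evaluation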

安小圈 2025-09-23T02:33:24.000000Z

It Is Time for Cybersecurity Classified Protection Assessors to Draw a Line Under the Past

安小圈 2025-09-22T02:33:58.000000Z

HumanAgencyBench: Scalable Evaluation of Human Agency Support in AI Assistants

cs.AI updates on arXiv.org 2025-09-11T15:51:40.000000Z

CANDY: Benchmarking LLMs' Limitations and Assistive Potential in Chinese Misinformation Fact-Checking

cs.AI updates on arXiv.org 2025-09-05T04:45:53.000000Z

Copyright © 2019 FISHAI. All Rights Reserved