Trending
Articles related to "SWE-Bench"
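JoyCode: Technical Report on the SWE-bench Verified Leaderboard Submission
Juejin (掘金) AI 2025-11-04T00:11:21.000000Z
On the leaderboard upon open-sourcing! UCL startup team open-sources Prometheus, an AI coding agent ranked in the global top ten
Xinzhiyuan (新智元) 2025-10-27T14:03:41.000000Z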
Software Engineering Agent via Self-Abstraction from Grounded Experience
Salesforce Blog AI Research 2025-10-21T18:01:46.000000Z
Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios
cs.AI updates on arXiv.org 2025-10-20T04:15:05.000000Z
Training Software Engineering Agents and Verifiers with SWE-Gym
machinelearning apple 2025-10-16T19:18:17.000000Z
Kuaishou's Kwaipilot team open-sources KAT-Dev-72B-Exp
oschina.net 2025-10-11T02:21:49.000000Z
Researchers add a diversity incentive to offline learning to reduce the "AI flavor" of creative writing
DeepTech (深科技) 2025-10-09T04:31:22.000000Z
Claude Sonnet 4.5 analysis
Braintrust Blog 2025-10-02T12:51:23.000000Z
What are popular AI coding benchmarks actually measuring?
Nilenso Blog 2025-09-30T11:09:04.000000Z
Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents
cs.AI updates on arXiv.org 2025-09-30T04:01:01.000000Z
Agentic coding performance hits a new high: the new KAT series models make the SWE-Bench leaderboard
Synced (机器之心) 2025-09-26T14:59:25.000000Z
An Empirical Study on Failures in Automated Issue Solving
cs.AI updates on arXiv.org 2025-09-18T04:47:00.000000Z
GPT-5-Codex: AI coding, a seven-hour epic that says goodbye to F5?
Juejin (掘金) AI 2025-09-16T14:00:39.000000Z
How GPT5 + Codex took over Agentic Coding — ft. Greg Brockman, OpenAI
Latent 2025-09-16T00:53:03.000000Z
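In-depth look at how OpenAI got GPT-5 to "technically" surpass Claude: quietly skipping the 23 hardest problems
BAAI Community (智源社区) 2025-08-21T04:43:32.000000Z
In-depth look at how OpenAI got GPT-5 to "technically" surpass Claude: quietly skipping the 23 hardest problems
36Kr (36氪) - Technology Channel 2025-08-20T02:09:54.000000Z
GPT-5 goes to great lengths to "cheat", just to beat its nemesis Claude
36Kr (36氪) - Technology Channel 2025-08-18T03:49:22.000000Z
Something fishy about GPT-5's coding scores! 23 test problems dropped on its own, and the key benchmark is one it proposed itself
BAAI Community (智源社区) 2025-08-13T07:34:40.000000Z
Something fishy about GPT-5's coding scores: 23 test problems dropped on its own, and the key benchmark is one it proposed itself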
Cnbeta 2025-08-12T07:21:38.000000Z