Trending
Articles related to "Competition Evaluation"
CodeClash: Benchmarking Goal-Oriented Software Engineering
cs.AI updates on arXiv.org 2025-11-05T05:27:39.000000Z
Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores
cs.AI updates on arXiv.org 2025-09-30T04:01:43.000000Z