Trending
Articles related to "Item Response Theory"
TaskEval: Assessing Difficulty of Code Generation Tasks for Large Language Models
cs.AI updates on arXiv.org 2025-10-27T06:30:50.000000Z
Learning Compact Representations of LLM Abilities via Item Response Theory
cs.AI updates on arXiv.org 2025-10-02T04:15:00.000000Z
Beyond Overall Accuracy: A Psychometric Deep Dive into the Topic-Specific Medical Capabilities of 80 Large Language Models
cs.AI updates on arXiv.org 2025-09-30T04:06:17.000000Z
Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
LessWrong 2025-02-10T19:51:57.000000Z
[LDSL#6] When is quantification needed, and when is it hard?
LessWrong 2024-08-13T20:51:56.000000Z