Trending
Articles related to "LLM capability evaluation"
BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction
cs.AI updates on arXiv.org 2025-10-21T04:10:23.000000Z
JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory
cs.AI updates on arXiv.org 2025-09-30T04:00:40.000000Z
Metric assessment protocol in the context of answer fluctuation on MCQ tasks
cs.AI updates on arXiv.org 2025-07-22T04:34:20.000000Z