Articles related to "LLM Capability Evaluation"
BuildArena: A Physics-Aligned Interactive Benchmark of LLMs for Engineering Construction
cs.AI updates on arXiv.org
2025-10-21T04:10:23.000000Z
JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory
cs.AI updates on arXiv.org
2025-09-30T04:00:40.000000Z
Metric assessment protocol in the context of answer fluctuation on MCQ tasks
cs.AI updates on arXiv.org
2025-07-22T04:34:20.000000Z