Articles related to "Reasoning Evaluation"
Assessing LLM Reasoning Steps via Principal Knowledge Grounding
cs.AI updates on arXiv.org
2025-11-05T05:28:07.000000Z
What Defines Good Reasoning in LLMs? Dissecting Reasoning Steps with Multi-Aspect Evaluation
cs.AI updates on arXiv.org
2025-10-24T04:18:43.000000Z
Measuring Reasoning in LLMs: a New Dialectical Angle
cs.AI updates on arXiv.org
2025-10-22T04:12:06.000000Z
Foundational Automatic Evaluators: Scaling Multi-Task Generative Evaluator Training for Reasoning-Centric Domains
cs.AI updates on arXiv.org
2025-10-21T04:29:00.000000Z
ConfProBench: A Confidence Evaluation Benchmark for MLLM-Based Process Judges
cs.AI updates on arXiv.org
2025-08-07T04:12:30.000000Z
Why Do Large Models Struggle to Become "Mathematicians"? Stanford and Others Reveal Structural Weaknesses in Rigorous Proofs
机器之心
2025-06-22T22:50:49.000000Z