"Reasoning Benchmarks": Related Articles
Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries
cs.AI updates on arXiv.org
2025-10-17T04:18:58.000000Z
Encode, Think, Decode: Scaling test-time reasoning with recursive latent thoughts
cs.AI updates on arXiv.org
2025-10-10T04:09:52.000000Z
Beyond Token Length: Step Pruner for Efficient and Accurate Reasoning in Large Language Models
cs.AI updates on arXiv.org
2025-10-07T04:15:43.000000Z
FinSearchComp: Towards a Realistic, Expert-Level Evaluation of Financial Search and Reasoning
cs.AI updates on arXiv.org
2025-09-17T05:24:49.000000Z
A Novel Architecture for Symbolic Reasoning with Decision Trees and LLM Agents
cs.AI updates on arXiv.org
2025-08-08T04:17:26.000000Z
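Spotting High Scores Without Real Ability: A New Comprehensive Vision-Language Understanding Benchmark Uses Five Challenges to Evaluate the Reasoning of Multimodal Models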
BAAI Community (智源社区)
2025-02-27T15:37:16.000000Z
OpenAI o1 Is Powerful, but It Can Still Be Broken!
PaperAgent
2024-09-13T12:22:48.000000Z