cs.AI updates on arXiv.org, September 23
Reasoning Core: An Environment for Reinforcement Learning with Verifiable Rewards

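This article introduces Reasoning Core, a new environment designed to advance foundational symbolic reasoning in large language models (LLMs) through reinforcement learning with verifiable rewards (RLVR). The environment procedurally generates problems across several core formal domains, including PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, and solving systems of equations, which makes it a promising resource for improving the reasoning capabilities of LLMs.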

arXiv:2509.18083v1 Announce Type: new

Abstract: We introduce Reasoning Core, a new scalable environment for Reinforcement Learning with Verifiable Rewards (RLVR), designed to advance foundational symbolic reasoning in Large Language Models (LLMs). Unlike existing benchmarks that focus on games or isolated puzzles, Reasoning Core procedurally generates problems across core formal domains, including PDDL planning, first-order logic, context-free grammar parsing, causal reasoning, and system equation solving. The environment is built on key design principles of high-generality problem distributions, verification via external tools, and continuous difficulty control, which together provide a virtually infinite supply of novel training instances. Initial zero-shot evaluations with frontier LLMs confirm the difficulty of Reasoning Core's tasks, positioning it as a promising resource to improve the reasoning capabilities of future models.
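The abstract's design principles (procedurally generated problems, automatic verification, and a difficulty control) can be illustrated with a minimal Python sketch of an RLVR task for one of the listed domains, solving systems of linear equations. This is not the Reasoning Core implementation. The function names `generate_problem`, `render`, and `verify`, the answer format `x0=1, x1=-2`, and the use of the number of unknowns as the difficulty parameter are hypothetical choices made for illustration. The abstract says Reasoning Core verifies answers with external tools; here an in-process substitution check stands in for such a tool.

```python
import random
import re
from fractions import Fraction


def generate_problem(difficulty: int, seed: int = 0):
    """Hypothetical generator: `difficulty` = number of unknowns (clamped to >= 1)."""
    rng = random.Random(seed)
    n = max(1, difficulty)
    names = [f"x{i}" for i in range(n)]
    # Plant a hidden integer solution first, so every generated system is solvable.
    hidden = {v: rng.randint(-9, 9) for v in names}
    equations = []
    for _ in range(n):
        coeffs = {v: rng.randint(-5, 5) for v in names}
        rhs = sum(c * hidden[v] for v, c in coeffs.items())
        equations.append((coeffs, rhs))
    # `hidden` is returned only for testing; it is never shown to the model.
    return names, equations, hidden


def render(equations) -> str:
    """Format the system as prompt text, one equation per line (e.g. '3*x0 + -2*x1 = 5')."""
    return "\n".join(
        " + ".join(f"{c}*{v}" for v, c in coeffs.items()) + f" = {rhs}"
        for coeffs, rhs in equations
    )


# Hypothetical answer format: "x0=1, x1=-2" (integers or fractions such as 3/2).
ANSWER_RE = re.compile(r"(x\d+)\s*=\s*(-?\d+(?:/\d+)?)")


def verify(names, equations, answer: str) -> float:
    """Binary reward: 1.0 iff every unknown is assigned and all equations hold exactly."""
    try:
        assignment = {v: Fraction(val) for v, val in ANSWER_RE.findall(answer)}
    except ZeroDivisionError:
        return 0.0  # e.g. "x0=1/0"
    if set(assignment) != set(names):
        return 0.0  # missing or extra variables
    satisfied = all(
        sum(c * assignment[v] for v, c in coeffs.items()) == rhs
        for coeffs, rhs in equations
    )
    return 1.0 if satisfied else 0.0


if __name__ == "__main__":
    names, eqs, hidden = generate_problem(difficulty=3, seed=0)
    print(render(eqs))
    planted = ", ".join(f"{v}={hidden[v]}" for v in names)
    print(verify(names, eqs, planted))  # 1.0: the planted solution satisfies every equation
    print(verify(names, eqs, "x0=1"))   # 0.0: x1 and x2 are unassigned
```

Because the verifier checks that the proposed assignment satisfies every equation instead of comparing against a stored answer string, any correct solution earns the reward, including when a generated system happens to have more than one solution. The integer `difficulty` here is only a coarse stand-in for the continuous difficulty control the abstract describes.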

Related tags: reinforcement learning, verifiable rewards, large language models, symbolic reasoning, Reasoning Core