cs.AI updates on arXiv.org 10月17日 12:18
CodeSeq:提升LLM归纳推理能力
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出CodeSeq,一种基于数字序列的合成后训练数据集,用于提升大型语言模型在归纳推理任务上的表现。通过将数字序列打包成算法问题,发现其通用术语,并定义了通用术语生成(GTG)任务。通过反思失败测试案例和迭代修正,生成监督微调数据,使模型学会自主案例生成和自我检查。此外,利用基于可解性和自我引导案例生成成功率的Case-Synergy Solvability Scaling Reward进行强化学习,使模型能更有效地从成功和失败中学习。

arXiv:2510.14620v1 Announce Type: cross Abstract: Large language models (LLMs) make remarkable progress in reasoning tasks. Among different reasoning modes, inductive reasoning, due to its better alignment with human learning, attracts increasing interest. However, research on inductive reasoning faces certain challenges. First, existing inductive data mostly focuses on superficial regularities while lacking more complex internal patterns. Second, current works merely prompt LLMs or finetune on simple prompt-response pairs, but do not provide precise thinking processes nor implement difficulty control. Unlike previous work, we address these challenges by introducing \textit{CodeSeq}, a synthetic post-training dataset built from number sequences. We package number sequences into algorithmic problems to discover their general terms, defining a general term generation (GTG) task correspondingly. Our pipeline generates supervised finetuning data by reflecting on failed test cases and incorporating iterative corrections, thereby teaching LLMs to learn autonomous case generation and self-checking. Additionally, it leverages reinforcement learning with a novel Case-Synergy Solvability Scaling Reward based on both solvability, estimated from the problem pass rate, and the success rate of self-directed case generation, enabling models to learn more effectively from both successes and failures. Experimental results show that the models trained with \textit{CodeSeq} improve on various reasoning tasks and can preserve the models' OOD performance.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

LLM 归纳推理 CodeSeq 强化学习 算法问题
相关文章