MarkTechPost@AI, March 10
This AI Paper Introduces CODI: A Self-Distillation Framework for Efficient and Scalable Chain-of-Thought Reasoning in LLMs

 

CODI is a novel framework designed to improve the reasoning efficiency and scalability of large language models (LLMs). Through self-distillation, it compresses explicit chain-of-thought (CoT) reasoning into a continuous space, enabling LLMs to perform logical deduction internally without generating explicit CoT tokens. CODI adopts a single-step distillation approach that avoids the information loss and forgetting inherent in curriculum learning. Experiments show that CODI significantly outperforms prior implicit CoT methods on mathematical reasoning tasks and is the first to match the accuracy of explicit CoT, while achieving a higher compression ratio and faster processing.

💡 CODI distills explicit CoT reasoning into a continuous space via self-distillation, allowing the LLM to reason internally without explicit CoT tokens and thereby improving inference efficiency.

🧪 CODI uses a single-step distillation approach that aligns the hidden activations of a teacher task (explicit CoT generation) and a student task (continuous CoT reasoning), injecting reasoning supervision directly into the model's hidden states and avoiding the information loss and forgetting inherent in curriculum learning.

📊 Experiments show that CODI achieves a 3.1× compression ratio on the GSM8k dataset while matching the performance of explicit CoT, and surpasses Coconut by 28.2% in accuracy. In terms of efficiency, CODI processes reasoning steps 2.7× faster than conventional CoT.

Chain-of-Thought (CoT) prompting enables large language models (LLMs) to perform step-by-step logical deductions in natural language. While this method has proven effective, natural language may not be the most efficient medium for reasoning. Studies indicate that human mathematical reasoning does not primarily rely on language processing, suggesting that alternative approaches could enhance performance. Researchers aim to refine how LLMs process reasoning, balancing accuracy with computational efficiency.

The challenge of reasoning in LLMs stems from their reliance on explicit CoT, which requires generating detailed explanations before arriving at a final answer. This approach increases computational overhead and slows down inference. Implicit CoT methods attempt to internalize reasoning without generating explicit reasoning tokens, but these methods have historically underperformed compared to explicit CoT. A major obstacle lies in designing models that can efficiently process reasoning internally while maintaining accuracy. A solution that eliminates excessive computational burden without sacrificing performance is critical for scaling up reasoning capabilities in LLMs.

Previous implicit CoT methods have primarily relied on curriculum learning strategies, which progressively internalize reasoning steps. One such method, Coconut, gradually replaces explicit CoT tokens with continuous representations while maintaining a language modeling objective. However, this approach has limitations, including error propagation and gradual forgetting during training. As a result, Coconut, despite improvements over baseline models, still lags behind explicit CoT methods by a significant margin. Implicit CoT approaches have consistently failed to match the reasoning performance of explicitly generated CoT.

Researchers from King’s College London and The Alan Turing Institute introduced CODI (Continuous Chain-of-Thought via Self-Distillation) as a novel framework to address these limitations. CODI distills explicit CoT reasoning into a continuous space, allowing LLMs to perform logical deductions internally without generating explicit CoT tokens. The method employs self-distillation, where a single model functions as both a teacher and a student, aligning their hidden activations to encode reasoning within a compact latent space. By leveraging this technique, CODI effectively compresses reasoning without sacrificing performance.

CODI consists of two key learning tasks: explicit CoT generation and continuous CoT reasoning. The teacher model follows standard CoT learning by processing natural language step-by-step reasoning and generating explicit CoT sequences. The student model, in contrast, learns to internalize reasoning within a compact latent representation. To ensure proper knowledge transfer, CODI enforces alignment between these two processes using an L1 distance loss. Unlike previous approaches, CODI injects reasoning supervision directly into the model's hidden states, allowing for more efficient training. Instead of relying on multiple training stages, CODI applies a single-step distillation approach, minimizing the information loss and forgetting inherent in curriculum learning. The process involves selecting a specific hidden token that encodes crucial reasoning information, ensuring the model can effectively generate continuous reasoning steps without explicit tokens.
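To make the training objective concrete, below is a minimal PyTorch sketch of a single-step self-distillation loss of this kind. It assumes a Hugging Face-style decoder-only model (e.g., GPT-2) that returns logits and hidden states; the function name `codi_loss`, the batch fields, the loss weights, and the choice to align only the final layer at one token position are illustrative assumptions, not the paper's exact recipe.

```python
import torch.nn.functional as F

def codi_loss(model, teacher_batch, student_batch, align_pos_t, align_pos_s,
              w_teacher=1.0, w_student=1.0, w_align=1.0):
    # Teacher task: standard next-token prediction over the explicit CoT.
    t_out = model(input_ids=teacher_batch["input_ids"], output_hidden_states=True)
    t_ce = F.cross_entropy(
        t_out.logits[:, :-1].flatten(0, 1),          # shift logits left
        teacher_batch["labels"][:, 1:].flatten(),    # shift labels right
        ignore_index=-100,                           # mask non-target positions
    )

    # Student task: predict the final answer from continuous latent thoughts
    # (latent thought embeddings are assumed to be injected upstream into
    # `student_batch["inputs_embeds"]`).
    s_out = model(inputs_embeds=student_batch["inputs_embeds"],
                  output_hidden_states=True)
    s_ce = F.cross_entropy(
        s_out.logits[:, :-1].flatten(0, 1),
        student_batch["labels"][:, 1:].flatten(),
        ignore_index=-100,
    )

    # Alignment: L1 distance between teacher and student hidden states at the
    # token position assumed to summarize the reasoning (e.g., just before the
    # answer). The teacher side is detached so it serves as the target.
    t_hidden = t_out.hidden_states[-1][:, align_pos_t].detach()
    s_hidden = s_out.hidden_states[-1][:, align_pos_s]
    align = F.l1_loss(s_hidden, t_hidden)

    return w_teacher * t_ce + w_student * s_ce + w_align * align
```

Because teacher and student share the same weights in self-distillation, detaching the teacher's hidden state makes the alignment term act as a one-way distillation target rather than pulling both tasks toward each other.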

Experimental results demonstrate that CODI significantly outperforms previous implicit CoT methods and is the first to match the accuracy of explicit CoT in mathematical reasoning tasks. On the GSM8k dataset, CODI achieves a 3.1× compression ratio while maintaining performance comparable to explicit CoT. It surpasses Coconut by 28.2% in accuracy. Further, CODI is scalable and adaptable to various CoT datasets, making it suitable for more complex reasoning problems. Performance benchmarks indicate that CODI achieves a reasoning accuracy of 43.7% on GSM8k with a GPT-2 model, compared to 34.1% with Coconut. When tested on larger models such as LLaMA3.2-1b, CODI attains 55.6% accuracy, demonstrating its ability to scale effectively. Regarding efficiency, CODI processes reasoning steps 2.7 times faster than traditional CoT and 5.9 times faster when applied to more verbose reasoning datasets. Its robust design allows it to generalize to out-of-domain benchmarks, outperforming CoT-SFT on datasets such as SVAMP and MultiArith.

CODI marks a significant improvement in LLM reasoning, effectively bridging the gap between explicit CoT and computational efficiency. By leveraging self-distillation and continuous representations, it introduces a scalable approach to AI reasoning. The model retains interpretability: its continuous thoughts can be decoded into structured reasoning patterns, providing transparency in the decision-making process. Future research could explore CODI's application to more complex multimodal reasoning tasks, expanding its benefits beyond mathematical problem-solving. The framework establishes implicit CoT as a computationally efficient alternative and a viable solution for reasoning challenges in advanced AI systems.
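As a rough illustration of that interpretability claim, the snippet below projects a single continuous thought vector through the language-model head and lists its nearest vocabulary tokens. Here `model`, `tokenizer`, and `thought_vec` are assumed inputs, and this is a plausible decoding probe under those assumptions, not the paper's exact procedure.

```python
import torch

def decode_thought(model, tokenizer, thought_vec, top_k=5):
    # thought_vec: hidden state of one latent reasoning step, shape (hidden_dim,)
    logits = model.lm_head(thought_vec)   # project into vocabulary space
    top = torch.topk(logits, top_k)       # nearest vocabulary tokens
    return [tokenizer.decode([i]) for i in top.indices.tolist()]
```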


Check out the Paper. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 80k+ ML SubReddit.


