MarkTechPost@AI · October 13, 15:27
SwiReasoning: A Decoding-Time Framework for Reasoning LLMs

SwiReasoning is a novel decoding-time framework that lets reasoning language models dynamically choose between thinking in latent space and generating explicit chain-of-thought (CoT). It monitors entropy trends in the next-token distribution to estimate block-wise confidence, deciding when to reason latently (emitting no tokens) and when to reason explicitly (emitting CoT tokens). SwiReasoning is training-free and model-agnostic, and achieves Pareto-superior accuracy/efficiency trade-offs on mathematics and STEM benchmarks. Experiments show average accuracy gains of 1.5%–2.8% with unlimited token budgets and average token-efficiency gains of 56%–79% under constrained budgets. The framework also helps models converge to their maximum reasoning accuracy faster.

💡 **Training-free reasoning-policy control:** By analyzing entropy trends in the next-token distribution, SwiReasoning switches dynamically between latent reasoning (thinking inside the model) and explicit chain-of-thought (generating readable reasoning steps). The approach needs no additional training and applies to a wide range of language models, introducing an intelligent decision policy into the reasoning process.

🚀 **Substantial efficiency gains:** Under constrained token budgets, SwiReasoning delivers average token-efficiency improvements of 56%–79%. Models can complete reasoning tasks with less compute, which matters for large-scale deployment and cost control.

📈 **Steady accuracy improvements:** With unlimited token budgets, SwiReasoning achieves average accuracy gains of 1.5%–2.8% on mathematics and STEM reasoning tasks. On challenging benchmarks such as AIME 2024/2025, it also reaches maximum reasoning accuracy earlier than standard CoT, demonstrating faster convergence.

⚖️ **Balancing accuracy and efficiency:** SwiReasoning's core strength is a Pareto-superior trade-off between accuracy and efficiency. By switching reasoning modes intelligently, it explores a broader space of solutions (latent reasoning) and commits to the best path at the right time (explicit reasoning), optimizing resource use while preserving high accuracy.

SwiReasoning is a decoding-time framework that lets a reasoning LLM decide when to think in latent space and when to write explicit chain-of-thought, using block-wise confidence estimated from entropy trends in next-token distributions. The method is training-free, model-agnostic, and targets Pareto-superior accuracy/efficiency trade-offs on mathematics and STEM benchmarks. Reported results show +1.5%–2.8% average accuracy improvements with unlimited tokens and +56%–79% average token-efficiency gains under constrained budgets; on AIME’24/’25, it reaches maximum reasoning accuracy earlier than standard CoT.

What does SwiReasoning change at inference time?

The controller monitors the decoder’s next-token entropy to form a block-wise confidence signal. When confidence is low (entropy trending upward), it enters latent reasoning—the model continues to reason without emitting tokens. When confidence recovers (entropy trending down), it switches back to explicit reasoning, emitting CoT tokens to consolidate and commit to a single path. A switch count control limits the maximum number of thinking-block transitions to suppress overthinking before finalizing the answer. This dynamic alternation is the core mechanism behind the reported accuracy-per-token gains.
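The article does not reproduce the authors' pseudocode, so the following is a minimal sketch of how such an entropy-trend controller could look, assuming a PyTorch decoding loop. The class, the moving-average trend rule, and the default values are illustrative assumptions; only the `alpha` and `max_switch_count` names echo the repository's documented flags.

```python
import torch

def token_entropy(logits: torch.Tensor) -> float:
    """Shannon entropy of the next-token distribution (higher = less confident)."""
    probs = torch.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum().item()

class SwitchController:
    """Toggles latent vs. explicit reasoning from the entropy trend.

    An exponential moving average smooths per-step entropy; a rising trend
    (confidence falling) triggers latent reasoning, a falling trend triggers
    explicit CoT. `max_switch_count` caps transitions to curb overthinking.
    This is an illustrative sketch, not the authors' implementation.
    """
    def __init__(self, alpha: float = 0.9, max_switch_count: int = 8):
        self.alpha = alpha
        self.max_switch_count = max_switch_count
        self.smoothed = None      # EMA of entropy
        self.explicit = True      # start in explicit (token-emitting) mode
        self.switches = 0

    def step(self, logits: torch.Tensor) -> bool:
        """Return True if the next token should be emitted (explicit mode)."""
        h = token_entropy(logits)
        prev = self.smoothed
        self.smoothed = h if prev is None else self.alpha * prev + (1 - self.alpha) * h
        if prev is not None and self.switches < self.max_switch_count:
            rising = self.smoothed > prev           # entropy trending up
            if self.explicit and rising:            # confidence low: go latent
                self.explicit, self.switches = False, self.switches + 1
            elif not self.explicit and not rising:  # confidence recovering: emit CoT
                self.explicit, self.switches = True, self.switches + 1
        return self.explicit
```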

(Figure from the paper: https://arxiv.org/pdf/2510.05069)

Results: accuracy and efficiency on standard suites

The paper reports consistent improvements across mathematics and STEM reasoning tasks: average accuracy gains of 1.5%–2.8% at unlimited token budgets, average token-efficiency gains of 56%–79% under constrained budgets, and earlier time-to-maximum-accuracy on AIME 2024/2025 compared with standard CoT.

Why does switching help?

Explicit CoT is discrete and readable but locks in a single path prematurely, which can discard useful alternatives. Latent reasoning is continuous and information-dense per step, but purely latent strategies may diffuse probability mass and impede convergence. SwiReasoning adds a confidence-guided alternation: latent phases broaden exploration when the model is uncertain; explicit phases exploit rising confidence to solidify a solution and commit tokens only when beneficial. The switch count control regularizes the process by capping oscillations and limiting prolonged “silent” wandering—addressing both accuracy loss from diffusion and token waste from overthinking cited as challenges for training-free latent methods.
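For concreteness, here is one common way a latent (non-emitting) step can be realized, in the spirit of the Soft Thinking baseline discussed below: instead of committing to a sampled token, the expected token embedding under the next-token distribution is fed back as the next input. The function is an illustrative sketch assuming a Hugging Face-style causal LM interface (`inputs_embeds`, `past_key_values`):

```python
import torch

def latent_step(model, embedding_matrix, inputs_embeds, past_key_values=None):
    """One 'silent' reasoning step: feed back the probability-weighted mixture
    of token embeddings rather than sampling and emitting a discrete token.
    Sketch only; the actual SwiReasoning latent step may differ."""
    out = model(inputs_embeds=inputs_embeds,
                past_key_values=past_key_values,
                use_cache=True)
    probs = torch.softmax(out.logits[:, -1, :], dim=-1)  # (batch, vocab)
    next_embed = probs @ embedding_matrix                # (batch, hidden): expected embedding
    return next_embed.unsqueeze(1), out.past_key_values  # next inputs_embeds, updated cache
```

Because no token is committed, probability mass over alternative continuations survives into the next step, which is exactly the information-dense but potentially diffuse behavior described above.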

Positioning vs. baselines

The project compares against CoT with sampling, CoT greedy, and Soft Thinking, reporting a +2.17% average accuracy lift at unlimited budgets (Table 1) and consistent token-efficiency advantages under budget constraints. The visualized Pareto frontier shifts outward—either higher accuracy at the same budget or similar accuracy with fewer tokens—across different model families and scales. On AIME’24/’25, the Pass@k curves show that SwiReasoning reaches the performance ceiling with fewer samples than CoT, reflecting improved convergence behavior rather than merely a higher raw ceiling.
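For readers reproducing such Pass@k curves, the standard unbiased estimator from Chen et al. (2021) is typically used. A generic sketch, not code from the SwiReasoning repository:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    from n generations (of which c are correct) solves the problem."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```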


Key Takeaways

- SwiReasoning is a training-free, model-agnostic decoding-time framework that alternates between latent reasoning and explicit chain-of-thought.
- Switching is driven by block-wise confidence estimated from entropy trends in next-token distributions, with a capped switch count to suppress overthinking.
- Reported gains: +1.5%–2.8% average accuracy with unlimited token budgets and +56%–79% average token efficiency under constrained budgets on math/STEM suites.
- On AIME 2024/2025, SwiReasoning reaches maximum reasoning accuracy earlier than standard CoT, and its Pass@k curves hit the performance ceiling with fewer samples.

Editorial Comments

SwiReasoning is a useful step toward pragmatic “reasoning policy” control at decode time: it’s training-free, slots behind the tokenizer, and exposes measurable gains on math/STEM suites by toggling between latent and explicit CoT using an entropy-trend confidence signal with a capped switch count. The open-source BSD implementation and clear flags (--max_switch_count, --alpha) make replication straightforward and lower the barrier to stacking with orthogonal efficiency layers (e.g., quantization, speculative decoding, KV-cache tricks). The method’s value proposition is “accuracy per token” rather than raw SOTA accuracy, which is operationally important for budgeted inference and batching.
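As a hypothetical illustration of how those flags might feed a decode-time controller like the sketch above (glue code assumed for exposition, not the repository's actual entry point):

```python
import argparse

parser = argparse.ArgumentParser(description="Entropy-guided latent/explicit decoding")
parser.add_argument("--alpha", type=float, default=0.9,
                    help="smoothing factor for the entropy-trend signal")
parser.add_argument("--max_switch_count", type=int, default=8,
                    help="cap on latent/explicit thinking-block transitions")
args = parser.parse_args()
# controller = SwitchController(alpha=args.alpha, max_switch_count=args.max_switch_count)
```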


Check out the Paper and Project Page.


