Dynamic Nested Depth Improves LLM Performance

cs.AI updates on arXiv.org, October 14


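This paper proposes Dynamic Nested Depth (DND), a method that improves the performance of existing LLMs by selecting critical tokens and reprocessing them in a nested depth manner. Using a router together with a threshold control strategy, it delivers gains on both dense and MoE models with minimal increases in parameters and computation.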

arXiv:2510.11001v1. Announce Type: cross.

Abstract: We introduce Dynamic Nested Depth (DND), a novel method that improves performance for off-the-shelf LLMs by selecting critical tokens to reprocess in a nested depth manner. Specifically, at the end of a given transformer layer, DND identifies more critical tokens with a router and feeds them back for an extra round of processing, effectively "reviewing" difficult tokens while avoiding redundant computation for easier ones. The dynamic selection mechanism is tailored for precise control via two novel strategies: a router controlling loss to enhance token selection distinguishability, and a threshold control scheme to ensure selection stability. We demonstrate the effectiveness of DND by directly integrating it into pre-trained dense and MoE models during a post-training phase. On diverse benchmarks, this approach boosts the performance of the dense Qwen3-1.7B by 1.88% and the MoE Qwen3-30B-A3B by 0.87%, all with minimal increases in parameters and computation.
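To make the select-and-reprocess idea concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the router architecture, how reprocessed outputs are merged, or the form of the router controlling loss and threshold control scheme. The names `dnd_layer`, `router_score`, `toy_layer`, the sigmoid linear router, the fixed threshold of 0.5, and the overwrite-style merge are all assumptions made for illustration. The toy layer also acts on each token independently, whereas a real transformer layer attends across tokens.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def router_score(h: Vector, w: Vector) -> float:
    """Hypothetical router: sigmoid of a dot product, giving a score in (0, 1)."""
    logit = sum(a * b for a, b in zip(h, w))
    return 1.0 / (1.0 + math.exp(-logit))


def dnd_layer(
    hidden: List[Vector],
    layer_fn: Callable[[Vector], Vector],
    router_w: Vector,
    threshold: float = 0.5,  # assumed fixed here; the paper controls it dynamically
) -> Tuple[List[Vector], List[int]]:
    """Run a layer once on all tokens, then a second time on routed tokens only."""
    # First pass: every token goes through the layer.
    first_pass = [layer_fn(h) for h in hidden]

    # The router scores each token at the end of the layer.
    scores = [router_score(h, router_w) for h in first_pass]
    selected = [i for i, s in enumerate(scores) if s > threshold]

    # Nested pass: only the selected ("critical") tokens are fed back.
    # Assumption: the reprocessed output simply overwrites the first-pass output.
    output = list(first_pass)
    for i in selected:
        output[i] = layer_fn(first_pass[i])
    return output, selected


def toy_layer(h: Vector) -> Vector:
    """Stand-in for a transformer layer: doubles each feature."""
    return [2.0 * x for x in h]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [-1.0, 0.0]]
    out, picked = dnd_layer(tokens, toy_layer, router_w=[1.0, 0.0])
    print(picked)  # [0]
    print(out)     # [[4.0, 0.0], [-2.0, 0.0]]
```

In this toy run only the first token scores above the threshold, so it passes through the layer twice while the second token is processed once. That is the source of the compute saving the abstract describes: extra depth is spent only where the router judges it useful.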


Related tags

Dynamic Nested Depth, LLM performance, router strategy, threshold control
