RDD: A Novel Method for Automatically Decomposing Task Demonstrations

cs.AI updates on arXiv.org, Oct 17, 12:19


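This paper proposes a Retrieval-based Demonstration Decomposer (RDD), which automatically splits task demonstrations into sub-tasks that match the training data of the low-level visuomotor policy, and thereby improves task performance. The authors report that it outperforms the state-of-the-art sub-task decomposer on both simulated and real-world tasks and remains robust across diverse settings.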

arXiv:2510.14968v1 Announce Type: cross

Abstract: To tackle long-horizon tasks, recent hierarchical vision-language-action (VLA) frameworks employ vision-language model (VLM)-based planners to decompose complex manipulation tasks into simpler sub-tasks that low-level visuomotor policies can easily handle. Typically, the VLM planner is finetuned to learn to decompose a target task. This finetuning requires target task demonstrations segmented into sub-tasks by either human annotation or heuristic rules. However, the heuristic sub-tasks can deviate significantly from the training data of the visuomotor policy, which degrades task performance. To address these issues, we propose a Retrieval-based Demonstration Decomposer (RDD) that automatically decomposes demonstrations into sub-tasks by aligning the visual features of the decomposed sub-task intervals with those from the training data of the low-level visuomotor policies. Our method outperforms the state-of-the-art sub-task decomposer on both simulation and real-world tasks, demonstrating robustness across diverse settings. Code and more results are available at rdd-neurips.github.io.
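The abstract says only that RDD aligns the visual features of candidate sub-task intervals with features from the visuomotor policy's training data; it does not spell out the search procedure. As a rough illustration of the idea, here is a minimal Python sketch built on our own assumptions: each frame is already a feature vector, an interval is represented by the mean of its frame features, the policy's training data is reduced to a small list of sub-task embeddings (`database`), and boundaries are chosen by dynamic programming to maximize length-weighted cosine similarity to the nearest database entry. All names (`decompose`, `retrieval_score`, `database`, `min_len`) are hypothetical; this is not the authors' implementation, which is linked from rdd-neurips.github.io.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def mean_feature(frames: Sequence[Vector]) -> List[float]:
    """Assumption: an interval is summarized by mean-pooling its frame features."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def retrieval_score(frames: Sequence[Vector], database: Sequence[Vector]) -> float:
    """Similarity of an interval to its nearest neighbour in the (non-empty) database."""
    query = mean_feature(frames)
    return max(cosine(query, entry) for entry in database)


def decompose(
    demo: Sequence[Vector], database: Sequence[Vector], min_len: int = 1
) -> List[Tuple[int, int]]:
    """Split demo into half-open intervals [i, j) maximizing the sum of
    (interval length * nearest-neighbour similarity) via dynamic programming."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    T = len(demo)
    NEG = float("-inf")
    best = [NEG] * (T + 1)  # best[j]: best total score for demo[:j]
    prev = [-1] * (T + 1)   # prev[j]: start of the last interval ending at j
    best[0] = 0.0
    for j in range(min_len, T + 1):
        for i in range(0, j - min_len + 1):  # guarantees j - i >= min_len
            if best[i] == NEG:
                continue
            cand = best[i] + (j - i) * retrieval_score(demo[i:j], database)
            if cand > best[j]:
                best[j] = cand
                prev[j] = i
    if best[T] == NEG:
        raise ValueError("demonstration cannot be split with this min_len")
    # Walk the back-pointers to recover the interval boundaries.
    segments: List[Tuple[int, int]] = []
    j = T
    while j > 0:
        i = prev[j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return segments


if __name__ == "__main__":
    database = [[1.0, 0.0], [0.0, 1.0]]  # two toy "training sub-task" embeddings
    demo = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    print(decompose(demo, database, min_len=2))
```

On this toy input the sketch returns `[(0, 2), (2, 4)]`: each half of the demonstration matches one database entry closely, whereas treating the whole demonstration as one interval averages the two and scores lower. Weighting by interval length keeps the total bounded by the demonstration length, so the search is not rewarded simply for producing more segments. The sketch recomputes interval means for every candidate, so it is only suitable for short sequences; caching prefix sums or capping the interval length would be the obvious next step.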

