NVIDIA Developer · 10 hours ago
Advancing Robot Manipulation: Fusing Perception and Planning

 

This article looks at perception-driven task and motion planning (TAMP) and GPU-accelerated TAMP for long-horizon robot manipulation. By translating visual and language information into executable subgoals and constraints, robots can update their plans and adapt to dynamic environments. The article introduces the OWL-TAMP, VLM-TAMP, and NOD-TAMP frameworks, which focus respectively on executing natural-language instructions, planning multi-step tasks in visually rich environments, and generalizing across objects through learned object representations. In addition, cuTAMP uses GPU parallelization to dramatically accelerate planning, while Fail2Progress improves manipulation skills by learning from failures, strengthening a robot's ability to handle complex and uncertain scenarios.

🤖 **Perception-driven TAMP:** Traditional TAMP systems built on static models perform poorly in new environments. Coupling perception with TAMP lets robots update plans mid-execution and adapt to dynamic scenes, using vision and language to turn pixels into subgoals, affordances, and differentiable constraints for more flexible manipulation.

💡 **Advanced frameworks for long-horizon manipulation:** OWL-TAMP fuses vision-language models (VLMs) with TAMP so robots can carry out complex, long-horizon manipulation described in natural language. VLM-TAMP plans multi-step tasks in visually rich environments, combining visual and language context to handle ambiguous information and markedly improving performance on complex manipulation. NOD-TAMP uses neural object descriptors (NODs) to generalize across object types, overcoming the limited generalization of traditional TAMP.

🚀 **GPU acceleration and learning from failure:** cuTAMP uses GPU parallelization to dramatically speed up planning, cutting computations that once took minutes or even hours down to seconds and making long-horizon problems tractable in practice. The Fail2Progress framework improves manipulation skills by learning from the robot's own failures, using Stein variational inference to generate targeted synthetic datasets that reduce repeated failures and improve robustness on long-horizon tasks.

Traditional task and motion planning (TAMP) systems for robot manipulation use cases operate on static models that often fail in new environments. Integrating perception with manipulation is a solution to this challenge, enabling robots to update plans mid-execution and adapt to dynamic scenarios.

In this edition of the NVIDIA Robotics Research and Development Digest (R²D²), we explore the use of perception-based TAMP and GPU-accelerated TAMP for long-horizon manipulation. We’ll also learn about a framework for improving robot manipulation skills. And we’ll show how vision and language can be used to translate pixels into subgoals, affordances, and differentiable constraints.

    Subgoals are smaller intermediate objectives that guide the robot step-by-step toward the final goal.

    Affordances describe the actions that an object or environment allows a robot to perform, based on its properties and context. For instance, a handle affords “grasping,” a button affords “pressing,” and a cup affords “pouring.”

    Differentiable constraints in robot-motion planning ensure that the robot’s movements satisfy physical limits (like joint angles, collision avoidance, or end-effector positions) while still being adjustable via learning. Because they’re differentiable, GPUs can compute and refine them efficiently during training or real-time planning.
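As a concrete illustration, here is a minimal sketch, not code from any of the systems discussed below, of what differentiable constraint costs can look like in PyTorch: joint-limit and goal terms are written as smooth penalties so a GPU can batch-evaluate and refine many candidate configurations by gradient descent. The forward kinematics here is a placeholder.

```python
# Minimal sketch (illustrative only): differentiable constraint costs in PyTorch.
import torch

def joint_limit_cost(q, q_min, q_max):
    """Penalty that grows smoothly as joints leave their allowed range."""
    below = torch.clamp(q_min - q, min=0.0)
    above = torch.clamp(q - q_max, min=0.0)
    return (below**2 + above**2).sum(dim=-1)

def goal_cost(ee_pos, target_pos):
    """Squared distance between the end effector and a subgoal position."""
    return ((ee_pos - target_pos) ** 2).sum(dim=-1)

# Refine a batch of candidate joint configurations toward a target.
q = torch.randn(1024, 7, requires_grad=True)        # 1,024 candidates for a 7-DoF arm
q_min, q_max = -2.9 * torch.ones(7), 2.9 * torch.ones(7)
target = torch.tensor([0.4, 0.1, 0.5])

opt = torch.optim.Adam([q], lr=0.05)
for _ in range(100):
    ee = q[:, :3]                                    # placeholder for forward kinematics
    loss = (joint_limit_cost(q, q_min, q_max) + goal_cost(ee, target)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because every term is differentiable, all 1,024 candidates are refined in a single batched backward pass, which is exactly the property that makes these constraints GPU-friendly.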

How task and motion planning transforms vision and language into robot action

TAMP involves deciding what a robot should do and how it should move to do it. Doing this requires combining high-level task-planning (what task to do) and low-level motion-planning (how to move to perform the task). 
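To make that split concrete, here is an illustrative sketch of the two-level loop. The function and class names are hypothetical, not an actual planner API: a task planner proposes ordered action sequences (plan skeletons), and a motion planner tries to realize each action, backtracking when an action cannot be grounded in feasible motion.

```python
# Hypothetical sketch of the two levels of TAMP (names are illustrative).
def plan(task_goal, world_state, task_planner, motion_planner):
    # High level: "what to do" -- candidate sequences of symbolic actions.
    for skeleton in task_planner.candidate_skeletons(task_goal, world_state):
        trajectories = []
        feasible = True
        # Low level: "how to move" -- solve continuous motion for each action.
        for action in skeleton:
            traj = motion_planner.solve(action, world_state)
            if traj is None:          # infeasible: backtrack to the next skeleton
                feasible = False
                break
            trajectories.append(traj)
            world_state = action.apply(world_state)
        if feasible:
            return skeleton, trajectories
    return None  # no skeleton could be grounded into collision-free motion
```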

Modern robots can use both vision and language (like pictures and instructions) to break down complex tasks into smaller steps, called subgoals. These subgoals help the robot understand what needs to happen next, what objects to interact with, and how to move safely.

This process uses advanced models to turn images and written instructions into clear plans the robot can follow in real-world situations. Long-horizon manipulation requires structured intentions that can be satisfied by the planner. Let’s see how OWL-TAMP, VLM-TAMP, and NOD-TAMP help address this:

    OWL-TAMP: This workflow enables robots to execute complex, long-horizon manipulation tasks described in natural language, such as “put the orange on the table.” OWL-TAMP is a hybrid workflow that integrates vision-language models (VLMs) with TAMP, where the VLM generates constraints that describe how to ground open-world language (OWL) instructions in robot action space. These constraints are incorporated into the TAMP system, which ensures physical feasibility and correctness through simulation feedback. (A rough sketch of this grounding idea appears after this list.)

    VLM-TAMP: This is a workflow for planning multi-step tasks for robots in visually rich environments. VLM-TAMP combines VLMs with traditional TAMP to generate and refine action plans in real-world scenes. It uses a VLM to interpret images and task descriptions (like “make chicken soup”) and generate high-level plans for the robot. These plans are then iteratively refined through simulation and motion planning to check feasibility. This hybrid model outperforms both the VLM-only and TAMP-only baselines on long-horizon kitchen tasks that require 30 to 50 sequential actions and involve up to 21 different objects. This workflow enables robots to handle ambiguous information by using both visual and language context, resulting in improved performance in complex manipulation tasks.

Figure 1. VLM-TAMP overcomes the pitfalls of using TAMP alone or a VLM for task planning followed by motion planning when solving long-horizon robot manipulation problems.
    NOD-TAMP: Traditional TAMP frameworks often struggle to generalize on long-horizon manipulation tasks because they rely on explicit geometric models and object representations. NOD-TAMP overcomes this by using neural object descriptors (NODs) to help generalize object types. NODs are learned representations derived from 3D point clouds that encode spatial and relational properties of objects. This enables robots to interact with new objects and helps the planner adapt actions dynamically.
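To illustrate the kind of grounding OWL-TAMP describes, here is a rough, hypothetical sketch; it is not the OWL-TAMP interface. The VLM's output for “put the orange on the table” is assumed to be a target region in world coordinates, which is then expressed as a differentiable goal constraint that a TAMP solver could satisfy alongside its usual kinematic and collision terms.

```python
# Hypothetical sketch: turning an assumed VLM-grounded region into a differentiable constraint.
import torch

def region_constraint(obj_pos, region_center, region_half_extent):
    """Zero inside an axis-aligned target region, positive (and differentiable) outside."""
    overshoot = torch.clamp((obj_pos - region_center).abs() - region_half_extent, min=0.0)
    return (overshoot**2).sum(dim=-1)

# Assumed VLM output for "put the orange on the table":
# the table-top region the orange should end up in, in world coordinates.
table_center = torch.tensor([0.6, 0.0, 0.75])
table_half_extent = torch.tensor([0.4, 0.3, 0.02])

orange_final_pos = torch.tensor([0.55, 0.05, 0.76])
print(region_constraint(orange_final_pos, table_center, table_half_extent))  # ~0 => satisfied
```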

How cuTAMP accelerates robot planning with GPU parallelization

Classical TAMP first determines the outline of actions for a task (called a plan skeleton) and then solves for the continuous variables. This second step is usually the bottleneck in manipulation systems, and it is the step that cuTAMP vastly accelerates. For a given skeleton, cuTAMP samples thousands of seeds (particles) and then runs differentiable batch optimization on the GPU to satisfy the various constraints (like inverse kinematics, collisions, stability, and goal costs).
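The following is a rough sketch of that idea, not cuTAMP's actual code: sample thousands of particles for a fixed skeleton, refine them all in one batched, differentiable optimization on the GPU, and keep the lowest-cost particle. The cost function here is a stand-in for the real constraint terms.

```python
# Rough sketch of vectorized constraint satisfaction (illustrative, not cuTAMP code).
import torch

def total_cost(particles):
    """Placeholder for the summed differentiable constraint costs (IK, collisions, goals)."""
    return (particles**2).sum(dim=-1)   # stand-in cost; real costs come from the skeleton

device = "cuda" if torch.cuda.is_available() else "cpu"
particles = torch.randn(4096, 12, device=device, requires_grad=True)  # 4,096 sampled seeds

opt = torch.optim.Adam([particles], lr=0.1)
for _ in range(200):
    cost = total_cost(particles)
    opt.zero_grad()
    cost.sum().backward()               # gradients for all particles in one backward pass
    opt.step()

with torch.no_grad():
    best = particles[total_cost(particles).argmin()]  # lowest-cost particle grounds the skeleton
```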

If a skeleton is not feasible, the algorithm backtracks to try another. If it is, the algorithm returns a plan, often within a matter of seconds for constrained packing and stacking tasks. This means that robots can find solutions for packing, stacking, or manipulating many objects in seconds instead of minutes or hours.

This “vectorized satisfaction” is the essence of making long-horizon problem-solving feasible in real-world applications.

How robots learn from failures using Stein variational inference

Long-horizon manipulation models can fail in novel conditions not seen during training. Fail2Progress is a framework for improving manipulation by enabling robots to learn from their own failures. This framework integrates failures into skill models through data-driven correction and simulation-based refinement. Fail2Progress uses Stein variational inference to generate targeted synthetic datasets similar to observed failures.

These generated datasets can then be used to fine-tune and re-deploy a skill-effect model, enabling fewer repeats of the same failure on long-horizon tasks.
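For intuition, here is a minimal, generic Stein variational gradient descent (SVGD) update in Python; it is not the Fail2Progress implementation, and the target distribution is a stand-in. Particles are pushed toward high-probability regions of a target centered near an observed failure mode, while a kernel repulsion term keeps them spread out, which is the mechanism that yields diverse yet targeted synthetic samples.

```python
# Generic SVGD sketch (illustrative only): particles approximate a target distribution.
import numpy as np

def rbf_kernel(x, h=1.0):
    diff = x[:, None, :] - x[None, :, :]          # (n, n, d) pairwise differences x_j - x_i
    sq = (diff**2).sum(-1)                        # (n, n) squared distances
    k = np.exp(-sq / (2 * h**2))                  # kernel matrix k(x_j, x_i)
    grad_k = -diff / (h**2) * k[..., None]        # gradient of k(x_j, x_i) w.r.t. x_j
    return k, grad_k

def grad_log_p(x, mu=np.zeros(2)):
    """Score of a stand-in target: a unit Gaussian centered near an observed failure mode."""
    return -(x - mu)

def svgd_step(x, step=0.1):
    k, grad_k = rbf_kernel(x)
    # phi(x_i) = mean_j [ k(x_j, x_i) * grad_log_p(x_j) + grad_{x_j} k(x_j, x_i) ]
    phi = (k[..., None] * grad_log_p(x)[:, None, :] + grad_k).mean(axis=0)
    return x + step * phi

particles = np.random.randn(64, 2) * 3.0          # initial synthetic candidates
for _ in range(200):
    particles = svgd_step(particles)              # attraction to the target + mutual repulsion
```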

Getting started

In this blog, we talked about perception-based TAMP, GPU-accelerated TAMP, and a simulation-based refinement framework for robot manipulation. We saw common challenges in traditional TAMP and how these research efforts aim to solve them.

Check out the following resources to learn more:

This post is part of our NVIDIA Robotics Research and Development Digest (R2D2) to give developers deeper insight into the latest breakthroughs from NVIDIA Research across physical AI and robotics applications.

Stay up to date by subscribing to the newsletter and following NVIDIA Robotics on YouTube, Discord, and developer forums. To start your robotics journey, enroll in free NVIDIA Robotics Fundamentals courses.

Acknowledgments

For their contributions to the research mentioned in this post, thanks to Ankit Goyal, Caelan Garrett, Tucker Hermans, Yixuan Huang, Leslie Pack Kaelbling, Nishanth Kumar, Tomas Lozano-Perez, Ajay Mandlekar, Fabio Ramos, Shuo Cheng, Mohanraj Devendran Shanthi, William Shen, Danfei Xu, Zhutian Yang, Novella Alvina, Dieter Fox, and Xiaohan Zhang.
