Probing the 3D Representations of Multi-View Transformers

cs.AI updates on arXiv.org, October 30, 12:16

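This paper presents a method for probing the inner mechanisms of multi-view Transformers. By analyzing the DUSt3R model, it reveals how the latent state develops, what role each layer plays, and how the model differs from methods with an explicit global pose bias.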

arXiv:2510.24907v1 Announce Type: cross

Abstract: Multi-view transformers such as DUSt3R are revolutionizing 3D vision by solving 3D tasks in a feed-forward manner. However, in contrast to previous optimization-based pipelines, the inner mechanisms of multi-view transformers are unclear. Their black-box nature makes improvements beyond data scaling challenging and complicates their use in safety- and reliability-critical applications. Here, we present an approach for probing and visualizing 3D representations from the residual connections of the multi-view transformers' layers. In this manner, we investigate a variant of the DUSt3R model, shedding light on the development of its latent state across blocks and the role of the individual layers, and suggesting how it differs from methods with stronger inductive biases of explicit global pose. Finally, we show that the investigated variant of DUSt3R estimates correspondences that are refined with reconstructed geometry. The code used for the analysis is available at https://github.com/JulienGaubil/und3rstand.
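To make the idea of probing a latent state block by block more concrete, here is a minimal toy sketch. It is not the authors' code, and the abstract does not say which kind of probe they use. The sketch assumes a linear readout (ridge regression) from per-token activations to 3D points, and it replaces real residual activations with synthetic data whose noise level shrinks with depth. All names (`fit_linear_probe`, `toy_activations`, the noise levels) are hypothetical and chosen for illustration only.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_linear_probe(feats, targets, ridge=1e-3):
    """Ridge regression: weights W minimizing |feats W - targets|^2 + ridge |W|^2."""
    d, k = len(feats[0]), len(targets[0])
    gram = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
             for j in range(d)] for i in range(d)]
    rhs = [[sum(f[i] * t[j] for f, t in zip(feats, targets))
            for j in range(k)] for i in range(d)]
    return solve(gram, rhs)


def predict(feats, w):
    """Apply the probe weights to a list of feature vectors."""
    return [[sum(f[i] * w[i][j] for i in range(len(w)))
             for j in range(len(w[0]))] for f in feats]


def mean_error(pred, targets):
    """Mean Euclidean distance between predicted and true 3D points."""
    return sum(math.dist(p, t) for p, t in zip(pred, targets)) / len(targets)


def toy_activations(points, dim, noise, rng):
    """ASSUMPTION: synthetic stand-in for one block's residual activations.

    Each token's activation is a random linear mix of its 3D point plus noise.
    """
    mix = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(3)]
    return [[sum(p[a] * mix[a][j] for a in range(3)) + rng.gauss(0, noise)
             for j in range(dim)] for p in points]


rng = random.Random(0)
points = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(400)]
train, test = slice(0, 300), slice(300, 400)

errors = []
for noise in (2.0, 0.5, 0.05):  # toy stand-ins for an early, middle, and late block
    acts = toy_activations(points, dim=8, noise=noise, rng=rng)
    w = fit_linear_probe(acts[train], points[train])
    errors.append(mean_error(predict(acts[test], w), points[test]))

for name, err in zip(("early", "middle", "late"), errors):
    print(f"{name} block: held-out probe error = {err:.3f}")
```

In this toy setup the held-out error falls as the synthetic noise shrinks. A real analysis would instead record activations from the model's residual stream, typically with a tensor library, and fit one probe per block. A low probe error at a given block would then suggest that the 3D information is linearly decodable there.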

Related tags

Multi-view Transformer, 3D vision, DUSt3R, model internal mechanisms, representation analysis
