SV4D 2.0: A Multi-View Video Diffusion Model for Dynamic 3D Asset Generation

Stability AI Research, September 19

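This paper presents SV4D 2.0, a multi-view video diffusion model for dynamic 3D asset generation. Compared with its predecessor SV4D, the model is markedly more robust to occlusion and large motion, generalizes better, and produces higher-quality output. By refining the network architecture, improving the training data, and revising both the training strategy and the 4D optimization method, SV4D 2.0 achieves significant gains in video synthesis and 4D optimization.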

We present Stable Video 4D 2.0 (SV4D 2.0), a multi-view video diffusion model for dynamic 3D asset generation. Compared to its predecessor SV4D, SV4D 2.0 is more robust to occlusions and large motion, generalizes better to real-world videos, and produces higher-quality outputs in terms of detail sharpness and spatio-temporal consistency. We achieve this by introducing key improvements in multiple aspects: 1) network architecture: eliminating the dependency on reference multi-views and designing a blending mechanism for 3D and frame attention, 2) data: enhancing the quality and quantity of training data, 3) training strategy: adopting progressive 3D-4D training for better generalization, and 4) 4D optimization: handling 3D inconsistency and large motion via 2-stage refinement and progressive frame sampling. Extensive experiments demonstrate significant performance gains from SV4D 2.0 both visually and quantitatively, achieving better detail (-14% LPIPS) and 4D consistency (-44% FV4D) in novel-view video synthesis, as well as better 4D optimization (-12% LPIPS and -24% FV4D), compared to SV4D.
