Stability AI news, September 19
Stable Virtual Camera: Generate immersive 3D videos from 2D images

Stable Virtual Camera is a newly released multi-view diffusion model, currently in research preview, that transforms 2D images into immersive 3D videos with realistic depth and perspective. Without complex reconstruction or scene-specific optimization, it generates 3D videos from a single image or up to 32 input images, and supports user-defined camera trajectories as well as 14 dynamic camera paths. Stable Virtual Camera is available for research use under a non-commercial license. You can read the paper, download the model weights on Hugging Face, and get the code on GitHub.

✨ **Innovative 3D video generation**: Stable Virtual Camera is a novel multi-view diffusion model that turns 2D images into immersive 3D videos with realistic depth and perspective. Its key advantage is that it requires no complex 3D reconstruction or scene-specific optimization, greatly simplifying the 3D video generation workflow.

🎛️ **Flexible camera control**: The model supports user-defined camera trajectories for personalized viewpoint changes. It also ships with 14 dynamic camera paths, including 360°, Lemniscate (figure-eight), Spiral, Dolly Zoom, Move, Pan, and Roll, offering great flexibility for 3D video creation.

🖼️ **Multi-input and multi-output**: Stable Virtual Camera can generate 3D videos from a single image or up to 32 input images, giving users a wide range of input options. It can also produce videos in square (1:1), portrait (9:16), landscape (16:9), and other custom aspect ratios without additional training, suiting different platforms and applications.

📈 **State-of-the-art performance and research value**: Stable Virtual Camera achieves leading results on novel view synthesis (NVS) benchmarks, outperforming models such as ViewCrafter and CAT3D. It excels at both large-viewpoint and small-viewpoint NVS, with particular strength in temporal consistency, making it a powerful tool for research in 3D content creation and virtual reality.

Today, we're releasing Stable Virtual Camera, currently in research preview. This multi-view diffusion model transforms 2D images into immersive 3D videos with realistic depth and perspective—without complex reconstruction or scene-specific optimization. We invite the research community to explore its capabilities and contribute to its development.

A virtual camera is a digital tool used in filmmaking and 3D animation to capture and navigate digital scenes in real-time. Stable Virtual Camera builds upon this concept, combining the familiar control of traditional virtual cameras with the power of generative AI to offer precise, intuitive control over 3D video outputs.

Unlike traditional 3D video models that rely on large sets of input images or complex preprocessing, Stable Virtual Camera generates novel views of a scene from one or more input images at user-specified camera angles. The model produces consistent and smooth 3D video outputs, delivering seamless trajectory videos across dynamic camera paths.
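To make the idea of a preset camera path concrete, here is a small illustrative sketch of how trajectories like the 360° orbit or Lemniscate path could be parameterized as sequences of camera poses. The function names and pose representation are assumptions for illustration, not the model's actual API:

```python
import math

def orbit_path(radius=2.0, height=0.0, num_views=14):
    """Camera positions on a 360-degree orbit around the scene center.

    Each pose is (position, look_at): the camera circles the origin
    while always pointing at it, like a 360-degree preset path.
    """
    poses = []
    for i in range(num_views):
        theta = 2 * math.pi * i / num_views
        position = (radius * math.cos(theta), height, radius * math.sin(theta))
        look_at = (0.0, 0.0, 0.0)  # always face the scene center
        poses.append((position, look_at))
    return poses

def lemniscate_path(scale=2.0, num_views=14):
    """Figure-eight (lemniscate of Gerono) trajectory in the xz-plane."""
    poses = []
    for i in range(num_views):
        t = 2 * math.pi * i / num_views
        position = (scale * math.cos(t), 0.0, scale * math.sin(t) * math.cos(t))
        poses.append((position, (0.0, 0.0, 0.0)))
    return poses
```

A renderer like Stable Virtual Camera would consume such a pose sequence as the target camera angles for the generated video frames.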

The model is available for research use under a Non-Commercial License. You can read the paper here, download the weights on Hugging Face, and access the code on GitHub.

Capabilities

Stable Virtual Camera offers advanced capabilities for generating 3D videos, including: user-defined camera trajectories plus 14 preset dynamic camera paths; generation from a single image or up to 32 input images; output in square (1:1), portrait (9:16), landscape (16:9), and other custom aspect ratios without additional training; and smooth, temporally consistent trajectory videos.

Research & model architecture

Stable Virtual Camera achieves state-of-the-art results in novel view synthesis (NVS) benchmarks, outperforming models like ViewCrafter and CAT3D. It excels in both large-viewpoint NVS, which emphasizes generation capacity, and small-viewpoint NVS, which prioritizes temporal smoothness.

These charts benchmark leading 3D video models across datasets, measuring perceptual quality (LPIPS) and accuracy (PSNR). Each axis reflects a different dataset and input setup.
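Of the two metrics in those charts, PSNR is simple enough to compute directly: it measures pixel-level fidelity between a rendered view and the ground-truth image, with higher values indicating a closer match. LPIPS, by contrast, compares deep features from a pretrained network and is not reproduced here. A minimal PSNR implementation over flat pixel lists:

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio between two same-sized images.

    Images are flat lists of pixel intensities in [0, max_val].
    Higher PSNR means the rendered novel view matches the ground
    truth more closely.
    """
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return float("inf")  # identical images
    return 10 * math.log10(max_val ** 2 / mse)
```

NVS benchmarks typically average PSNR (and LPIPS) over many held-out target viewpoints per scene.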


Stable Virtual Camera is trained as a multi-view diffusion model with a fixed sequence length, using a set number of input and target views (M-in, N-out). During sampling, it functions as a flexible generative renderer, accommodating variable input and output lengths (P-in, Q-out). This is achieved through a two-pass procedural sampling process—first generating anchor views, then rendering target views in chunks to ensure smooth and consistent results.

Stable Virtual Camera uses procedural two-pass sampling to handle any number of input and target views.
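The two-pass idea can be sketched as a scheduling problem: first pick a few anchor views spread across the Q target poses, then render the remaining targets in chunks conditioned on the inputs plus the anchors. The sketch below is schematic and hypothetical, not the model's actual implementation; chunking and conditioning details in the real sampler differ:

```python
def two_pass_schedule(num_targets, num_anchors=4, chunk_size=8):
    """Schematic two-pass sampling schedule (illustrative only).

    Pass 1 picks evenly spaced anchor indices among the target views;
    pass 2 groups the remaining targets into chunks, each of which
    would be rendered conditioned on the input views plus the anchors,
    keeping neighboring chunks consistent with one another.
    """
    # Pass 1: evenly spaced anchors across the target trajectory.
    step = max(1, num_targets // num_anchors)
    anchors = list(range(0, num_targets, step))[:num_anchors]

    # Pass 2: remaining target views, rendered chunk by chunk.
    remaining = [i for i in range(num_targets) if i not in anchors]
    chunks = [remaining[i:i + chunk_size]
              for i in range(0, len(remaining), chunk_size)]
    return anchors, chunks
```

Anchoring first means every chunk is conditioned on the same globally consistent set of views, which is what keeps long trajectories smooth even when the output length Q far exceeds the fixed training sequence length.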

For a deeper dive into the model’s architecture and performance, you can read the full research paper here.

Model limitations

In its initial version, Stable Virtual Camera may produce lower-quality results in certain scenarios. Input images featuring humans, animals, or dynamic textures like water often lead to degraded outputs. Additionally, highly ambiguous scenes, complex camera paths that intersect objects or surfaces, and irregularly shaped objects can cause flickering artifacts, especially when target viewpoints differ significantly from the input images.

Get started

Stable Virtual Camera is free to use for research purposes under a Non-Commercial License. You can read the paper, download the weights on Hugging Face, and access the code on GitHub.

To stay updated on our progress, follow us on X, LinkedIn, Instagram, and join our Discord Community.

