Stability AI news, September 19, 19:57
Stability AI and NVIDIA collaborate to optimize the SD3.5 models, improving generation speed and reducing VRAM requirements

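Stability AI has partnered with NVIDIA to release NVIDIA TensorRT-optimized versions of the Stable Diffusion 3.5 (SD3.5) models. The optimization significantly speeds up image generation: SD3.5 Large is up to 2.3x faster and SD3.5 Medium is 1.7x faster, while VRAM requirements drop by 40%. The optimized models are now available for download on Hugging Face for commercial and non-commercial use. They perform well on a wider range of NVIDIA GPUs, including the RTX 50 Series, which further lowers the barrier to enterprise-grade image generation and gives creative professionals and developers broader hardware support and greater creative efficiency.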

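🚀 **Significant performance gains and VRAM optimization**: Through the collaboration with NVIDIA, the Stable Diffusion 3.5 (SD3.5) models now integrate NVIDIA TensorRT and FP8 quantization. This makes SD3.5 Large up to 2.3x faster at generation and SD3.5 Medium 1.7x faster. VRAM requirements also fall by 40%; SD3.5 Large, for example, drops from 19GB to 11GB, which lets more mid-range RTX hardware run the large model and greatly expands the user base.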

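🎨 **Greater versatility and ease of use**: SD3.5 is already known for its stylistic diversity, its representation of diverse people, and its precise prompt adherence. The TensorRT optimization further lowers the barrier to running it, so more users can take advantage of its strengths in styles such as 3D, photography, and painting, and of its ability to generate diverse, high-quality images. Creative professionals and developers alike get a smooth experience across a wider range of NVIDIA RTX GPUs.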

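⚖️ **Open licensing and easy access**: The optimized SD3.5 models are released under the Stability AI Community License, which permits commercial and non-commercial use and gives projects of all kinds considerable flexibility. Users can download the model weights from Hugging Face and get the corresponding code from NVIDIA's GitHub repository. The move is intended to broaden adoption of AI image generation and to encourage community innovation.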

Key Takeaways:

In collaboration with NVIDIA, we've optimized the SD3.5 family of models using TensorRT and FP8, improving generation speed and reducing VRAM requirements on supported RTX GPUs.

SD3.5 was developed to run on consumer hardware out of the box. The NVIDIA optimizations extend that accessibility further for creative professionals and developers working across a variety of hardware setups.


Where the models excel

These performance improvements make SD3.5's core strengths more accessible. SD3.5 is one of the most customizable image models on the market while maintaining top-tier performance in prompt adherence and image quality.

Now available across more NVIDIA RTX GPUs

TensorRT optimization reduces model size while maintaining quality by streamlining how models run on NVIDIA hardware. Model size reduction is achieved through FP8 quantization, a technique that makes models more efficient while maintaining high output quality. These improvements mean that five RTX 50 Series systems can now run SD3.5 Large from memory, compared to just one system before optimization.
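To make the idea of FP8 quantization concrete, here is a minimal sketch in plain Python. It assumes the E4M3 layout (1 sign bit, 4 exponent bits, 3 mantissa bits, largest finite value 448) and a single per-tensor scale factor; the article specifies neither, and this is not NVIDIA's TensorRT implementation. The names `round_to_e4m3` and `quantize_dequantize` are illustrative, and the weights are made-up values.

```python
import math

E4M3_MAX = 448.0      # largest finite value in the assumed E4M3 format
E4M3_MIN_EXP = -6     # exponent of the smallest normal value (2**-6)
MANTISSA_BITS = 3     # 3 mantissa bits give 8 steps per power of two


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value representable in (assumed) FP8 E4M3."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)          # saturate instead of overflowing
    _, exp = math.frexp(mag)             # mag = m * 2**exp with 0.5 <= m < 1
    e = max(exp - 1, E4M3_MIN_EXP)       # floor(log2(mag)), clamped for subnormals
    step = 2.0 ** (e - MANTISSA_BITS)    # spacing of representable values near mag
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_dequantize(weights):
    """Scale weights into the FP8 range, round them, and scale back."""
    # Per-tensor scale (an assumption); fall back to 1.0 if all weights are zero.
    scale = (max(abs(w) for w in weights) / E4M3_MAX) or 1.0
    return [round_to_e4m3(w / scale) * scale for w in weights]


weights = [0.02, -0.5, 0.13, 0.9]        # made-up example values
restored = quantize_dequantize(weights)
worst = max(abs(r - w) / abs(w) for r, w in zip(restored, weights))
print(restored)
print(f"worst relative error: {worst:.1%}")
```

Each rounded value fits in one byte, compared with two bytes for a BF16 value, which is where the size reduction comes from. The price is a small rounding error per weight; in this sketch it stays below about 6% for values in the normal range.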

Enhanced performance across NVIDIA RTX GPUs

SD3.5 TensorRT-optimized models run more efficiently across NVIDIA GeForce RTX 50 and 40 Series GPUs, as well as NVIDIA Blackwell and Ada Lovelace generation NVIDIA RTX PRO GPUs. They deliver up to 2.3x faster generation on SD3.5 Large and 1.7x faster on SD3.5 Medium, while reducing VRAM requirements by 40%.

FP8 TensorRT boosts SD3.5 Large performance by 2.3x vs. BF16 PyTorch, with 40% less memory use. For SD3.5 Medium, BF16 TensorRT delivers a 1.7x speedup.

[Figure panels: SD3.5 Large, SD3.5 Medium]
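As a quick sanity check on the figures quoted above, the arithmetic can be worked through in a few lines of Python. The 19GB and 11GB values are the SD3.5 Large figures from the summary; the 10-second baseline is a made-up number for illustration, not a measured result.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


def time_after_speedup(baseline_seconds: float, speedup: float) -> float:
    """Return the generation time once a given speedup factor is applied."""
    return baseline_seconds / speedup


# VRAM figures for SD3.5 Large quoted in the summary (GB).
vram_saving = percent_reduction(19.0, 11.0)
print(f"VRAM reduction: {vram_saving:.1f}%")

# Hypothetical baseline time per image; replace with your own measurement.
baseline = 10.0
for name, speedup in [("SD3.5 Large", 2.3), ("SD3.5 Medium", 1.7)]:
    optimized = time_after_speedup(baseline, speedup)
    print(f"{name}: {optimized:.2f} s instead of {baseline:.0f} s")
```

Going from 19GB to 11GB is a reduction of about 42%, consistent with the rounded 40% figure, and a 2.3x speedup means each image takes less than half the baseline time.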

Getting started

The optimized models are now available for commercial and non-commercial use under the permissive Stability AI Community License. You can download the weights on Hugging Face and the code on NVIDIA’s GitHub.
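One way to fetch the weights programmatically is the `huggingface_hub` Python package, sketched below. The article does not name the repository, so the repository id in the usage comment is a placeholder to replace with the one listed on Hugging Face. `download_weights` is an illustrative helper, and the `HF_TOKEN` handling assumes the repository may require you to accept the license and authenticate first.

```python
import os


def download_weights(repo_id: str, local_dir: str) -> str:
    """Download every file of a Hugging Face model repo into local_dir."""
    # Imported lazily so this module loads even if the package is missing
    # (install it with: pip install huggingface_hub).
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        # Optional access token; None falls back to any locally saved login.
        token=os.environ.get("HF_TOKEN"),
    )


# Usage (placeholder repository id, substitute the real one before running):
# path = download_weights("<org>/<optimized-sd3.5-repo>", "./sd35-tensorrt")
# print("Weights saved to", path)
```

The function returns the local folder path, which can then be passed to whatever inference code you take from NVIDIA's GitHub.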

To stay updated on our progress, follow us on X, LinkedIn, Instagram, and join our Discord Community.
