Stability AI news, September 19
Stable Diffusion model family gets AMD optimizations for significantly faster performance

Stability AI has partnered with AMD to release ONNX-optimized versions of the Stable Diffusion model family, engineered to run faster and more efficiently on AMD Radeon™ GPUs and Ryzen™ AI APUs. The collaboration focused on maximizing inference performance without compromising model output quality. The optimized models include Stable Diffusion 3.5 Large, Stable Diffusion 3.5 Large Turbo, Stable Diffusion XL 1.0, and Stable Diffusion XL Turbo, all published on Hugging Face with the "_amdgpu" suffix. Users can try them through Amuse 3.0. The SD3.5 models deliver inference up to 2.6x faster than the base PyTorch models, and the SDXL models achieve speedups of up to 3.8x, providing strong support for building faster, more efficient creative applications.

🚀 **Significant performance gains**: Through close collaboration with AMD, the ONNX-optimized versions of the Stable Diffusion model family run much faster and more efficiently on AMD Radeon™ GPUs and Ryzen™ AI APUs. Specifically, the SD3.5 models deliver inference up to 2.6x faster than the base PyTorch models, while the SDXL models achieve speedups of up to 3.8x.

💻 **Broad compatibility and easy integration**: The AMD-optimized models integrate seamlessly into any environment that supports ONNX Runtime, so they can be dropped into existing workflows without complex configuration. This makes it easier for developers and businesses to use Stable Diffusion on AMD hardware and to accelerate the development and deployment of creative applications.

🌟 **Performance without sacrificing quality**: The optimization effort focused on improving inference performance while keeping model output quality intact, so users get faster generation while still producing the high-quality images that professional applications require.

💡 **Four models optimized**: The optimizations cover both the Stable Diffusion 3.5 and SDXL families, including Stable Diffusion 3.5 Large, Stable Diffusion 3.5 Large Turbo, Stable Diffusion XL 1.0, and Stable Diffusion XL Turbo. All are published on Hugging Face with the "_amdgpu" suffix, making them easy to find and download.

Key Takeaways

We’ve collaborated with AMD to deliver select ONNX-optimized versions of the Stable Diffusion model family, engineered to run faster and more efficiently on AMD Radeon™ GPUs and Ryzen™ AI APUs. This joint engineering effort focused on maximizing inference performance without compromising model output quality or our open licensing.

The result is a set of accelerated models that integrate into any ONNX Runtime-supported environment, making it easy to drop them into your existing workflows right out of the box. Whether you’re deploying Stable Diffusion 3.5 (SD3.5) variants, our most advanced image model, or Stable Diffusion XL Turbo (SDXL Turbo), these models are ready to power faster creative applications on AMD hardware.
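As a rough illustration of what "drop into an ONNX Runtime-supported environment" can look like, the sketch below loads an ONNX Stable Diffusion XL Turbo pipeline through Hugging Face Optimum and runs it on a DirectML execution provider. The repository id `stabilityai/sdxl-turbo_amdgpu`, the DirectML provider choice, and the assumption that the repo follows the diffusers/Optimum ONNX layout are all illustrative, not confirmed by this post; check the actual "_amdgpu" listings on Hugging Face and the execution providers available in your onnxruntime build.

```python
# Minimal sketch, assuming the model repo is published in the diffusers/Optimum ONNX layout.
from optimum.onnxruntime import ORTStableDiffusionXLPipeline

pipeline = ORTStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/sdxl-turbo_amdgpu",   # assumed repo id based on the "_amdgpu" suffix; verify on Hugging Face
    provider="DmlExecutionProvider",   # DirectML EP for AMD GPUs on Windows; requires onnxruntime-directml
)

image = pipeline(
    prompt="a watercolor painting of a lighthouse at dawn",
    num_inference_steps=1,             # SDXL Turbo is designed for very few denoising steps
    guidance_scale=0.0,                # Turbo models are typically run without classifier-free guidance
).images[0]

image.save("lighthouse.png")
```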

As generative visual media adoption accelerates, it’s essential our models are optimized for leading hardware. This collaboration ensures builders and businesses can integrate Stable Diffusion into their production pipelines, making workflows faster, more efficient, and ready to scale.

Available models

AMD has optimized four models across SD3.5 and SDXL for improved performance.

SD3.5 Version:

AMD-optimized SD3.5 models deliver up to 2.6x faster inference when compared to the base PyTorch models.

SDXL Version:

With AMD optimization, SDXL 1.0 and SDXL Turbo achieve up to 3.8x faster inference when compared to the base PyTorch models.

Analysis compares AMD-optimized model inference speed to the base PyTorch models. Testing was conducted using Amuse 3.0 RC and AMD Adrenalin 24.30.31.05 KB driver - 25.4.1 preview.

Get started

The AMD-optimized Stable Diffusion models are available now on Hugging Face, suffixed with "_amdgpu". End users can also try out the AMD-optimized models using Amuse 3.0. You can learn more about the technical details of these speed upgrades on AMD's blog post.
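For reference, one way to fetch such a model programmatically is with `huggingface_hub`; the repo id below is again an assumption based on the "_amdgpu" suffix described above, so look up the exact name under the Stability AI organization before using it.

```python
# Minimal sketch: downloading an "_amdgpu" model snapshot from Hugging Face.
from huggingface_hub import snapshot_download

# Illustrative repo id only; the post specifies the "_amdgpu" suffix but not exact names.
local_dir = snapshot_download(repo_id="stabilityai/stable-diffusion-3.5-large_amdgpu")
print(f"Model files downloaded to: {local_dir}")
```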

To stay updated on our progress, follow us on X, LinkedIn, and Instagram, and join our Discord Community.
