Microsoft Azure Blog Announcements
Powering Distributed AI/ML at Scale with Azure and Anyscale

This article covers the partnership between Microsoft and Anyscale to bring Anyscale's managed Ray service to Azure as a first-party offering in private preview. The collaboration aims to simplify scaling AI/ML workloads so that developers can do distributed computing as naturally as writing Python code. Ray is an open-source distributed computing framework that helps teams scale from laptop experiments to production-grade workloads without rewriting code. Anyscale's managed service on Azure, combined with the RayTurbo high-performance runtime, further improves cluster efficiency and the performance of Python workloads, giving enterprise AI applications strong support. Azure Kubernetes Service (AKS) supplies the underlying infrastructure for the service, ensuring scalability, resilience, and governance.

🚀 **Ray: An Open-Source Distributed Computing Framework for Python** Ray aims to simplify distributed systems development for the Python ecosystem, allowing developers to scale code from a single laptop to a large cluster with minimal changes. Through Pythonic APIs it turns functions and classes into distributed tasks and actors without modifying core logic, and it intelligently schedules workloads across CPUs, GPUs, and heterogeneous environments to optimize resource utilization.

💡 **Anyscale on Azure: Enterprise-Grade Ray** Anyscale's managed service on Azure further improves Ray's ease of use and enterprise readiness. With the RayTurbo high-performance runtime, users can spin up Ray clusters quickly (no Kubernetes expertise required), allocate tasks dynamically, and run large experiments quickly and cost-effectively with elastic scaling, GPU packing, and Azure Spot VMs. The service also provides automatic fault recovery, zero-downtime upgrades, and integrated observability to keep production environments reliable.

🌐 **Azure Kubernetes Service (AKS) Provides the Foundation** Azure Kubernetes Service (AKS) supplies a robust infrastructure foundation for the managed offering. AKS handles the complexity of orchestrating distributed workloads, delivering dynamic resource orchestration, high availability, elastic scaling, and native integration with Azure services such as Azure Monitor and Microsoft Entra ID, giving enterprise AI applications a scalable, resilient, and governable environment.

🤝 **Partnership Vision: Accelerating AI/ML Innovation** The Microsoft and Anyscale partnership combines the strengths of open-source Ray, managed cloud infrastructure, and Kubernetes orchestration. Azure customers can choose open-source Ray, Anyscale's managed experience, or a combination with Azure-native services based on their needs, letting them move from prototype to production faster, reduce operational overhead, scale in the cloud, and focus on breakthrough AI/ML innovation.

The path from prototype to production for AI/ML workloads is rarely straightforward. As data pipelines expand and model complexity grows, teams can find themselves spending more time orchestrating distributed compute than building the intelligence that powers their products. Scaling from a laptop experiment to a production-grade workload still feels like reinventing the wheel. What if scaling AI workloads felt as natural as writing in Python itself? That’s the idea behind Ray, the open-source distributed computing framework born at UC Berkeley’s RISELab, and now, it’s coming to Azure in a whole new way.

Today, at Ray Summit, we announced a new partnership between Microsoft and Anyscale, the company founded by Ray’s creators, to bring Anyscale’s managed Ray service to Azure as a first-party offering in private preview. This new managed service will deliver the simplicity of Anyscale’s developer experience on top of Azure’s enterprise-grade Kubernetes infrastructure, making it possible to run distributed Python workloads with native integrations, unified governance, and streamlined operations, all inside your Azure subscription.

Ray: Open-Source Distributed Computing for Python
Ray reimagines distributed systems for the Python ecosystem, making it simple for developers to scale code from a single laptop to a large cluster with minimal changes. Instead of rewriting applications for distributed execution, Ray offers Pythonic APIs that allow functions and classes to be transformed into distributed tasks and actors without altering core logic. Its smart scheduling seamlessly orchestrates workloads across CPUs, GPUs, and heterogeneous environments, ensuring efficient resource utilization.
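To make that concrete, here is a minimal sketch of the pattern the paragraph describes, using the open-source Ray API (`ray.init`, the `@ray.remote` decorator, and `ray.get`); the `square` function and `Counter` class are hypothetical examples, not code from the announcement:

```python
import ray

ray.init()  # On a laptop this starts a local cluster; pass an address to join a remote one.

# An ordinary Python function becomes a distributed task with one decorator.
@ray.remote
def square(x):
    return x * x

# An ordinary Python class becomes a stateful distributed actor the same way.
@ray.remote
class Counter:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total

# Tasks run in parallel across whatever CPUs/GPUs the cluster has; futures return immediately.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]

counter = Counter.remote()
ray.get([counter.add.remote(i) for i in range(8)])
print(ray.get(counter.add.remote(0)))  # 28: actor state persists across calls
```

The same decorated code runs unchanged whether `ray.init()` points at a laptop or a multi-node cluster, which is the "minimal changes" claim in practice.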

Developers can also build complete AI systems using Ray’s native libraries—Ray Train for distributed training, Ray Data for data processing, Ray Serve for model serving, and Ray Tune for hyperparameter optimization—all fully compatible with frameworks like PyTorch and TensorFlow. By abstracting away infrastructure complexity, Ray lets teams focus on model performance and innovation.
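As one illustration of those libraries, the sketch below uses Ray Tune's `Tuner` API to search a small hyperparameter space; the `objective` function and its search space are hypothetical stand-ins for a real training loop, not something defined in the announcement:

```python
from ray import tune

# Hypothetical objective: in practice this would wrap a PyTorch or TensorFlow
# training loop and return (or report) the metric you want to optimize.
def objective(config):
    score = (config["lr"] * 100 - 1) ** 2 + 0.01 * config["batch_size"]
    return {"score": score}

tuner = tune.Tuner(
    objective,
    param_space={
        "lr": tune.loguniform(1e-4, 1e-1),
        "batch_size": tune.choice([16, 32, 64]),
    },
    tune_config=tune.TuneConfig(metric="score", mode="min", num_samples=20),
)
results = tuner.fit()          # each trial runs as a Ray task, in parallel where resources allow
print(results.get_best_result().config)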

Anyscale: Enterprise Ray on Azure
Ray makes distributed computing accessible; Anyscale running on Azure takes it to the next level of enterprise readiness. At the heart of this offering is RayTurbo, Anyscale’s high-performance runtime for Ray. RayTurbo is designed to maximize cluster efficiency and accelerate Python workloads, enabling teams on Azure to:

- Spin up Ray clusters in minutes, without Kubernetes expertise, directly from the Azure portal or CLI.
- Dynamically allocate tasks across CPUs, GPUs, and heterogeneous nodes, ensuring efficient resource utilization and minimizing idle time.
- Run large experiments quickly and cost-effectively with elastic scaling, GPU packing, and native support for Azure spot VMs.
- Run reliably at production scale with automatic fault recovery, zero-downtime upgrades, and integrated observability.
- Maintain control and governance: clusters run inside your Azure subscription, so data, models, and compute stay secure, with unified billing and compliance under Azure standards.
By combining Ray’s flexible APIs with Anyscale’s managed platform and RayTurbo’s performance, Python developers can move from prototype to production faster, with less operational overhead, and at cloud scale on Azure.

Kubernetes for Distributed Computing
Under the hood, Azure Kubernetes Service (AKS) powers this new managed offering, providing the infrastructure foundation for running Ray at production scale. AKS handles the complexity of orchestrating distributed workloads while delivering the scalability, resilience, and governance that enterprise AI applications require.

AKS delivers:

- Dynamic resource orchestration: Automatically provision and scale clusters across CPUs, GPUs, and mixed configurations as demand shifts.
- High availability: Self-healing nodes and failover keep workloads running without interruption.
- Elastic scaling: Scale from development clusters to production deployments spanning hundreds of nodes.
- Integrated Azure services: Native connections to Azure Monitor, Microsoft Entra ID, Blob Storage, and policy tools streamline governance across IT and data science teams.
AKS gives Ray and Anyscale a strong foundation—one that’s already trusted for enterprise workloads and ready to scale from small experiments to global deployments.
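To show what elastic scaling on that foundation looks like from the application side, here is a small sketch using open-source Ray's autoscaler SDK from a driver already attached to a running cluster; the resource numbers are arbitrary examples, and on AKS the actual node provisioning would be carried out by the cluster autoscaler and the managed service, not by anything in this snippet:

```python
import ray
from ray.autoscaler.sdk import request_resources

ray.init(address="auto")  # attach to the existing cluster (e.g., one running on AKS)

# Ask the autoscaler to provision capacity for at least 64 CPUs and four
# single-GPU bundles, on top of whatever queued tasks and actors already demand.
request_resources(num_cpus=64, bundles=[{"GPU": 1}] * 4)

print(ray.cluster_resources())  # aggregate CPU/GPU/memory currently available cluster-wide
```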

Enabling teams with Anyscale running on Azure
With this partnership, Microsoft and Anyscale are bringing together the best of open-source Ray, managed cloud infrastructure, and Kubernetes orchestration. By pairing Ray’s distributed computing platform for Python with Anyscale’s management capabilities and AKS’s robust orchestration, Azure customers gain flexibility in how they can scale AI workloads. Whether you want to start small with rapid experimentation or run mission-critical systems at global scale, this offering gives you the choice to adopt distributed computing without the complexity of building and managing infrastructure yourself.

You can leverage Ray’s open-source ecosystem, integrate with Anyscale’s managed experience, or combine both with Azure-native services, all within your subscription and governance model. This optionality means teams can choose the path that best fits their needs: prototype quickly, optimize for cost and performance, or standardize for enterprise compliance.

Together, Microsoft and Anyscale are removing operational barriers and giving developers more ways to innovate with Python on Azure, so they can move faster, scale smarter, and focus on delivering breakthroughs. Read the full release here.

Get started
Learn more about the private preview and how to request access at https://aka.ms/anyscale or subscribe to Anyscale in the Azure Marketplace.

