Nvidia Developer · September 3
Deploy Distributed GPU-Accelerated Apache Spark Applications with Azure Container Apps

Converting large text corpora into numerical representations known as embeddings is essential for generative AI. This article shows how to use Azure Container Apps (ACA) with serverless GPUs to deploy a distributed Apache Spark application that generates embeddings. The solution consists of a Spark front-end controller application that runs on a CPU and one or more GPU-accelerated Spark worker applications. The two communicate through an Azure Files share and are connected over an Azure virtual network. The architecture is simple yet powerful: Spark manages large datasets while ACA abstracts away the complexity of compute management. The example uses a GPU-accelerated worker container that packages the NVIDIA RAPIDS Accelerator and a Hugging Face model. This serverless approach removes the burden of infrastructure management and allows GPU resources to scale dynamically to meet the demands of AI workloads.

💡 The solution generates text embeddings at scale by deploying a distributed Apache Spark application on Azure Container Apps (ACA) with serverless GPUs. The architecture combines Spark's distributed data processing with ACA's simplified compute management, providing an efficient and scalable foundation for generative AI applications.

🖥️ The architecture consists of two serverless container applications: a CPU-based Spark front-end controller application that orchestrates the work and exposes an endpoint for submitting jobs, and one or more GPU-accelerated Spark worker applications that perform the actual data processing and AI model inference. This division of labor lets data processing run efficiently on GPUs while keeping the overall architecture simple.

📊 The workflow supports a seamless transition from an interactive Jupyter development environment to a trigger-based, production-ready system. By setting the APP_MODE environment variable, users can choose Jupyter for development and debugging or an HTTP/HTTPS API for automated production workloads, selecting whichever mode fits their needs without rearchitecting.

🔧 The example uses a GPU-accelerated worker container that packages the NVIDIA RAPIDS Accelerator and a pretrained Hugging Face embedding model (all-MiniLM-L6-v2). This prebuilt container simplifies development and delivers high-performance GPU-accelerated data processing. The solution also supports replacing the custom worker container with NVIDIA NIM microservices, which offer stronger performance and support for production environments.

📈 On cost efficiency, the serverless approach scales GPU resources on demand, avoiding the high cost of traditional always-on clusters. With ACA's autoscaling, users can quickly add GPU instances when needed and scale back to zero once jobs complete, yielding significant savings.

The process of converting vast libraries of text into numerical representations known as embeddings is essential for generative AI. Various technologies—from semantic search and recommendation engines to retrieval-augmented generation (RAG)—depend on embeddings to transform data so LLMs and other models can understand and process it. 

Yet generating embeddings for millions or billions of documents requires processing at a massive scale. Apache Spark is the go-to framework for this challenge, expertly distributing large-scale data processing jobs across a cluster of machines. However, while Spark solves for scale, generating embeddings itself is computationally intensive. Accelerating these jobs for timely results requires accelerated computing, which introduces the complexity of provisioning and managing the underlying GPU infrastructure.

This post demonstrates how to solve this challenge by deploying a distributed Spark application on Azure Container Apps (ACA) with serverless GPUs. This powerful combination allows Spark to expertly orchestrate massive datasets while ACA completely abstracts away the complexity of managing and scaling the compute. In this example, a specialized worker container is built that packages high-performance libraries like the NVIDIA RAPIDS Accelerator for Spark with an open source model from Hugging Face, creating a flexible and scalable solution.

The result is a serverless, pay-per-use platform that delivers high throughput and low latency for demanding AI and data processing applications. This approach provides a powerful template that can be readily adapted. For enterprise deployments seeking maximum performance and support, the architecture can be upgraded by replacing the custom-built worker with an NVIDIA NIM microservice.

Build a serverless, distributed GPU-accelerated Apache Spark application

The architecture for this solution is straightforward yet powerful, consisting of just two main serverless container applications. Deployed in an Azure Container Apps Environment, these applications work in concert.

At its core, the architecture features:

    Apache Spark front-end controller (master) application: Runs on a CPU and orchestrates the work. It also provides an endpoint for submitting jobs, which can be a Jupyter interface for development or an HTTP trigger for production workloads.
    One or more Spark worker applications: These applications are GPU-accelerated and run on Azure Container Apps serverless GPUs. They perform the heavy lifting of data processing and can be automatically scaled out to handle a large number of requests.
    Shared data storage layer: Using Azure Files, this layer allows the controller and workers to share code, models, and data, simplifying development and deployment.

This setup is designed for both performance and convenience, enabling you to build and test complex distributed applications with ease.

Figure 1. Azure Container Apps environment application implementation

Prerequisites

Before you begin, ensure that you have the following:

Step 1: Set up the Apache Spark controller application

First, deploy the Spark front-end application. The controller application's main job is to tell the Spark worker nodes what tasks to perform and to host a web service that receives job requests.

Build container images

This application consists of two Docker containers that work together: one for the Spark master and one for the front end (interact). The Spark container sets the SPARK_LOCAL_IP environment variable to 0.0.0.0 so it can accept connections from worker nodes on any network.

The front-end container has a SPARK_MASTER_URL variable, set to the application URL on port 7077. It also has an APP_MODE environment variable that controls how you interact with the Spark master: Jupyter for development and debugging, or HTTP/HTTPS for API mode.
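These variables can be baked into the container images or set on the apps themselves. As a minimal sketch with the Azure CLI, assuming hypothetical application and container names (and that the controller application from the next section already exists):

    # Hypothetical names (spark-controller, spark-master, interact); adjust to yours.
    # Let the Spark master accept connections on any interface.
    az containerapp update \
      --name spark-controller --resource-group my-rg \
      --container-name spark-master \
      --set-env-vars SPARK_LOCAL_IP=0.0.0.0

    # Point the front end at the master and choose the interaction mode.
    az containerapp update \
      --name spark-controller --resource-group my-rg \
      --container-name interact \
      --set-env-vars SPARK_MASTER_URL=spark://<controller-fqdn>:7077 APP_MODE=jupyter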

You can add your customizations to the dockerfile.master and dockerfile.interact files provided in the GitHub repository, build the container images, and push them to Azure Container Registry (ACR).
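For example, a local build-and-push might look like the following (the registry name and tags are placeholders):

    # Log in to your Azure Container Registry (replace 'myregistry').
    az acr login --name myregistry

    # Build the Spark master and front-end images from the provided Dockerfiles.
    docker build -f dockerfile.master   -t myregistry.azurecr.io/spark-master:1.0 .
    docker build -f dockerfile.interact -t myregistry.azurecr.io/spark-interact:1.0 .

    # Push both images to ACR.
    docker push myregistry.azurecr.io/spark-master:1.0
    docker push myregistry.azurecr.io/spark-interact:1.0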

Create the controller application

Now create the controller application based on the two images (Spark master and client interaction) from the ACR. To use GPU acceleration, create a GPU workload profile based on available NVIDIA GPU sizes. Note that NVIDIA A100 and NVIDIA T4 GPUs are currently supported by Azure Container Apps. 

Create an Azure Container Apps environment with an Azure virtual network with public access, and attach the GPU workload profile to it. Add an Azure Files share to this environment as a volume mount for storing input data, writing output results, and sharing files between the Spark controller and worker nodes.
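A sketch of those steps with the Azure CLI follows. The resource names are placeholders, and the GPU workload profile type is an assumption; use a serverless GPU size available in your region:

    # Create the environment on a virtual network subnet you control.
    az containerapp env create \
      --name spark-env --resource-group my-rg --location westus3 \
      --infrastructure-subnet-resource-id "$SUBNET_ID"

    # Add a serverless GPU workload profile (type name varies by GPU and region).
    az containerapp env workload-profile add \
      --name spark-env --resource-group my-rg \
      --workload-profile-name gpu-serverless \
      --workload-profile-type Consumption-GPU-NC8as-T4

    # Register an Azure Files share with the environment; each app then mounts
    # it as a volume (for example, through its YAML definition).
    az containerapp env storage set \
      --name spark-env --resource-group my-rg \
      --storage-name sharedfiles \
      --azure-file-account-name mystorageacct \
      --azure-file-account-key "$STORAGE_KEY" \
      --azure-file-share-name sparkshare \
      --access-mode ReadWrite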

Configure networking and scaling

Configure the application to accept public web traffic on port 8888 for its REST API endpoint by enabling ingress and setting the port number. You must also add an additional TCP port 7077 to allow the Spark nodes to communicate with each other. For debugging, you can optionally expose the Spark UI on other ports, such as 8080. This controller application must always be set to a scale of one.
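Putting this together, creating the controller might look like the following sketch. A single image is shown for brevity; the second container and the additional TCP port mappings (7077, and optionally 8080) can be declared in the app's YAML definition and applied with az containerapp update --yaml:

    # Fixed scale of one, with public HTTP ingress on the REST/Jupyter port.
    az containerapp create \
      --name spark-controller --resource-group my-rg \
      --environment spark-env \
      --image myregistry.azurecr.io/spark-interact:1.0 \
      --registry-server myregistry.azurecr.io \
      --ingress external --target-port 8888 \
      --min-replicas 1 --max-replicas 1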

Step 2: Deploy the GPU-accelerated Spark worker application

The next step is to deploy the Spark worker application, which connects to the master and performs the actual data processing and AI model inference. Unlike the single controller, you can deploy many workers, and each automatically connects to the master.

Build the worker container image

This particular example uses Dockerfile.worker to build a worker container that packages a foundational NVIDIA base image, the NVIDIA RAPIDS Accelerator for Apache Spark library, and application code that loads a pretrained open source embedding model (all-MiniLM-L6-v2) from Hugging Face. The RAPIDS Accelerator provides drop-in acceleration for Spark with no code changes, leveraging GPUs to dramatically speed up data processing and traditional machine learning tasks.

You can then push this worker image to your ACR, ready for deployment. It should share the same Azure Virtual Network and storage as the controller application.
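If you'd rather build in the cloud than locally, ACR Tasks can build and push in one step (registry and tag names are placeholders):

    # Build Dockerfile.worker with ACR Tasks and push the result to the registry.
    az acr build --registry myregistry \
      --image spark-worker:1.0 \
      --file Dockerfile.worker .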

Note that while this example shows how to build a worker container with a Hugging Face embedding model, for production deployments you can use NVIDIA NIM microservices instead. NIM microservices supply prebuilt, enterprise-supported, production-grade containers featuring state-of-the-art AI models and offer the best inference performance on NVIDIA GPUs.

For example, NVIDIA NIM microservices are available for powerful embedding models like the NV-Embed-QA family, which is purpose-built for high-performance retrieval and question answering tasks.

Create the worker application

Create the worker application using the worker container image built in the previous section. Deploy the application within the Azure Container Apps environment using a serverless GPU workload profile. ACA automatically handles the setup of NVIDIA drivers and the CUDA toolkit, so you can focus on your application code instead of the infrastructure.
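A hedged sketch of the worker deployment follows. The names mirror the earlier examples, and the SPARK_MASTER_URL value is an assumption about how the worker locates the master:

    # Run the worker on the serverless GPU profile; no --ingress flag is passed
    # because the worker only talks to the master over the internal network.
    az containerapp create \
      --name spark-worker --resource-group my-rg \
      --environment spark-env \
      --image myregistry.azurecr.io/spark-worker:1.0 \
      --registry-server myregistry.azurecr.io \
      --workload-profile-name gpu-serverless \
      --env-vars SPARK_MASTER_URL=spark://<controller-fqdn>:7077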

Automatic scaling

Unlike the controller (which has a fixed scale of one), the number of worker instances is dynamic. When a large data processing job starts, ACA can automatically scale the number of worker instances based on the load. It can rapidly scale from zero to many GPU instances in minutes to meet demand and, crucially, scale back to zero to avoid paying for idle resources. This approach allows for significant cost savings compared to traditional, always-on clusters.
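For example, the replica bounds can be set so that workers disappear entirely when idle (the maximum shown is a placeholder; size it to your workload and quota):

    # Scale to zero when idle; cap the fan-out at eight GPU instances.
    az containerapp update \
      --name spark-worker --resource-group my-rg \
      --min-replicas 0 --max-replicas 8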

The worker application doesn't have ingress enabled; all communication is internal to the environment.

Step 3: Run the distributed text-embedding job

With the Spark controller and worker applications running, your serverless GPU-accelerated Apache Spark cluster is active and ready to process data. In this example, product description data is processed from a SQL Server, text embeddings are generated using the Hugging Face model accelerated by the worker GPUs, and the results are written back to the SQL Server database. The method for submitting the job differs depending on the controller APP_MODE environment variable, as described in Step 1.

Connect using Jupyter (development mode)

If the controller application environment variable APP_MODE is set to jupyter, you can navigate to the public URL of your controller application, which serves a Jupyter environment connected to your shared Azure Files storage. From a notebook, you can create a Spark session, connect to your SQL Server database through JDBC, and execute the embedding job. For an example, see the spark-embedding.py notebook in the NVIDIA/GenerativeAIExamples GitHub repo.

You can also monitor the job’s progress through the standard Spark UI, where you will see the workers being utilized and CUDA being used for processing on the serverless GPU.

Trigger using HTTP (production mode)

To enable production mode, set the controller APP_MODE environment variable to trigger, which exposes a secure HTTP endpoint for automation. The embedding job can then be initiated through a scheduled Bash script or directly from a database using the SQL Server external REST endpoint feature.

Once triggered, the controller fully automates the end-to-end workflow. This workflow includes reading the data from your SQL table, distributing the processing to the GPU workers, and writing the final embeddings back to your destination table in a robust, hands-off manner.

For an example of triggering a job in production mode, see the trigger-mode.py file. Note that you will need to replace the placeholder URL with the actual public URL of your deployed controller application.
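As an illustration only (the endpoint path and payload below are assumptions; the actual contract is defined by the controller code and trigger-mode.py):

    # Kick off the embedding job against the controller's public endpoint.
    curl -X POST "https://<controller-public-url>/trigger" \
      -H "Content-Type: application/json" \
      -d '{"job": "embeddings"}'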

Get started with serverless distributed data processing

By deploying a custom, GPU-accelerated Apache Spark application on Azure Container Apps serverless GPUs, you can build highly efficient, scalable, and cost-effective distributed data processing solutions. This serverless approach removes the burden of infrastructure management and enables you to dynamically scale powerful GPU resources to meet the demands of your AI workloads.

The ability to move from an interactive Jupyter-based development environment to a production-ready, trigger-based system within the same architecture provides immense flexibility. This powerful combination of open source tools and serverless infrastructure offers a clear path to production for your most demanding data challenges. You can further optimize by adopting NVIDIA NIM microservices for enterprise-grade performance and support.

To get started deploying the solution, check out the code available through the NVIDIA/GenerativeAIExamples GitHub repo. To learn more and see a demo walkthrough, watch Secure Next-Gen AI Apps with Azure Container Apps Serverless GPUs.
