Simplifying Hugging Face Model Deployment with AWS CDK

This post shows how to use the AWS Cloud Development Kit (AWS CDK) together with the Hugging Face Inference DLC to automatically deploy Transformer models from the Hugging Face Hub to the AWS cloud and expose them through a secure, accessible API. It walks through the whole process: selecting a model, bootstrapping the CDK project, deploying the model to SageMaker, and testing the API. By treating infrastructure as code, it greatly reduces the complexity of putting machine learning models into production, lowers the project failure rate, and makes it easier for data scientists and researchers to put their models to use.

🚀 **Automated model deployment**: The post shows how to use the AWS CDK to deploy Transformer models from the Hugging Face Hub to Amazon SageMaker. The CDK lets users define and manage infrastructure in a modern programming language such as Python, automating model deployment and significantly reducing the effort and time needed to get models into production.

💡 **Powered by the Hugging Face Inference DLC**: By integrating the Hugging Face Inference Deep Learning Containers (DLC), users can deploy models without writing any inference code. This makes model serving far more convenient, especially for Transformer models, which can be deployed directly with Hugging Face's optimized containers.

🔒 **A secure, scalable API**: The solution combines Amazon API Gateway and AWS Lambda to build a secure, easily accessible API for the deployed model. AWS Lambda acts as a client proxy that interacts with the SageMaker endpoint, so the deployed model can be called securely from any application, service, or frontend.

📊 **Tackling the production gap**: The post notes that many data science projects never make it to production. The collaboration between AWS and Hugging Face aims to address this pain point by reducing the widely cited 87%-90% failure rate, so that more machine learning models can actually be put to use and deliver value.

Researchers, Data Scientists, and Machine Learning Engineers are excellent at creating models that achieve new state-of-the-art performance on different tasks, but deploying those models in an accessible, scalable, and secure way is more of an art than a science. Those skills are more commonly found in software engineering and DevOps. VentureBeat reports that 87% of data science projects never make it to production, while Redapt puts the number at 90%.

We partnered up with AWS and the Amazon SageMaker team to reduce those numbers. Together we built 🤗 Transformers optimized Deep Learning Containers to accelerate and secure training and deployment of Transformers-based models. If you want to know more about the collaboration, take a look here.

In this blog, we are going to use the AWS Cloud Development Kit (AWS CDK) to create our infrastructure and automatically deploy our model from the Hugging Face Hub to the AWS cloud. The AWS CDK uses the expressive power of modern programming languages, such as Python, to model and deploy your applications as code. In our example, we are going to build an application using the Hugging Face Inference DLC for model serving and Amazon API Gateway with AWS Lambda for building a secure, accessible API. The AWS Lambda function will be used as a client proxy that interacts with our SageMaker endpoint.
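The Lambda proxy essentially forwards the incoming request body to the SageMaker endpoint and returns the prediction. A minimal sketch of such a handler (not the repository's exact code; the ENDPOINT_NAME environment variable is an assumption about how the stack wires things together) could look like this:

import os

import boto3

# SageMaker runtime client used to invoke the deployed endpoint
runtime = boto3.client("sagemaker-runtime")

# Assumption: the CDK stack passes the endpoint name in as an environment variable
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]

def handler(event, context):
    # Forward the raw API Gateway request body to the SageMaker endpoint
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=event["body"],
    )
    # Return the model prediction to the caller
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": response["Body"].read().decode("utf-8"),
    }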

If you’re not familiar with Amazon SageMaker: “Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. SageMaker removes the heavy lifting from each step of the machine learning process to make it easier to develop high-quality models.” [REF]

You can find the complete code in this GitHub repository.


Tutorial

Before we get started, make sure you have the AWS CDK installed and configured your AWS credentials.
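If you haven't done that yet, installing the CDK CLI and configuring credentials typically looks like this (assuming Node.js and the AWS CLI are already installed):

npm install -g aws-cdk
aws configure   # interactively set your access key, secret key, and default region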

What are we going to do:

1. Selecting a model from the Hugging Face Hub
2. Bootstrap our CDK project
3. Deploy the model using CDK
4. Run inference and test the API

1. Selecting a model from the Hugging Face Hub

For those of you who don't know what the Hugging Face Hub is, you should definitely take a look here. But the TL;DR is that the Hugging Face Hub is an open, community-driven collection of state-of-the-art models. At the time of writing this blog post, we have 17,501 free models available to use.

To select the model we want to use, we navigate to hf.co/models and pre-filter using the task on the left, e.g. summarization. For this blog post, I went with sshleifer/distilbart-cnn-12-6, which was fine-tuned on CNN articles for summarization.

[Image: Hugging Face Hub – https://www.philschmid.de/static/blog/huggingface-transformers-cdk-sagemaker-lambda/hub.png]

2. Bootstrap our CDK project

Deploying applications with the CDK may require additional resources the CDK uses, for example to store assets. The process of provisioning these initial resources is called bootstrapping. So before we can deploy our application, we need to make sure our environment is bootstrapped.
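Bootstrapping is a one-time step per AWS account and region, done with a single command:

cdk bootstrap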

3. Deploy the model using CDK

Now we are able to deploy our application with the whole infrastructure and deploy our previously selected Transformer model sshleifer/distilbart-cnn-12-6 to Amazon SageMaker. Our application uses the CDK context to accept dynamic parameters for the deployment: we provide our model under the key model and our task under the key task. The application allows further configuration, like passing a different instance_type when deploying. You can find the whole list of arguments in the repository.
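Under the hood, the stack reads these context values and wires up the SageMaker resources. The following is a minimal, simplified sketch of such a stack in Python (CDK v2 style; this is not the repository's exact code, the API Gateway and Lambda pieces are omitted for brevity, and the container image URI is a placeholder you would replace with the Hugging Face Inference DLC URI for your region):

from aws_cdk import App, Stack
from aws_cdk import aws_iam as iam
from aws_cdk import aws_sagemaker as sagemaker
from constructs import Construct

class HuggingfaceSagemakerStack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Read dynamic parameters from the CDK context (-c flags)
        model = self.node.try_get_context("model")
        task = self.node.try_get_context("task")
        instance_type = self.node.try_get_context("instance_type") or "ml.m5.xlarge"

        # Execution role that SageMaker assumes to run the endpoint
        role = iam.Role(
            self, "SageMakerRole",
            assumed_by=iam.ServicePrincipal("sagemaker.amazonaws.com"),
            managed_policies=[
                iam.ManagedPolicy.from_aws_managed_policy_name("AmazonSageMakerFullAccess")
            ],
        )

        # Hugging Face Inference DLC; HF_MODEL_ID / HF_TASK tell the container
        # which model to pull from the Hub and which pipeline to run
        sm_model = sagemaker.CfnModel(
            self, "Model",
            execution_role_arn=role.role_arn,
            primary_container=sagemaker.CfnModel.ContainerDefinitionProperty(
                image="<huggingface-inference-dlc-image-uri>",  # region/version specific
                environment={"HF_MODEL_ID": model, "HF_TASK": task},
            ),
        )

        endpoint_config = sagemaker.CfnEndpointConfig(
            self, "EndpointConfig",
            production_variants=[
                sagemaker.CfnEndpointConfig.ProductionVariantProperty(
                    initial_instance_count=1,
                    initial_variant_weight=1.0,
                    instance_type=instance_type,
                    model_name=sm_model.attr_model_name,
                    variant_name="AllTraffic",
                )
            ],
        )

        sagemaker.CfnEndpoint(
            self, "Endpoint",
            endpoint_config_name=endpoint_config.attr_endpoint_config_name,
        )

app = App()
HuggingfaceSagemakerStack(app, "HuggingfaceSagemakerEndpoint")
app.synth()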

In our case we will provide model=sshleifer/distilbart-cnn-12-6 and task=summarization with a GPU instance, instance_type=ml.g4dn.xlarge.

cdk deploy \
  -c model="sshleifer/distilbart-cnn-12-6" \
  -c task="summarization" \
  -c instance_type="ml.g4dn.xlarge"

After running the cdk deploy command we will get a listing of all resources that are going to be created. We then confirm our deployment, and the CDK will create all required resources, deploy our AWS Lambda function, and deploy our model to Amazon SageMaker. This takes around 3-5 minutes.

After the deployment the console output should look similar to this.

 ✅  HuggingfaceSagemakerEndpoint

Outputs:
HuggingfaceSagemakerEndpoint.hfapigwEndpointE75D67B4 = https://r7rch77fhj.execute-api.us-east-1.amazonaws.com/prod/

Stack ARN:
arn:aws:cloudformation:us-east-1:558105141721:stack/HuggingfaceSagemakerEndpoint/6eab9e10-269b-11ec-86cc-0af6d09e2aab

4. Run inference and test the API

After the deployment is successfully complete, we can grab our endpoint URL HuggingfaceSagemakerEndpoint.hfapigwEndpointE75D67B4 from the CLI output and test our API with any REST client, for example Insomnia.

[Image: Insomnia request – https://www.philschmid.de/static/blog/huggingface-transformers-cdk-sagemaker-lambda/request.png]

The same request as curl:

curl --request POST \  --url https://r7rch77fhj.execute-api.us-east-1.amazonaws.com/prod/ \  --header 'Content-Type: application/json' \  --data '{    "inputs": "Hugging Face, the winner of VentureBeat’s Innovation in Natural Language Process/Understanding Award for 2021, is looking to level the playing field. The team, launched by Clément Delangue and Julien Chaumond in 2016, was recognized for its work in democratizing NLP, the global market value for which is expected to hit $35.1 billion by 2026. This week, Google’s former head of Ethical AI Margaret Mitchell joined the team. Hugging Face is also knee-deep in a project called BigScience, an international, multi-company, multi-university research project with over 500 researchers, designed to better understand and improve results on large language models."}'
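If you prefer calling the API from code, the same request could be sent with Python's requests library (a minimal sketch; the URL is the example endpoint from the deployment output above, so replace it with your own):

import requests

# Example endpoint URL from the deployment output; replace with your own
url = "https://r7rch77fhj.execute-api.us-east-1.amazonaws.com/prod/"

# The deployed summarization model expects a JSON body with an "inputs" field
payload = {"inputs": "Paste the text you want to summarize here."}

response = requests.post(url, json=payload)
print(response.json())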

Conclusion

With the help of the AWS CDK we were able to deploy all the required infrastructure for our API by defining it in a programming language we already know and use. The Hugging Face Inference DLC allowed us to deploy a model from the Hugging Face Hub without writing a single line of inference code, and we are now able to securely use our publicly exposed API in any application, service, or frontend we want.

To optimize the solution, you can tweak the CDK template to your needs, e.g. add a VPC to the AWS Lambda function and the SageMaker endpoint to accelerate communication between the two.


You can find the code here, and feel free to open a thread on the forum.

Thanks for reading. If you have any questions, feel free to contact me through GitHub or on the forum. You can also connect with me on Twitter or LinkedIn.
