Deploying Llama 2 with AWS CDK

This article shows how to use the AWS Cloud Development Kit (AWS CDK) together with the Hugging Face LLM CDK Construct to deploy and manage the large language model Llama 2. By initializing a CDK project, installing the Hugging Face LLM CDK Construct, adding an LLM resource to deploy Llama 2, and running inference to test the model, it demonstrates how to integrate LLMs into a production environment. AWS CDK simplifies the management of cloud infrastructure, allowing developers to deploy and manage LLMs more efficiently.

🔧 Initialize and bootstrap a new CDK project with AWS CDK, creating the initial resources required to deploy an LLM. AWS CDK uses code to define, provision, and manage AWS cloud infrastructure, simplifying the deployment process.

📦 Install the Hugging Face LLM CDK Construct, which is based on the Hugging Face LLM Inference DLC, an open-source solution purpose-built for deploying and serving large language models (LLMs). It leverages aws-sagemaker to abstract away the complex deployment work.

🌐 Add an LLM resource and deploy Llama 2: the HuggingFaceLlm construct defines the model ID, number of GPUs, and custom parameters, and all environment variables are passed to the container. During deployment, CDK synthesizes the resources and translates them into an AWS CloudFormation template.

🚀 Run inference and test the model: use the SageMaker Python SDK to create a HuggingFacePredictor and send requests to the deployed endpoint, validating the model's functionality and performance. A variety of inference parameters are supported, such as max_new_tokens and temperature.

🔗 Combined with Amazon SageMaker real-time endpoints, developers can interact with the model through the AWS SDK, the SageMaker Python SDK, or the AWS CLI for rapid integration and testing, helping DevOps teams integrate LLMs into their products.

Open large language models (LLMs), like Llama 2 or Falcon, are rapidly shifting our thinking about what we can achieve with AI. These new open LLMs will enable several new business use cases and improve or optimize existing ones.

However, deploying and managing LLMs in production requires specialized infrastructure and workflows. In this blog, we'll show you how to use Infrastructure as Code with the AWS Cloud Development Kit (AWS CDK) to deploy and manage Llama 2. The AWS CDK is an open-source software development framework that allows you to use code to define, provision, and manage your cloud infrastructure on AWS.

What you are going to do:

    1. Initialize and bootstrap a new CDK project
    2. Install the Hugging Face LLM CDK Construct
    3. Add LLM resource and deploy Llama 2
    4. Run inference and test the model

Before you get started, make sure you have the AWS CDK installed and configured your AWS credentials.
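
If you don't have the AWS CDK CLI yet, it can typically be installed globally through npm:

npm install -g aws-cdk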

1. Initialize and bootstrap a new CDK project

Deploying applications using the CDK may require additional resources, for example for the CDK to store assets. The process of provisioning these initial resources is called bootstrapping. So before you can deploy your application, you need to make sure your project is bootstrapped. Create a new empty directory, then initialize and bootstrap the project:

# create new directory
mkdir huggingface-cdk-example && cd huggingface-cdk-example
# initialize project
cdk init app --language typescript
# bootstrap
cdk bootstrap

The cdk init command creates files and folders inside the huggingface-cdk-example directory to help you organize the source code for your AWS CDK app. The bin/ directory contains your app, with an empty stack located under the lib/ directory.

2. Install the Hugging Face LLM CDK Construct

We created a new AWS CDK construct, aws-sagemaker-huggingface-llm, to make the deployment of LLMs easier than ever before. The construct uses the Hugging Face LLM Inference DLC, built on top of Text Generation Inference (TGI), an open-source, purpose-built solution for deploying and serving Large Language Models (LLMs).

The aws-sagemaker-huggingface-llm construct leverages aws-sagemaker and abstracts all of the heavy lifting away. You can install the construct using npm.

npm install aws-sagemaker-huggingface-llm

3. Add LLM resource and deploy Llama 2

A new CDK project is always empty in the beginning because the stack it contains doesn't define any resources. Let's add a HuggingFaceLlm resource. To do so, open your stack in the lib/ directory and import HuggingFaceLlm into it.

import * as cdk from 'aws-cdk-lib'
import { Construct } from 'constructs'
import { HuggingFaceLlm } from 'aws-sagemaker-huggingface-llm'

export class HuggingfaceCdkExampleStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props)
    // create new LLM SageMaker Endpoint
    new HuggingFaceLlm(this, 'Llama2Llm', {
      name: 'llama2-chat',
      instanceType: 'ml.g5.2xlarge',
      environmentVariables: {
        HF_MODEL_ID: 'NousResearch/Llama-2-7b-chat-hf',
        SM_NUM_GPUS: '1',
        MAX_INPUT_LENGTH: '2048',
        MAX_TOTAL_TOKENS: '4096',
        MAX_BATCH_TOTAL_TOKENS: '8192',
      },
    })
  }
}

The construct also provides an interface for the available arguments, called HuggingFaceLlmProps, where you can define your model ID, the number of GPUs to shard the model across, and custom parameters. All environmentVariables will be passed to the container.

Note: The HuggingFaceLlm construct exposes the SageMaker endpoint as its endpoint property, meaning you can easily add autoscaling, monitoring, or alerts, as sketched below.
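
As a minimal sketch of what that enables, the snippet below registers the endpoint with Application Auto Scaling inside the stack's constructor. It assumes the endpoint property exposes the endpoint name via attrEndpointName and that the production variant is named AllTraffic; both are assumptions, so check the construct's documentation for the exact names.

import * as appscaling from 'aws-cdk-lib/aws-applicationautoscaling'

// keep a reference to the HuggingFaceLlm created in the stack above
// (same props as before, abbreviated here)
const llm = new HuggingFaceLlm(this, 'Llama2Llm', {
  name: 'llama2-chat',
  instanceType: 'ml.g5.2xlarge',
  environmentVariables: {
    HF_MODEL_ID: 'NousResearch/Llama-2-7b-chat-hf',
    SM_NUM_GPUS: '1',
  },
})

// register the endpoint variant as a scalable target
// (assumed names: llm.endpoint.attrEndpointName and the 'AllTraffic' variant)
const scalingTarget = new appscaling.ScalableTarget(this, 'Llama2ScalingTarget', {
  serviceNamespace: appscaling.ServiceNamespace.SAGEMAKER,
  resourceId: `endpoint/${llm.endpoint.attrEndpointName}/variant/AllTraffic`,
  scalableDimension: 'sagemaker:variant:DesiredInstanceCount',
  minCapacity: 1,
  maxCapacity: 2,
})

// scale out when an instance averages more than ~10 invocations per instance
scalingTarget.scaleToTrackMetric('Llama2InvocationsTracking', {
  targetValue: 10,
  predefinedMetric: appscaling.PredefinedMetric.SAGEMAKER_VARIANT_INVOCATIONS_PER_INSTANCE,
})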

Before you deploy the stack, validate the code by synthesizing it with cdk:
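
cdk synth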

The cdk synth command executes your app, which causes the resources it defines to be translated into an AWS CloudFormation template.

To deploy the stack, use the deploy command from cdk:
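
cdk deploy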

AWS CDK will now synthesize the stack again and may ask you to confirm the changes. It will also list the IAM statements that will be created. Confirm with y. CDK will then create all required resources for Amazon SageMaker and deploy the model. Once the endpoint is up and running, the deploy command finishes and you should see the name of your endpoint, as in the example below:

Outputs:
HuggingfaceCdkExampleStack.Llama2LlmEndpointNameBD92F39C = llama2-chat-endpoint-1h7s2afii09310d4d605026

Stack ARN:
arn:aws:cloudformation:us-east-1:558105141721:stack/HuggingfaceCdkExampleStack/484a4770-3b3f-11ee-95f2-0eabb10b55f3

4. Run inference and test the model

The aws-sagemaker-huggingface-llm construct is built on top of Amazon SageMaker. This means that the construct creates a real-time endpoint for us. To run inference, you can either use the AWS SDK (in any language), the sagemaker Python SDK or the AWS CLI. To keep things simple, use the SageMaker Python SDK.

If you haven't installed it, you can install it with pip install sagemaker. The sagemaker SDK implements a HuggingFacePredictor class, which makes it easy to send requests to your endpoint.

from sagemaker.huggingface import HuggingFacePredictor

# create predictor
predictor = HuggingFacePredictor("YOUR ENDPOINT NAME")  # llama2-chat-endpoint-1h7s2afii09310d4d605026

# run inference
predictor.predict({"inputs": "Can you tell me something about AWS CDK?"})

Since the construct uses the Hugging Face LLM Inference DLC, you can use the same parameters for inference, including max_new_tokens, temperature, top_p, etc. You can find a list of supported arguments, and how to prompt Llama 2 correctly, in the Deploy Llama 2 7B/13B/70B on Amazon SageMaker blog post under Run inference and chat with the model. To validate that it works, you can test it with the following request:

# hyperparameters for llm
prompt = f"""<s>[INST] <<SYS>>
You are an AWS Expert
<</SYS>>

Should I rather use AWS CDK or Terraform? [/INST]"""

payload = {
  "inputs": prompt,
  "parameters": {
    "do_sample": True,
    "top_p": 0.6,
    "temperature": 0.9,
    "top_k": 50,
    "max_new_tokens": 512,
    "repetition_penalty": 1.03,
    "stop": ["</s>"]
  }
}

# send request to endpoint
response = predictor.predict(payload)

print(response[0]["generated_text"][len(prompt):])

That's it! You made it. Now you can go to your DevOps team and help them integrate LLMs into your products.

Conclusion

In this post, we demonstrated how Infrastructure as Code with AWS CDK enables the productive use of large language models like Llama 2 in production. We showed how the aws-sagemaker-huggingface-llm construct helps deploy Llama 2 to SageMaker with minimal code.


Thanks for reading! If you have any questions, feel free to contact me on Twitter or LinkedIn.
