Deploying LLMs to SageMaker with Terraform

 


Deploying open LLMs into production environments can often be a complex process that requires coordination between data scientists, machine learning engineers, and DevOps teams. Traditionally, data scientists and ML engineers focus on model development and are not always responsible for, or experienced in, deploying LLMs to production. This is where Infrastructure as Code (IaC) tools like Terraform come into play.

The Importance of Infrastructure as Code

“Infrastructure as Code (IaC) is the managing and provisioning of infrastructure through code instead of through manual processes. With IaC, configuration files are created that contain your infrastructure specifications, which makes it easier to edit and distribute configurations. It also ensures that you provision the same environment every time.” - Red Hat

IaC ensures:

    Consistency: By defining infrastructure in code, we ensure that every deployment is identical, eliminating the "it works on my machine" problem.
    Version Control: Infrastructure configurations can be versioned, allowing for easy rollbacks and collaborative development.
    Scalability: IaC makes it simple to replicate environments for testing or scaling purposes.
    Automation: Deployments can be automated, reducing human error and speeding up the process.

Terraform LLM SageMaker Module

The Terraform LLM SageMaker Module simplifies the process of deploying open LLMs from Hugging Face to Amazon SageMaker real-time endpoints.

It handles the creation of all necessary resources, including:

    IAM roles (if not provided)
    SageMaker Model
    SageMaker Endpoint Configuration
    SageMaker Endpoint
    Autoscaling

With this module, you can easily deploy popular models like Llama 3, Mistral, Mixtral, and Command from Hugging Face to Amazon SageMaker.

 module "sagemaker-huggingface" {  source               = "philschmid/llm-sagemaker/aws"  version              = "0.1.0"  endpoint_name_prefix = "llama3"  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"  hf_token             = "YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL"  instance_type        = "ml.g5.2xlarge"  instance_count       = 1 # default is 1   tgi_config = {    max_input_tokens       = 4000    max_total_tokens       = 4096    max_batch_total_tokens = 6144  }} 

Deploy Llama 3 with Terraform

Before we get started, make sure you have Terraform installed and configured, as well as access to AWS credentials to create the necessary services.

Create a new Terraform configuration

Each Terraform configuration must be in its own directory including a main.tf file. Our first step is to create the llama-terraform directory with a main.tf file.

mkdir llama-terraform
touch llama-terraform/main.tf
cd llama-terraform

This configuration will deploy the Llama 3 model to a SageMaker endpoint, handling all the necessary setup behind the scenes.

Initialize the AWS provider and our module

Next, we open main.tf in a text editor and add the aws provider as well as our module.

Note: the snippet below assumes that you have an AWS profile named default configured with the needed permissions.

provider "aws" {  profile = "default"  region  = "us-east-1"}  module "sagemaker-huggingface" {  source               = "philschmid/llm-sagemaker/aws"  version              = "0.1.0"  endpoint_name_prefix = "llama3"  hf_model_id          = "meta-llama/Meta-Llama-3.1-8B-Instruct"  hf_token             = "YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL"  instance_type        = "ml.g5.2xlarge"  instance_count       = 1# default is 1   tgi_config = {    max_input_tokens       = 4000    max_total_tokens       = 4096    max_batch_total_tokens = 6144  }} output "endpoint_name" {  value = module.sagemaker-huggingface.sagemaker_endpoint_name}

Note: Make sure to replace YOUR_HF_TOKEN_WITH_ACCESS_TO_THE_MODEL with a valid Hugging Face token that has access to Llama 3.1.

When we create a new configuration, or check out an existing configuration from version control, we need to initialize the directory with terraform init. Initializing will download and install our AWS provider as well as the llm-sagemaker module.

terraform init
# Initializing the backend...
# Initializing modules...
# Downloading registry.terraform.io/philschmid/llm-sagemaker/aws 0.1.0 for sagemaker-huggingface...
# - sagemaker-huggingface in .terraform/modules/sagemaker-huggingface
# Initializing provider plugins...
# - Finding hashicorp/aws versions matching "5.60.0"...
# - Finding hashicorp/random versions matching "3.1.0"...

Deploy the Llama 3.1 8B instruct model

To deploy/apply our configuration, we run the terraform apply command. Terraform will then print out which resources are going to be created and ask us if we want to continue, which we can confirm with yes.

Now Terraform will deploy our model to Amazon SageMaker as a real-time endpoint. This can take 5-10 minutes.
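If you don't want to watch the AWS console while the endpoint spins up, you can block until it is in service with a boto3 waiter. This is a minimal sketch and not part of the module; it assumes default AWS credentials, and YOUR_ENDPOINT_NAME is a placeholder for the name we retrieve in the next section.

import boto3

sm = boto3.client("sagemaker")

# Block until the endpoint reports InService (the waiter polls DescribeEndpoint
# under the hood and raises if the endpoint ends up in a failed state).
waiter = sm.get_waiter("endpoint_in_service")
waiter.wait(EndpointName="YOUR_ENDPOINT_NAME")  # placeholder name
print("Endpoint is InService")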

Test the endpoint and run inference

To test our deployed endpoint, we can use the AWS SDK. In this example we are going to use the Python SDK (boto3), but you can easily switch to the Java, JavaScript, .NET, or Go SDK to invoke the Amazon SageMaker endpoint.

To be able to invoke our endpoint we need the endpoint name. You can get the endpoint name by inspecting the output of Terraform with terraform output endpoint_name or going to the SageMaker service in the AWS Management console.
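As a third option, you can look the endpoint up programmatically. A small sketch using the boto3 SageMaker client, assuming default credentials and the llama3 prefix from our configuration:

import boto3

sm = boto3.client("sagemaker")

# The module derives the endpoint name from our endpoint_name_prefix ("llama3"),
# so filtering on that prefix should surface it.
for ep in sm.list_endpoints(NameContains="llama3")["Endpoints"]:
    print(ep["EndpointName"], ep["EndpointStatus"])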

We create a new file request.py with the following snippet.

Make sure you have configured your AWS credentials (and region) correctly.

import boto3
import json

client = boto3.client("sagemaker-runtime")

ENDPOINT_NAME = "YOUR_ENDPOINT_NAME"

body = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"},
    ],
    "top_p": 0.6,
    "temperature": 0.9,
    "max_tokens": 512,
}

response = client.invoke_endpoint(
    EndpointName=ENDPOINT_NAME,
    ContentType="application/json",
    Accept="application/json",
    Body=json.dumps(body),
)
response = json.loads(response["Body"].read().decode("utf-8"))
print(response["choices"][0]["message"]["content"])

Now we can execute our request.

python3 request.py
# Deep learning is a subset of machine learning that involves the use of artificial neural networks (ANNs) with multiple layers to analyze and interpret data. These neural networks are designed to mimic the structure and function of the human brain, with layers of interconnected nodes or "neurons" that process and transmit information.
#
# The key characteristics of deep learning are:
#
# 1. Multiple layers: Deep learning models typically consist of multiple layers of interconnected nodes, with each layer processing and transforming the input data in some way.
# 2. Neural networks: Deep learning models are based on artificial neural networks, which are designed to mimic the structure and function of the human brain.
# 3. Non-linear transformations: Deep learning models use non-linear transformations to process and transform the input data, allowing them to learn complex patterns and relationships.
# 4. Training with large datasets: Deep learning models require large amounts of data to train, as they need to learn from a vast number of examples to develop their predictive capabilities.
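TGI can also stream tokens back as they are generated. The sketch below is a variant of request.py, under the assumption that the TGI version behind the endpoint accepts the OpenAI-style stream parameter on the Messages API and emits server-sent events; we read them through boto3's invoke_endpoint_with_response_stream.

import boto3
import json

client = boto3.client("sagemaker-runtime")

ENDPOINT_NAME = "YOUR_ENDPOINT_NAME"

body = {
    "messages": [
        {"role": "user", "content": "What is deep learning?"},
    ],
    "max_tokens": 512,
    "stream": True,  # assumption: the deployed TGI version supports streaming here
}

def stream_response():
    response = client.invoke_endpoint_with_response_stream(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(body),
    )
    # The body is a stream of server-sent events; each payload part contains
    # one or more "data: {...}" lines carrying incremental token deltas.
    buffer = ""
    for event in response["Body"]:
        if "PayloadPart" not in event:
            continue
        buffer += event["PayloadPart"]["Bytes"].decode("utf-8")
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            line = line.strip()
            if not line.startswith("data:"):
                continue
            payload = line[len("data:"):].strip()
            if payload == "[DONE]":
                return
            chunk = json.loads(payload)
            choices = chunk.get("choices") or []
            if not choices:
                continue
            delta = choices[0].get("delta", {}).get("content") or ""
            print(delta, end="", flush=True)

stream_response()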

Destroy the infrastructure

To clean up, we can run terraform destroy, which will delete all the resources created by the module.
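To double-check that nothing is left running (and billing), you can query the endpoint once the destroy completes. A minimal sketch, again with YOUR_ENDPOINT_NAME as a placeholder:

import boto3
from botocore.exceptions import ClientError

sm = boto3.client("sagemaker")

try:
    sm.describe_endpoint(EndpointName="YOUR_ENDPOINT_NAME")
    print("Endpoint still exists")
except ClientError:
    # Once terraform destroy has finished, DescribeEndpoint fails
    # because the endpoint no longer exists.
    print("Endpoint deleted")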

Conclusion

The llm-sagemaker Terraform module abstracts away the heavy lifting of deploying open LLMs to Amazon SageMaker, enabling controlled, consistent, and understandable managed deployments that follow the concepts of IaC. This should help companies move faster and integrate models deployed to Amazon SageMaker into their existing applications and IaC definitions.

Give it a try and let me know what you think about the module. It's still a very basic module. If you have feature requests, please open an issue.


Thanks for reading! If you have any questions or feedback, please let me know on Twitter or LinkedIn.
