AWS Machine Learning Blog, September 12
Automating the Deployment of SageMaker Private Workforces with the AWS CDK

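This article describes in detail how to use the AWS Cloud Development Kit (AWS CDK) and AWS CloudFormation to automate the deployment of a private workforce for Amazon SageMaker Ground Truth and to configure a dedicated Amazon Cognito user pool. It focuses on the mutual dependency between the Amazon Cognito user pool and the SageMaker private workforce, and uses CloudFormation custom resources to create and configure the resources in the correct order. The solution overview presents the architecture, explains the creation flow from the user pool and app client through to the private workforce, and describes how workforce members log in and gain access. The article also provides deployment and testing steps, along with best practices such as security hardening and CI/CD integration, to help users build and manage ML data labeling workflows more efficiently and securely.

🎯 **Automated deployment of SageMaker private workforces**: The article provides an end-to-end solution that uses the AWS CDK and CloudFormation to automate the creation and configuration of a private workforce for Amazon SageMaker Ground Truth. This avoids the complexity and error-proneness of manual deployment, in particular by resolving the mutual dependency between the Amazon Cognito user pool and the SageMaker private workforce, and it keeps deployments consistent and efficient.

🔒 **Integrated security and authentication**: At the core of the solution is an integrated security setup that couples the Amazon Cognito user pool with the SageMaker private workforce. Amazon Cognito handles user management, authentication, and multi-factor authentication, with AWS WAF adding a further layer of protection. Workforce members register through an email invitation and reach the labeling portal through a unified login experience, which keeps the data labeling process secure.

⚙️ **Resolving technical dependencies and streamlining deployment**: The article examines the key technical challenges of automated deployment, such as the callback URL of the Amazon Cognito app client and the fixed nature of the user pool domain name. By combining CloudFormation custom resources with the orchestration capabilities of the AWS CDK, the solution handles these interdependencies so that all resources are created and configured in the correct order, resulting in a stable and reliable deployment process.

🛠️ **Flexible customization and extensibility**: Beyond the base deployment framework, the solution is designed to be flexible and extensible. The article suggests several best practices, including custom domain names, stronger security controls (such as IP restrictions and VPC configuration), CI/CD pipeline integration, and extensions for managing workforce members and labeling jobs. Users can tailor the solution to their own business needs and integrate it into an existing ML lifecycle.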

Private workforces for Amazon SageMaker Ground Truth and Amazon Augmented AI (Amazon A2I) help organizations build proprietary, high-quality datasets while keeping high standards of security and privacy.

The AWS Management Console provides a fast and intuitive way to create a private workforce, but many organizations need to automate their infrastructure deployment through infrastructure as code (IaC) because it provides benefits such as automated and consistent deployments, increased operational efficiency, and reduced chances of human errors or misconfigurations.

However, creating a private workforce with IaC is not a straightforward task because of some complex technical dependencies between services during the initial creation.

In this post, we present a complete solution for programmatically creating private workforces on Amazon SageMaker AI using the AWS Cloud Development Kit (AWS CDK), including the setup of a dedicated, fully configured Amazon Cognito user pool. The accompanying GitHub repository provides a customizable AWS CDK example that shows how to create and manage a private workforce, paired with a dedicated Amazon Cognito user pool, and how to integrate the necessary Amazon Cognito configurations.

Solution overview

This solution demonstrates how to create a private workforce and a coupled Amazon Cognito user pool and its dependent resources. The goal is to provide a comprehensive setup for the base infrastructure to enable machine learning (ML) labeling tasks.

The key technical challenge in this solution is the mutual dependency between the Amazon Cognito resources and the private workforce.

Specifically, the creation of the user pool app client requires certain parameters, such as the callback URL, which is only available after the private workforce is created. However, the private workforce creation itself needs the app client to be already present. This mutual dependency makes it challenging to set up the infrastructure in a straightforward manner.

Additionally, the user pool domain name must remain consistent across deployments, because it can’t be easily changed after the initial creation and inconsistency in the name can lead to deployment errors.

To address these challenges, the solution uses several AWS CDK constructs, including AWS CloudFormation custom resources. This approach orchestrates the creation of the user pool and the SageMaker private workforce so that the resources are configured correctly and their interdependencies are resolved in the right order.
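The ordering problem can be illustrated with a small dependency graph. The following is a minimal sketch, not the repository's code: the resource names are illustrative labels, and the dependency edges are a simplified assumption based on the description in this post. It shows that the naive setup contains a cycle, while starting the app client with a placeholder callback URL yields a valid creation order.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps to the set of resources it must wait for.
# All names are illustrative labels, not CloudFormation logical IDs.
NAIVE = {
    "app_client": {"user_pool", "private_workforce"},  # wants the real callback URL
    "private_workforce": {"app_client"},               # wants an existing app client
}

WITH_PLACEHOLDER = {
    "app_client": {"user_pool"},                      # starts with a placeholder URL
    "user_pool_domain": {"user_pool"},
    "private_workforce": {"app_client", "user_pool_domain"},
    "remove_placeholder_url": {"private_workforce"},  # cleanup step runs last
}


def creation_order(graph):
    """Return one valid creation order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


if __name__ == "__main__":
    print(creation_order(NAIVE))             # None: mutual dependency
    print(creation_order(WITH_PLACEHOLDER))  # a valid order, user_pool first
```

In the sketch, the placeholder removes the edge from the app client to the private workforce, which is what breaks the cycle; the real callback URL is then applied in a later step.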

The solution architecture is composed of one stack with several resources and services, some of which are needed only for the initial setup of the private workforce, and some that are used by the private workforce workers when logging in to complete a labeling task. The following diagram illustrates this architecture.

The solution’s deployment requires AWS services and resources that work together to set up the private workforce. The numbers in the diagram reflect the stack components that support the stack creation, which occur in the following order:

1. Amazon Cognito user pool – The user pool provides user management and authentication for the SageMaker private workforce. It handles user registration, login, and password management. A default email invitation is initially set to onboard new users to the private workforce. The user pool is both associated with an AWS WAF firewall and configured to deliver user activity logs to Amazon CloudWatch for enhanced security.
2. Amazon Cognito user pool app client – The user pool app client configures the client application that will interact with the user pool. During the initial deployment, a temporary placeholder callback URL is used, because the actual callback URL can only be determined later in the process.
3. AWS Systems Manager Parameter Store – Parameter Store, a capability of AWS Systems Manager, stores and persists the prefix of the user pool domain across deployments in a string parameter. The provided prefix must be such that the resulting domain is globally unique.
4. Amazon Cognito user pool domain – The user pool domain defines the domain name for the managed login experience provided by the user pool. This domain name must remain consistent across deployments, because it can’t be easily changed after the initial creation.
5. IAM roles – AWS Identity and Access Management (IAM) roles for CloudFormation custom resources include permissions to make AWS SDK calls to create the private workforce and other API calls during the next steps.
6. Private workforce – Implemented using a custom resource backing the CreateWorkforce API call, the private workforce is the foundation to manage labeling activities. It creates the labeling portal and manages portal-level access controls, including authentication through the integrated user pool. Upon creation, the labeling portal URL is made available to be used as a callback URL by the Amazon Cognito app client. The connected Amazon Cognito app client is automatically updated with the new callback URL.
7. SDK call to fetch the labeling portal domain – This SDK call reads the subdomain of the labeling portal. This is implemented as a CloudFormation custom resource.
8. SDK call to update user pool – This SDK call updates the user pool with a user invitation email that points to the labeling portal URL. This is implemented as a CloudFormation custom resource.
9. Filter for placeholder callback URL – Custom logic separates the placeholder URL from the app client’s callback URLs. This is implemented as a CloudFormation custom resource, backed by a custom AWS Lambda function.
10. SDK call to update the app client to remove the placeholder callback URL – This SDK call updates the app client with the correct callback URLs. This is implemented as a CloudFormation custom resource.
11. User creation and invitation emails – Amazon Cognito users are created and sent invitation emails with instructions to join the private workforce.
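The placeholder filter in the list above can be sketched as a small Lambda handler. This is a hypothetical illustration, not the function shipped in the repository: the placeholder value, the property names `CallbackUrls` and `PlaceholderUrl`, and the shape of the returned `Data` are all assumptions. Only `RequestType` and `ResourceProperties` are standard fields of a CloudFormation custom resource event.

```python
# Hypothetical placeholder; the value used by the actual solution may differ.
PLACEHOLDER_CALLBACK_URL = "https://example.com/placeholder"


def filter_callback_urls(callback_urls, placeholder=PLACEHOLDER_CALLBACK_URL):
    """Return the callback URLs without the placeholder, preserving order."""
    return [url for url in callback_urls if url != placeholder]


def handler(event, context):
    """Sketch of a Lambda handler backing a custom resource.

    Assumes the resource is declared with the properties CallbackUrls
    (a list of URLs) and, optionally, PlaceholderUrl.
    """
    props = event.get("ResourceProperties", {})
    urls = props.get("CallbackUrls", [])
    placeholder = props.get("PlaceholderUrl", PLACEHOLDER_CALLBACK_URL)
    remaining = filter_callback_urls(urls, placeholder)
    # Returned as a comma-separated string so the value can be read as a
    # plain string attribute; this return shape is an assumption.
    return {"Data": {"CallbackUrls": ",".join(remaining)}}
```

A later step could then pass the filtered URLs to the SDK call that updates the app client, so that only the labeling portal callback remains.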

After this initial setup, a worker can join the private workforce and access the labeling portal. The authentication flow includes the email invitation, initial registration, authentication, and login to the labeling portal. The following diagram illustrates this workflow.

The detailed workflow steps are as follows:

1. A worker receives an email invitation that provides the user name, temporary password, and URL of the labeling portal.
2. When trying to reach the labeling portal, the worker is redirected to the Amazon Cognito user pool domain for authentication. Amazon Cognito domain endpoints are additionally protected by AWS WAF. The worker then sets a new password and registers with multi-factor authentication. Authentication actions by the worker are logged and sent to CloudWatch.
3. The worker can log in and is redirected to the labeling portal.
4. In the labeling portal, the worker can access existing labeling jobs in SageMaker Ground Truth.
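The invitation email in the first step is configured on the user pool. The following is a minimal sketch of how such a template could be built for the Amazon Cognito `UpdateUserPool` API; the function name, the subject line, and the message wording are hypothetical. Amazon Cognito replaces the `{username}` and `{####}` placeholders with the user name and the temporary password.

```python
def build_invite_template(portal_url):
    """Build UpdateUserPool arguments for an invitation email (sketch).

    portal_url is the labeling portal URL, assumed to be known once the
    private workforce exists.
    """
    if not portal_url.startswith("https://"):
        raise ValueError("portal_url must be an https URL")
    message = (
        "You are invited to join a private labeling workforce.<br>"
        "User name: {username}<br>"       # literal placeholder for Cognito
        "Temporary password: {####}<br>"  # literal placeholder for Cognito
        f"Labeling portal: {portal_url}"
    )
    return {
        "AdminCreateUserConfig": {
            "InviteMessageTemplate": {
                "EmailSubject": "Invitation to the labeling portal",
                "EmailMessage": message,
            }
        }
    }
```

Note that `UpdateUserPool` resets settings that are omitted from the request to their defaults, so a real call would also need to pass the rest of the existing user pool configuration.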

The solution uses a mix of AWS CDK constructs and CloudFormation custom resources to integrate the Amazon Cognito user pool and the SageMaker private workforce so workers can register and access the labeling portal. In the following sections, we show how to deploy the solution.

Prerequisites

You must have the following prerequisites:

Deploy the solution

To deploy the solution, complete the following steps. Make sure you have AWS credentials available in your environment with sufficient permissions to deploy the solution resources.

1. Clone the GitHub repository.
2. Follow the detailed instructions in the README file to deploy the stack using the AWS CDK and AWS CLI.
3. Open the AWS CloudFormation console and choose the Workforce stack for more information on the ongoing deployment and the created resources.

Test the solution

If you invited yourself from the AWS CDK CLI to join the private workforce, follow the instructions in the email that you received to register and access the labeling portal. Otherwise, complete the following steps to invite yourself and others to join the private workforce. For more information, see Creating a new user in the AWS Management Console.

1. On the Amazon Cognito console, choose User pools in the navigation pane.
2. Choose the existing user pool, MyWorkforceUserPool.
3. Choose Users, then choose Create a user.
4. Choose Email as the alias attribute to sign in.
5. Choose Send an email invitation as the invitation message.
6. For User name, enter a name for the new user. Make sure not to use the email address.
7. For Email address, enter the email address of the worker to be invited.
8. For simplicity, choose Generate a password for the user.
9. Choose Create.
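The console steps above roughly correspond to one call to the Amazon Cognito `AdminCreateUser` API. The following sketch only builds the request arguments; the function name and the example values are hypothetical. Leaving out `TemporaryPassword` lets Amazon Cognito generate one, which matches the Generate a password option.

```python
def build_create_user_request(user_pool_id, username, email):
    """Build AdminCreateUser arguments for inviting a worker by email (sketch)."""
    if "@" in username:
        # Mirrors the console guidance: the user name is not the email address.
        raise ValueError("user name must not be an email address")
    return {
        "UserPoolId": user_pool_id,
        "Username": username,
        "UserAttributes": [{"Name": "email", "Value": email}],
        "DesiredDeliveryMediums": ["EMAIL"],  # send the invitation by email
    }


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   request = build_create_user_request(pool_id, "worker-1", "worker@example.com")
#   boto3.client("cognito-idp").admin_create_user(**request)
```

Keeping the request construction separate from the SDK call makes it possible to check the arguments without touching a real user pool.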

After you receive the invitation email, follow the instructions to set a new password and register with an authenticator application. Then you can log in and see a page listing your labeling jobs.

Best practices and considerations

When setting up a private workforce, consider the best practices for Amazon Cognito and the AWS CDK, as well as additional customizations:

Clean up

To clean up your resources, open the AWS CloudFormation console and delete the Workforce stack. Alternatively, if you deployed using the AWS CDK CLI, you can run cdk destroy from the same terminal where you ran cdk deploy and use the same AWS CDK CLI arguments as during deployment.

Conclusion

This solution demonstrates how to programmatically create a private workforce on SageMaker Ground Truth, paired with a dedicated and fully configured Amazon Cognito user pool. By using the AWS CDK and AWS CloudFormation, this solution brings the benefits of IaC to the setup of your ML data labeling private workforce.

To further customize this solution to meet your organization’s standards, discover how to accelerate your journey on the cloud with the help of AWS Professional Services.

We encourage you to learn more from the developer guides on data labeling on SageMaker and Amazon Cognito user pools. Refer to the following blog posts for more examples of labeling data using SageMaker Ground Truth:


About the author

Dr. Giorgio Pessot is a Machine Learning Engineer at Amazon Web Services Professional Services. With a background in computational physics, he specializes in architecting enterprise-grade AI systems at the confluence of mathematical theory, DevOps, and cloud technologies, where technology and organizational processes converge to achieve business objectives. When he’s not whipping up cloud solutions, you’ll find Giorgio engineering culinary creations in his kitchen.
