AWS Machine Learning Blog · October 24, 05:04
Incorporating Responsible AI into Generative AI Project Prioritization

This post presents a method for incorporating responsible AI practices into generative AI project prioritization. When evaluating a large number of potential generative AI projects, companies must weigh business value against cost, level of effort, and emerging risks such as hallucination, faulty agent decisions, and a rapidly changing regulatory landscape. Conducting a responsible AI risk assessment early in a project gives a more accurate picture of project risk and the effort mitigations will require, avoids costly rework later, and preserves customer trust. Using the AWS Well-Architected Framework as its reference, the post explains the eight dimensions of responsible AI and demonstrates how to fold these considerations into prioritization methods such as weighted shortest job first (WSJF), comparing two projects, automated product descriptions and generated visual brand assets, to show how a responsible AI assessment can change project priority.

💡 **Responsible AI is increasingly important**: As generative AI applications proliferate, prioritizing projects requires companies to weigh not only traditional business value and cost, but also new risks such as AI hallucination, faulty agent decisions, and a rapidly evolving regulatory landscape. Folding responsible AI principles into early project evaluation is key to maximizing AI's benefits while minimizing its risks. The AWS Well-Architected Framework defines eight dimensions for assessing AI risk: fairness, explainability, privacy and security, safety, controllability, veracity and robustness, governance, and transparency.

🛠️ **Incorporating responsible AI into WSJF prioritization**: The post recommends the weighted shortest job first (WSJF) method, whose formula is "priority = cost of delay / job size." The cost of delay measures business value (direct value, timeliness, and adjacent opportunities), while the job size captures the overall effort required to deliver the project and should include the results of the responsible AI risk assessment and the expected cost of mitigations. For example, the more risks identified, the more complex the required mitigations, and the larger the job size estimate.

📊 **How risk assessment changes priority in practice**: By comparing an "automated product descriptions" project with a "visual brand assets" project, the post shows how a responsible AI risk assessment can overturn an initial ranking. The visual asset project scores higher in the first pass thanks to its high value and timeliness, but once the assessment is included, its job size grows substantially because it requires more complex image safety guardrails, human review, and a potential research investment, making the product description project the higher priority once AI risk is considered. This suggests that some AI applications (such as image generation) face steeper responsible AI challenges than text generation does.

Over the past two years, companies have seen an increasing need to develop a project prioritization methodology for generative AI. There is no shortage of generative AI use cases to consider. Rather, companies want to evaluate the business value against the cost, level of effort, and other concerns for a large number of potential generative AI projects. Compared to other domains, generative AI raises new concerns: hallucination, generative AI agents making incorrect decisions and then acting on those decisions through tool calls to downstream systems, and a rapidly changing regulatory landscape. In this post, we describe how to incorporate responsible AI practices into a prioritization method to systematically address these types of concerns.

Responsible AI overview

The AWS Well-Architected Framework defines responsible AI as “the practice of designing, developing, and using AI technology with the goal of maximizing benefits and minimizing risks.” The AWS responsible AI framework begins by defining eight dimensions of responsible AI: fairness, explainability, privacy and security, safety, controllability, veracity and robustness, governance, and transparency. At key points in the development lifecycle, a generative AI team should consider the possible harms or risks for each dimension (both inherent and residual risks), implement risk mitigations, and monitor risk on an ongoing basis. Responsible AI applies across the entire development lifecycle and should be considered during initial project prioritization. That’s especially true for generative AI projects, where there are novel types of risks to consider and mitigations might not be as well understood or researched. Considering responsible AI up front gives a more accurate picture of project risk and mitigation level of effort, and reduces the chance of costly rework if risks are uncovered later in the development lifecycle. Beyond delaying projects through rework, unmitigated concerns might also harm customer trust, cause representational harm, or fail to meet regulatory requirements.

Generative AI prioritization

While most companies have their own prioritization methods, here we’ll demonstrate how to use the weighted shortest job first (WSJF) method from the Scaled Agile Framework (SAFe). WSJF assigns a priority using this formula:

Priority = (cost of delay) / (job size)

The cost of delay is a measure of business value. It includes the direct value (for example, additional revenue or cost savings), the timeliness (is shipping this project worth much more today than a year from now?), and the adjacent opportunities (would delivering this project open up other opportunities down the road?).

The job size is where you consider the level of effort to deliver the project. That normally includes direct development costs and the cost of any infrastructure or software you need. The job size is also where you can include the results of the initial responsible AI risk assessment and the expected mitigations. For example, if the initial assessment uncovers three risks that require mitigation, you include the development cost for those mitigations in the job size. You can also qualitatively assess that a project with ten high-priority risks is more complex than a project with only two high-priority risks.
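To make the arithmetic concrete, here is a minimal Python sketch of the WSJF calculation. The field names and 1–5 scores mirror the tables later in this post; the `ProjectScore` helper itself is illustrative, not part of the SAFe method.

```python
from dataclasses import dataclass

@dataclass
class ProjectScore:
    """WSJF inputs, each scored 1-5 as in the tables below."""
    name: str
    direct_value: int
    timeliness: int
    adjacent_opportunities: int
    job_size: int  # includes responsible AI mitigation effort

    @property
    def cost_of_delay(self) -> int:
        return self.direct_value + self.timeliness + self.adjacent_opportunities

    @property
    def priority(self) -> float:
        # WSJF: priority = cost of delay / job size
        return self.cost_of_delay / self.job_size

# First pass scores, before the responsible AI risk assessment
descriptions = ProjectScore("Automated product descriptions", 3, 2, 2, job_size=2)
brand_assets = ProjectScore("Visual brand assets", 3, 4, 3, job_size=2)
print(f"{descriptions.name}: {descriptions.priority}")  # (3+2+2)/2 = 3.5
print(f"{brand_assets.name}: {brand_assets.priority}")  # (3+4+3)/2 = 5.0
```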

Example scenario

Now, let’s walk through a prioritization exercise that compares two generative AI projects. The first project uses a large language model (LLM) to generate product descriptions. A marketing team will use this application to automatically create production descriptions that go into the online product catalog website. The second project uses a text-to-image model to generate new visuals for advertising campaigns and the product catalog. The marketing team will use this application to more quickly create customized brand assets.

First pass prioritization

First, we’ll go through the prioritization method without considering responsible AI, assigning a score of 1–5 for each part of the WSJF formula. The specific scores vary by organization. Some companies prefer to use t-shirt sizing (S, M, L, and XL), others prefer a score of 1–5, and others will use a more granular score. A score of 1–5 is a common and straightforward way to start. For example, the direct value scores can be calculated as:

1 = no direct value

2 = 20% improvement in KPI (time to create high-quality descriptions)

3 = 40% improvement in KPI

4 = 80% improvement in KPI

5 = 100% or more improvement in KPI
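This rubric is just a threshold lookup, and can be encoded as a small helper. One assumption in the sketch below: values between thresholds round down to the lower score, since the rubric doesn't define intermediate values.

```python
def direct_value_score(kpi_improvement: float) -> int:
    """Map a projected KPI improvement (0.4 = 40%) to the 1-5 rubric above.

    Values between thresholds round down -- an assumption, since the
    rubric doesn't define intermediate scores.
    """
    for cutoff, score in [(1.0, 5), (0.8, 4), (0.4, 3), (0.2, 2)]:
        if kpi_improvement >= cutoff:
            return score
    return 1  # no direct value

print(direct_value_score(0.5))  # 3 (between the 40% and 80% thresholds)
```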

| | Project 1: Automated product descriptions (scored from 1–5) | Project 2: Creating visual brand assets (scored from 1–5) |
|---|---|---|
| Direct value | 3: Helps marketing team create higher quality descriptions more quickly | 3: Helps marketing team create higher quality assets more quickly |
| Timeliness | 2: Not particularly urgent | 4: New ad campaign planned this quarter; without this project, cannot create enough brand assets without hiring a new agency to supplement the team |
| Adjacent opportunities | 2: Might be able to reuse for similar scenarios | 3: Experience gained in image generation will build competence for future projects |
| Job size | 2: Basic, well-known pattern | 2: Basic, well-known pattern |
| Score | (3+2+2)/2 = 3.5 | (3+4+3)/2 = 5 |

At first glance, it looks like Project 2 is more compelling. Intuitively that makes sense—it takes people a lot longer to make high-quality visuals than to create textual product descriptions.

Risk assessment

Now let’s go through a risk assessment for each project. The following table lists a brief overview of the outcome of a risk assessment along each of the AWS responsible AI dimensions, along with a t-shirt size (S, M, L, and XL) severity level. The table also includes suggested mitigations.

| Dimension | Project 1: Automated product descriptions | Project 2: Creating visual brand assets |
|---|---|---|
| Fairness | L: Are descriptions appropriate in terms of gender and demographics? Mitigate using guardrails. | L: Images must not portray particular demographics in a biased way. Mitigate using human and automated checks. |
| Explainability | No risks identified. | No risks identified. |
| Privacy and security | L: Some product information is proprietary and cannot be listed on a public site. Mitigate using data governance. | L: Model must not be trained on any images that contain proprietary information. Mitigate using data governance. |
| Safety | M: Language must be age-appropriate and not cover offensive topics. Mitigate using guardrails. | L: Images must not contain adult content or images of drugs, alcohol, or weapons. Mitigate using guardrails. |
| Controllability | S: Need to track customer feedback on the descriptions. Mitigate using customer feedback collection. | L: Do images align to our brand guidelines? Mitigate using human and automated checks. |
| Veracity and robustness | M: Will the system hallucinate and imply product capabilities that aren’t real? Mitigate using guardrails. | L: Are images realistic enough to avoid uncanny valley effects? Mitigate using human and automated checks. |
| Governance | M: Prefer LLM providers that offer copyright indemnification. Mitigate using LLM provider selection. | L: Require copyright indemnification and image source attribution. Mitigate using model provider selection. |
| Transparency | S: Disclose that descriptions are AI generated. | S: Disclose that images are AI generated. |

The risks and mitigations are use-case specific. The preceding table is for illustrative purposes only.
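If you want the assessment in machine-readable form, a simple risk register can fold t-shirt severities into added job size. This is a sketch under assumed severity weights (S=0.25, M=0.5, L=1.0, XL=2.0), which are not part of the AWS framework; the post assigns its second pass job sizes by judgment rather than formula, so the register just makes the inputs explicit.

```python
from dataclasses import dataclass

# Hypothetical severity weights; calibrate these to your own sizing practice.
SEVERITY_EFFORT = {"S": 0.25, "M": 0.5, "L": 1.0, "XL": 2.0}

@dataclass
class Risk:
    dimension: str   # one of the eight responsible AI dimensions
    severity: str    # t-shirt size: S, M, L, or XL
    mitigation: str

def risk_adjusted_job_size(base_job_size: float, risks: list[Risk]) -> float:
    """Add the estimated mitigation effort for each identified risk."""
    return base_job_size + sum(SEVERITY_EFFORT[r.severity] for r in risks)

# Project 2 risks from the table above
brand_asset_risks = [
    Risk("Fairness", "L", "human and automated checks"),
    Risk("Privacy and security", "L", "data governance"),
    Risk("Safety", "L", "guardrails"),
    Risk("Controllability", "L", "human and automated checks"),
    Risk("Veracity and robustness", "L", "human and automated checks"),
    Risk("Governance", "L", "model provider selection"),
    Risk("Transparency", "S", "AI-generated disclosure"),
]

# Base job size 2 from the first pass; six L risks and one S risk
print(risk_adjusted_job_size(2, brand_asset_risks))  # 2 + 6*1.0 + 0.25 = 8.25
```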

Second pass prioritization

How does the risk assessment affect the prioritization?

| | Project 1: Automated product descriptions (scored from 1–5) | Project 2: Creating visual brand assets (scored from 1–5) |
|---|---|---|
| Job size | 3: Basic, well-known pattern; requires fairly standard guardrails, governance, and feedback collection. | 5: Basic, well-known pattern; requires advanced image guardrails with human oversight and a more expensive commercial model. A research spike is needed. |
| Score | (3+2+2)/3 = 2.3 | (3+4+3)/5 = 2 |
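Re-running the WSJF formula with the risk-adjusted job sizes (the cost of delay is unchanged from the first pass) reproduces the scores in the table:

```python
# Cost of delay is unchanged; only the job size grew after the risk assessment.
print((3 + 2 + 2) / 3)  # Project 1: 2.33...
print((3 + 4 + 3) / 5)  # Project 2: 2.0
```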

Now it looks like Project 1 is the better one to start with. Intuitively, once you consider responsible AI, that makes sense. Poorly crafted or offensive images are more noticeable and have a larger impact than a poorly phrased product description. And the guardrails you can use for maintaining image safety are less mature than the equivalent guardrails for text, particularly in ambiguous cases like adhering to brand guidelines. In fact, an image guardrail system might require training a monitoring model or having people spot-check some percentage of the output. You might need to dedicate a small science team to study this problem first.

Conclusion

In this post, you saw how to include responsible AI considerations in a generative AI project prioritization method. You saw how conducting a responsible AI risk assessment in the initial prioritization phase can change the outcome by uncovering a substantial amount of mitigation work. Moving forward, you should develop your own responsible AI policy and start adopting responsible AI practices for generative AI projects. You can find additional details and resources at Transform responsible AI from theory into practice.


About the author

Randy DeFauw is a Sr. Principal Solutions Architect at AWS. He has over 20 years of experience in technology, starting with his university work on autonomous vehicles. He has worked with and for customers ranging from startups to Fortune 50 companies, launching Big Data and Machine Learning applications. He holds an MSEE and an MBA, serves as a board advisor to K-12 STEM education initiatives, and has spoken at leading conferences including Strata and GlueCon. He is the co-author of the books SageMaker Best Practices and Generative AI Cloud Solutions. Randy currently acts as a technical advisor to AWS’ director of technology in North America.
