Communications of the ACM - Artificial Intelligence, September 30
A New Mindset for AI Software Development

This article explores the key mindset shifts involved in moving from traditional software development to building AI-powered features. AI software no longer relies on explicit rules; it learns from data, which makes data quality and governance critical. AI's non-deterministic (probabilistic) nature demands new design considerations such as confidence scoring and user feedback loops. Testing gives way to evaluation, which focuses on measuring how an AI system performs rather than proving absolute correctness. The article highlights the challenge of AI hallucinations and the importance of managing them through retrieval, transparency prompts, and continuous monitoring. It also discusses the human-in-the-loop pattern, the tradeoff between AI autonomy and user control, and the reality that AI systems require continuous learning and accountability. Ultimately, AI builders need to embrace a new mindset that pairs technical skill with responsible data use, ongoing monitoring, and ethical consideration to build the next generation of impactful products.

💡 **From rule-driven to data-driven**: Traditional software relies on explicitly defined rules, while AI software learns its behavior from data. Builders therefore need to pay as much attention to data quality, coverage, and governance as they do to code, because data is the foundation of an AI system's capability.

⚖️ **From deterministic to probabilistic thinking**: Traditional software is usually deterministic, while AI systems are inherently probabilistic, meaning the same input may produce different outputs. This uncertainty calls for design mechanisms such as confidence scoring and user feedback loops.

📊 **From testing to evaluation**: An AI model cannot be proven absolutely correct by traditional unit tests. Instead, builders rely on "evals" to measure how an AI system performs, identify where it falls short, and drive improvement. These evaluations must handle ambiguity and look for reliable signals that correlate with real-world success.

🧠 **From "ship-and-forget" to "continuous learning"**: AI systems are not released and forgotten like traditional software. Models can drift as user behavior, context, or data changes, so ongoing monitoring and retraining are needed to keep them effective and accurate.

🛡️ **Embracing responsibility and ethics**: AI introduces ethical and compliance responsibilities around bias, fairness, copyright, and security. These are no longer edge concerns; they must be core considerations from the start of product design to ensure AI systems are used responsibly.

Artificial intelligence is quickly becoming part of everyday products. But for those who have spent their careers building traditional software, the shift to building AI-powered features can be uncomfortable. The rules of the game are different. Here are some key things to keep in mind when making the transition.

From Rules to Data

In traditional software, engineers explicitly define the rules: if X, then do Y. With AI, behavior is learned from data rather than hard-coded. This means builders must think about data quality, coverage, and governance as much as they think about code.
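
As a concrete taste of what treating data like code can mean, here is a minimal sketch of a data check that gates a training pipeline the way tests gate a merge. The field names and thresholds are hypothetical illustrations, not anything prescribed by the article:

```python
def check_dataset(rows: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the data may ship."""
    problems = []

    # Quality: every row needs a text and a label.
    missing = sum(1 for r in rows if not r.get("text") or r.get("label") is None)
    if missing:
        problems.append(f"{missing} rows have missing text or label")

    # Coverage: each class needs enough examples to learn from.
    counts: dict[str, int] = {}
    for r in rows:
        label = str(r.get("label"))
        counts[label] = counts.get(label, 0) + 1
    for label, n in counts.items():
        if n < 50:  # illustrative minimum per class
            problems.append(f"label {label!r} has only {n} examples")

    # Governance: every row should record where it came from.
    unsourced = sum(1 for r in rows if not r.get("source"))
    if unsourced:
        problems.append(f"{unsourced} rows lack provenance metadata")

    return problems
```
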
From Deterministic to Probabilistic

Traditional software is usually deterministic; AI, however, is inherently non-deterministic (or probabilistic). What I mean by this is that the same query may yield different responses. This uncertainty requires new design considerations such as confidence scoring and user feedback loops.
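
A minimal sketch of what confidence scoring and a feedback loop might look like. The `generate` interface, threshold, and feedback log are assumptions for illustration, not a prescribed API:

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.75  # illustrative; tune against your own eval data

@dataclass
class AIResponse:
    text: str
    confidence: float  # assumed to come from the model or a calibration layer

def answer(query: str, generate) -> str:
    """Wrap a probabilistic model call with a confidence check."""
    resp: AIResponse = generate(query)  # hypothetical model interface
    if resp.confidence < CONFIDENCE_THRESHOLD:
        # Surface uncertainty rather than presenting a shaky answer as fact.
        return f"I'm not fully confident, but here is my best attempt:\n{resp.text}"
    return resp.text

def record_feedback(query: str, response: str, helpful: bool, log: list) -> None:
    # User feedback loop: thumbs up/down become training and eval signal later.
    log.append({"query": query, "response": response, "helpful": helpful})
```
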
From Testing to Evaluation

You can unit test a software function, but you can’t write a unit test that proves an AI model is always right. Instead, builders rely on evaluation metrics, commonly called “evals.” Evals are scoring mechanisms that help you assess whether your AI system is working, where it’s falling short, and what needs improvement. Think of them as the equivalent of tests in traditional software. But unlike typical unit or integration tests, where inputs and outputs are fixed and correctness is binary, AI evals deal with ambiguity.
The Challenge of Measuring the Unmeasurable

Traditional software testing relies on deterministic outcomes: either the payment processing function correctly calculates the total or it doesn’t. AI evals, on the other hand, venture into inherently subjective territory. How do you score the quality of a generated email? The helpfulness of a chatbot response? This ambiguity is both the biggest challenge and the most important skill to develop for an AI builder.
The insight here is that perfect evaluation may be impossible, but useful evaluation is essential. Builders should not try to capture every nuance of quality. Instead, the goal should be to look for reliable signals that correlate with real-world success. Sometimes this means accepting proxies: measuring response length as a crude indicator of thoroughness, or checking for specific keywords as a signal of topic relevance.
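
A minimal eval harness built on exactly these two proxies, response length and keyword presence. The cases, weights, and thresholds are illustrative:

```python
def length_score(response: str, min_words: int = 30) -> float:
    # Crude thoroughness proxy: did the model say enough?
    return min(len(response.split()) / min_words, 1.0)

def keyword_score(response: str, keywords: list[str]) -> float:
    # Crude relevance proxy: fraction of expected topic keywords present.
    hits = sum(1 for kw in keywords if kw.lower() in response.lower())
    return hits / len(keywords) if keywords else 1.0

def run_evals(cases: list[dict], generate) -> float:
    """Unlike a unit test, each case yields a graded score, not pass/fail."""
    scores = []
    for case in cases:
        response = generate(case["prompt"])
        score = 0.5 * length_score(response) + 0.5 * keyword_score(response, case["keywords"])
        scores.append(score)
    return sum(scores) / len(scores)  # track this number across model versions

# Illustrative eval set: prompts paired with signals of real-world success.
cases = [
    {"prompt": "Summarize our refund policy", "keywords": ["refund", "days", "receipt"]},
    {"prompt": "Draft a welcome email", "keywords": ["welcome", "account", "help"]},
]
```
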
The Human-in-the-Loop Reality

One of the most practical challenges AI builders face is the cost and complexity of human evaluation. While automated metrics are fast and scalable, they often miss nuances that only humans can catch. Successful teams develop hybrid approaches, such as automated evals for rapid iteration, human evaluation for high-stakes decisions and edge cases, and semi-automated systems where humans oversee AI-generated scores.
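
One way such a hybrid pipeline might route work. The stakes flag and score thresholds are assumptions for illustration:

```python
def review_route(item: dict, auto_score: float) -> str:
    """Pick a review path for one AI output. Thresholds are illustrative."""
    if item.get("high_stakes"):       # e.g., medical, legal, or financial content
        return "human_review"         # humans own high-stakes decisions and edge cases
    if auto_score < 0.4 or auto_score > 0.9:
        return "automated"            # clearly bad or clearly fine: trust the eval
    return "human_spot_check"         # ambiguous middle band: a human checks the AI score
```
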
Agency and Control for AI Features

Agency, in the context of building AI features, is the system’s ability to take actions, make decisions, or carry out tasks on behalf of the user (which relates to “AI agents”). Think of agents paying your bill, writing code, or handling customer support. Unlike traditional tools, AI systems are built to act with varying levels of autonomy. But here’s the part that AI builders often overlook: every time an AI agent is given more agency, the human user loses some control. So there’s always an agency-control tradeoff to consider, and that tradeoff can have significant consequences. On the one hand, if your AI agent suggests a response, the human in the loop can override it. On the other, if it sends the response autonomously, it had better be correct.

The mistake most inexperienced AI builders make is jumping to full agency before they’ve sufficiently tested what happens when the system gets it wrong. If you haven’t tested how the system behaves under high control, you’re not ready to give it high agency. And if you hand over too much agency without the system earning it first, you may lose visibility into the system, and the trust of your customers.
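
The tradeoff can be made explicit in code. Here is a sketch with two hypothetical autonomy levels and an approval gate; the `send` and `ask_human` callables are stand-ins:

```python
from enum import Enum

class Agency(Enum):
    SUGGEST = 1      # high control: the AI drafts, a human approves and sends
    AUTONOMOUS = 2   # high agency: the AI sends directly; it had better be correct

def handle_reply(draft: str, level: Agency, send, ask_human) -> None:
    if level is Agency.SUGGEST:
        # The human in the loop can override or edit before anything goes out.
        if ask_human(f"Send this reply?\n{draft}"):
            send(draft)
    else:
        # Only promote to AUTONOMOUS after SUGGEST-mode error rates have been
        # measured and found acceptable; unearned agency costs visibility and trust.
        send(draft)
```
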
The Reality of AI Hallucinations

One of the most distinct challenges with AI systems, particularly ones using large language models, is their tendency to hallucinate. A hallucination occurs when an AI generates an output that sounds confident but is factually incorrect or fabricated.

Why Do Hallucinations Happen?

Hallucinations aren’t bugs in the traditional sense. They emerge from how models are trained: predicting the next most likely word based on patterns in data. When the model lacks sufficient grounding or context, it may simply “fill in the blanks” with plausible but false information.

The Risks

For end users, hallucinations can range from quirky slip-ups to costly, factually incorrect errors. In domains like healthcare or finance, the consequences could be dire. This makes hallucination management a core design responsibility.

Strategies for Builders

Hallucinations can be managed even if they cannot be eliminated: ground responses in retrieved, verifiable sources; prompt the model to be transparent about its uncertainty; and monitor outputs continuously so problems are caught early.
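
A sketch combining the first two strategies, retrieval grounding plus a transparency prompt. The `retrieve` and `generate` callables are hypothetical stand-ins for a vetted document search and a model call:

```python
def grounded_answer(query: str, retrieve, generate) -> str:
    # Retrieval grounding: fetch trusted documents first, then constrain the
    # model to them, with a transparency prompt that makes "I don't know"
    # an acceptable answer.
    docs = retrieve(query, top_k=3)  # hypothetical search over a vetted corpus
    if not docs:
        return "I couldn't find reliable sources for that, so I won't guess."
    context = "\n\n".join(d["text"] for d in docs)
    prompt = (
        "Answer using ONLY the context below. If the context does not "
        "contain the answer, say 'I don't know' instead of guessing.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return generate(prompt)
```
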

From Ship-and-Forget to Continuous Learning

Software releases are often final until the next version. AI systems, by contrast, require ongoing monitoring and retraining. Models can drift as user behavior, context, or data changes.
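A minimal sketch of drift monitoring that compares live eval scores against a launch baseline. The window and tolerance are illustrative; production systems often use proper statistical tests instead:

```python
from collections import deque

class DriftMonitor:
    """Flag when live quality scores fall meaningfully below a launch baseline."""

    def __init__(self, baseline_mean: float, window: int = 200, tolerance: float = 0.05):
        self.baseline = baseline_mean        # mean eval score at launch
        self.recent = deque(maxlen=window)   # rolling window of live scores
        self.tolerance = tolerance

    def record(self, score: float) -> bool:
        """Return True when drift is detected and retraining should be considered."""
        self.recent.append(score)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data yet
        live_mean = sum(self.recent) / len(self.recent)
        return live_mean < self.baseline - self.tolerance
```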

From Predictable to Trustworthy

In traditional design, users expect precise, predictable outcomes. With AI, the experience is about trust: communicating uncertainty, offering transparency, and giving users control. Builders need to create guardrails and recovery paths for when AI gets it wrong.
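One shape a guardrail with a recovery path can take. The `validate` callable is a stand-in for whatever checks a given domain requires:

```python
def answer_with_guardrails(query: str, generate, validate) -> dict:
    """Never show an unvalidated output; recover transparently instead."""
    draft = generate(query)
    if validate(draft):  # e.g., policy checks, PII scan, source verification
        return {"text": draft, "note": "AI-generated; may contain errors."}
    # Recovery path: admit failure and hand control back to the user.
    return {
        "text": "I couldn't produce a reliable answer for that request.",
        "note": "Try rephrasing, or ask to be connected with a human.",
    }
```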

From Functionality to Responsibility

AI introduces ethical and compliance responsibilities related to bias, fairness, copyright, and security. These aren’t edge considerations; they’re core to product success and must be designed into the system from day one.

The Builder’s Mindset Shift

For builders, the shift is less about learning new tools and more about embracing a new mindset. Traditional software is engineered. AI systems must be engineered, trained, and governed. Success depends not only on technical skill, but also on thoughtful design, monitoring, and responsible use of data.

As AI becomes a necessity for all of us, builders who adapt their approach will be best positioned to deliver the next generation of impactful products.

Shilpa Shastri is a Principal Product Manager at Apptio (an IBM company), where she owns data strategy and GenAI features. Her work bridges product strategy, cloud economics, and AI innovation—helping enterprises adopt AI responsibly and at scale.
