Nilenso Blog October 15, 00:30
AI Product Design Should Follow the "Bitter Lesson" Principle

This article takes a close look at Richard Sutton's "Bitter Lesson" and applies it to AI product development. It argues that the experience of AI research shows that general methods which leverage computation are ultimately the most effective. When building AI applications, developers should avoid over-reliance on baked-in "human knowledge" and complex, rule-based workflows, because these constrain what the model can do. A more effective approach is to build an environment that provides feedback loops and let the model learn and improve on its own. By contrasting "artisanal" and "bitter-lessoned" architectures, the article stresses the importance of embracing the model's inherent ability to learn and its goal-seeking nature, while noting that human-knowledge-driven architectures still have value in certain scenarios, provided they are easy to iterate on and remove.

💡 **General, compute-leveraging methods are the key to AI progress**: The article emphasises that the biggest lesson from 70 years of AI research is that general methods which make full use of computation are the most effective in the long run. For AI product design, this means preferring general solutions that can scale and adapt over highly customised, task-specific ones.

⚙️ **Avoid excessive "human knowledge" intervention**: The author notes that many AI developers tend to bake large amounts of "human knowledge" into their applications, such as detailed rules, role definitions, and complex instruction sets. This gets in the way of the model's own learning and reasoning, and is the opposite of the bitter lesson. Successful AI products let the model learn within its environment rather than constraining it with human-prescribed logic.

🔄 **Embrace feedback loops and goal-seeking**: The article advocates building AI systems around an action-feedback loop. By giving the model a clear objective and feedback from the environment, it can better exploit the goal-seeking abilities it acquired through reinforcement learning. This kind of architecture is simpler and more effective than prefilling large amounts of information or defining fixed workflows.

🛠️ **Use "artisanal" architectures judiciously**: While the bitter lesson is the core principle, the article also acknowledges that artisanal architectures still have their place when models are not yet good enough at a task or when the task does not call for extended reasoning. When designing such architectures, however, you must recognise their limits and be ready to iterate on or replace them as models improve.

tldr? Your AI product must “price in” the knowledge of Sutton’s Bitter Lesson.


Everyone is talking about Richard Sutton's Bitter Lesson once again. [1]

The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.

Rich Sutton, The Bitter Lesson

I highly recommend taking a detour to read Richard’s essay if you haven’t yet, before coming back to this page. It’s very short.

Here’s my observation: The bitter lesson applies to developers building and working with AI applications as well—and many have not yet digested the bitter lesson.

How not to code with AI

I’ve observed a type of AI-maximalist programmer often found at vibe coding events, workshops and demos. Their setup often has a folder full of text files that describe “rules”, “modes”, “roles”, prompts, or subagents. It often looks like a dump of all possible individual actions a developer can take—PRD analyser, planner, user story writer, code reviewer, UAT tester, etc. These files are full of long instructions, with lots of “pleading” language, capitalisation and even step-by-step logic telling an LLM how it should think and act.

The fundamental error in the above methods is that they bake in assumptions of what a workflow should look like, and how the agent should operate. They meddle with the model’s behaviour. It is what Sutton would describe as a “human knowledge based” method.

Some of these tricks were necessary when the models were weaker and less agentic. Today, they can reason well and learn from the feedback in the environment. Force-fitting a complex web of workflows and roles is potentially fighting against the model weights.

The engineer who has digested the bitter lesson will instead set up an environment that can provide feedback loops to the agent. This setup is simpler, and by getting out of the models' way it better accommodates frontier reasoning models that are scaled with reinforcement learning.

How not to build LLM wrappers

I have observed engineers directly jump to complex workflows, indiscriminate application of prompting tricks and multiple agents with fixed roles when designing an LLM-integrated application. These add unnecessary complexity and should not be the default starting point. To better illustrate why, we can look at how coding agents have evolved over time.

The first generation of AI coding tools (Cursor, Sourcegraph Cody, Codeium [2], Copilot) relied heavily on the chunk-and-embed paradigm, i.e., a separate vector-embedding storage layer whose retrieved chunks are prefilled into the LLM's context window. [3]
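In code, that prefill pattern looks roughly like this (a minimal sketch; `embed` and `llm` are hypothetical stand-ins for an embedding model and a chat model, not any particular vendor's API):

```python
# Chunk-and-embed prefill: retrieval happens *before* the model sees anything.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def prefill_answer(question, documents, embed, llm, top_k=5):
    # 1. Chunk the corpus and embed every chunk up front.
    chunks = [c for doc in documents for c in doc.split("\n\n")]
    chunk_vecs = [embed(c) for c in chunks]

    # 2. Retrieve the chunks most similar to the question.
    q_vec = embed(question)
    ranked = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(q_vec, cv[1]), reverse=True)
    context = "\n---\n".join(c for c, _ in ranked[:top_k])

    # 3. Prefill the retrieved chunks into the prompt; the model never chooses what it sees.
    return llm(f"Use only this context:\n{context}\n\nQuestion: {question}")
```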

Newer AI tools (Cline, Windsurf, Amp, Claude Code, Codex, OpenHands) eschew pre-filled retrievals in favour of agentic search—i.e., tell the AI how to invoke a search, and let it figure it out from there. How the search is performed is an implementation detail. This is a much simpler fundamental architecture. [4]
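The agentic version hands the model a search tool and lets it decide what to look at (again a sketch; the `llm_step` tool-calling protocol here is an assumed, simplified interface, not a specific provider's API):

```python
# Agentic search: the model drives retrieval itself via a search tool.
import subprocess

def grep_tool(pattern, path="."):
    # The only "human knowledge" baked in: how to invoke a search. The rest is up to the model.
    result = subprocess.run(["grep", "-rn", pattern, path], capture_output=True, text=True)
    return result.stdout[:4000]  # truncate so one call cannot flood the context

def agentic_answer(question, llm_step, max_steps=10):
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = llm_step(history, tools=["grep_tool"])
        if reply["type"] == "tool_call":           # model asked to search
            output = grep_tool(**reply["arguments"])
            history.append({"role": "tool", "content": output})
        else:                                       # model is done
            return reply["content"]
    return "gave up after max_steps"
```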

The latter approach better embodies the bitter lesson. Do not bake in your human-knowledge assumptions by prefilling items into the agent's context window.

Reinforcement learning produces goal-seeking agents. Anyone who has digested the bitter lesson knows that more compute is being poured into these LLMs to make goal-seekers (they get a reward signal when they achieve their goal). Leverage this fact. As models get better at goal-seeking in general, they will get better inside applications that mirror this action → feedback loop.
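In application terms, mirroring that loop can be as plain as: state the goal, let the model act, run a verifier, feed the result back. A sketch, where `propose_patch` and `apply_patch` are hypothetical model-facing functions and the verifier is an ordinary test suite:

```python
# Goal -> action -> feedback loop: the environment, not the prompt, tells the model how it is doing.
import subprocess

def run_tests():
    # Any verifiable signal works here; a test suite is the classic one.
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout[-2000:]

def solve(goal, propose_patch, apply_patch, max_iterations=8):
    feedback = "no attempts yet"
    for _ in range(max_iterations):
        patch = propose_patch(goal=goal, feedback=feedback)  # model decides the action
        apply_patch(patch)
        ok, feedback = run_tests()                           # environment provides the signal
        if ok:
            return patch
    raise RuntimeError("goal not reached within the iteration budget")
```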

We can generalise this for most LLM-enabled applications.

Let's contrast some human-knowledge-driven "artisanal" architectures against more "bitter-lessoned" architectures, which could represent two ends of a spectrum.

| Artisanal architectures | Bitter-lessoned architectures |
|---|---|
| Prescriptive workflows | Take actions, respond to feedback in a loop |
| Prefilling tokens into prompts | Giving models an objective and some tools |
| Stages and modes | Modeless |
| Chunk-and-embed | Agentic search |
| Makes assumptions about how a model should operate and think | Sets up an environment and context that a model can verify itself against |
| Imperative | Declarative |
| Specialised tool interfaces | Code execution |

Signals affirming the bitter lesson influencing application design

When to use artisanal architectures

This is not to say that artisanal architectures are bad. It’s that artisanal architectures must account for the bitter lesson.

When the model isn't good at your task yet, but may get there eventually under the current scaling regime—design an artisanal architecture to build what is needed today, but do so with the knowledge that some day you may have to throw this functionality away—make the artisanal parts especially easy to remove when the bitter lesson inevitably strikes. [5]

A more permanently artisanal architecture also makes sense when your task does not require a repeated sequence of actions and deep thinking, for example, a classification task in a pipeline or a task to link similar address records.
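For a task like that, a single constrained call is usually all the architecture you need (a sketch; `llm` is again a hypothetical chat call, and the label set is made up):

```python
# A "permanently artisanal" task: one call, fixed labels, no agency required.
import json

LABELS = ["billing", "bug_report", "feature_request", "other"]

def classify_ticket(text, llm):
    prompt = (
        "Classify the support ticket into exactly one of "
        f"{LABELS}. Reply with JSON like {{\"label\": \"...\"}}.\n\nTicket:\n{text}"
    )
    reply = json.loads(llm(prompt))
    label = reply.get("label")
    return label if label in LABELS else "other"   # cheap guardrail instead of an agent loop
```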

Make a note of what is not scaling with compute

With current scaling methods, verifiable tasks with clear goals will continue to improve: coding, searching, mathematics. Leave the methods of achieving the goal to the agent.

Current training methods have also not scaled context window sizes as reliably—so you might want to hold on to subagents and context-compaction tricks.
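Compaction does not have to be clever: once the transcript grows past a budget, summarise the older turns and keep the recent ones verbatim. A sketch, with a hypothetical `summarise` model call and characters standing in for tokens:

```python
# Context compaction: trade old detail for a summary so the window does not overflow.
def compact(history, summarise, budget_chars=50_000, keep_recent=10):
    total = sum(len(m["content"]) for m in history)
    if total <= budget_chars:
        return history                      # nothing to do yet
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarise("\n".join(m["content"] for m in old))
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent
```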

Training methods will also not solve the important parts that glue things together, like retries and reliable execution, or good interface design.
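Those glue parts are ordinary engineering rather than prompting. For example, a retry wrapper with exponential backoff around any model call, using only the standard library:

```python
# Reliable execution is plain engineering: retries, backoff, and a hard failure at the end.
import random
import time

def with_retries(call, max_attempts=4, base_delay=1.0):
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:                    # in real code, catch only transient errors
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            time.sleep(delay)                # exponential backoff with jitter
```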




Thanks to Srihari and Ravi Chandra Padmala for reviewing drafts of this.


  1. Mostly thanks to the recent Dwarkesh Podcast. I am aware that, even though this whole article is LLM-centric, Sutton himself does not believe LLMs are the most "bitter lesson-pilled" AI architecture. But I believe it's fair to say that there's a spectrum to the bitter lesson, and LLMs are definitely less human-knowledge-based than other generalist AI architectures.

  2. Before Codeium became the more agentic Windsurf.

  3. There's also Aider—while it does not use embeddings, it inserts a repo map into the LLM context, which makes it a form of prefilling.

  4. The devil of course is in the details and the elbow grease. And also the parts which actually do not improve even when AI scaling continues. While the fundamental architecture is simple, it’s not necessarily easy to nail down all the details. But artisanal architectures are neither simple nor easy. 

  5. In some instances, you might see a hop from one type of artisanal workflow to another, due to a model improving but still not improving enough to remove the need for a human-knowledge-informed method, as was the case with Devin and Sonnet 4.5.
