VentureBeat · 01:04, two days ago
ACE Framework: A New Paradigm for Agentic Context Engineering

Stanford University and SambaNova have proposed a new framework called Agentic Context Engineering (ACE) to tackle the challenge of context engineering in building AI agents. ACE treats the context window of an LLM application as an "evolving playbook," automatically populating and revising its contents to create and refine strategies as the agent gains experience. The framework prevents the model's context from degrading as it accumulates information, and it can both optimize system prompts and manage agent memory, outperforming existing methods in efficiency and performance alike. Through a modular design in which a Generator, a Reflector, and a Curator work together, and by using incremental updates and a "grow-and-refine" mechanism, ACE avoids context collapse and brevity bias, offering a new path toward scalable, adaptive AI systems.

💡 At its core, ACE treats the LLM's context window as a dynamically evolving "playbook" rather than a static prompt. Three independent roles (generation, reflection, and curation) work together to continuously refine the strategies and information the agent learns from interacting with its environment, keeping the context accurate and effective.

🚀 ACE uses "incremental updates" and a "grow-and-refine" mechanism to overcome the "context collapse" and "brevity bias" problems of traditional context engineering. It represents the context as a structured list of entries rather than a single block of text, allowing fine-grained edits: new experiences can be appended and existing entries updated dynamically, while redundant information is removed, keeping the context comprehensive and relevant.

📈 Experiments show that ACE performs strongly on both agent benchmarks and domain-specific financial analysis benchmarks, with significant average performance gains. It can build effective contexts by analyzing feedback rather than relying on manually labeled data, a capability that is essential for adaptive LLMs and agents, and it lets smaller models match the performance of much larger ones.

🔧 Another key advantage of ACE is its transparency and accessibility. The contexts it produces are stored as human-readable text, so domain experts (lawyers, analysts, doctors) can directly edit the AI's "context playbook" to update its knowledge without relying on AI engineers to retrain the model. This simplifies governance and makes selectively "forgetting" outdated or sensitive information far more practical.

A new framework from Stanford University and SambaNova addresses a critical challenge in building robust AI agents: context engineering. Called Agentic Context Engineering (ACE), the framework automatically populates and modifies the context window of large language model (LLM) applications by treating it as an “evolving playbook” that creates and refines strategies as the agent gains experience in its environment.

ACE is designed to overcome key limitations of other context-engineering frameworks, preventing the model’s context from degrading as it accumulates more information. Experiments show that ACE works for both optimizing system prompts and managing an agent's memory, outperforming other methods while also being significantly more efficient.

The challenge of context engineering

Advanced AI applications that use LLMs largely rely on "context adaptation," or context engineering, to steer their behavior. Instead of the costly process of retraining or fine-tuning the model, developers use the LLM's in-context learning abilities, modifying the input prompts with specific instructions, reasoning steps, or domain-specific knowledge. This additional information is usually obtained as the agent interacts with its environment and gathers new data and experience. The key goal of context engineering is to organize this new information in a way that improves the model's performance and avoids confusing it. The approach is becoming a central paradigm for building capable, scalable, and self-improving AI systems.
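The idea above can be made concrete with a minimal sketch: instead of retraining, accumulated knowledge is injected into the prompt at call time. The `build_prompt` helper and the playbook entries are illustrative, not part of the ACE framework's actual API.

```python
# Hypothetical sketch: steering an LLM by assembling its context at call
# time instead of retraining. Entries and wording are illustrative.

def build_prompt(system_rules: str, playbook_entries: list[str], user_query: str) -> str:
    """Combine static instructions with accumulated domain knowledge."""
    knowledge = "\n".join(f"- {entry}" for entry in playbook_entries)
    return (
        f"{system_rules}\n\n"
        f"Lessons learned from past interactions:\n{knowledge}\n\n"
        f"Task: {user_query}"
    )

playbook = [
    "Always confirm the fiscal year before quoting revenue figures.",
    "Prefer the audited 10-K over press releases for financial data.",
]
prompt = build_prompt(
    "You are a financial analysis agent.", playbook, "Summarize Q3 results."
)
print(prompt)
```

Because the knowledge lives in the prompt rather than the weights, it can be updated at runtime and reused across different models, which is exactly the portability advantage described below.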

Context engineering has several advantages for enterprise applications. Contexts are interpretable for both users and developers, can be updated with new knowledge at runtime, and can be shared across different models. Context engineering also benefits from ongoing hardware and software advances, such as the growing context windows of LLMs and efficient inference techniques like prompt and context caching.

There are various automated context-engineering techniques, but most of them face two key limitations. The first is a “brevity bias,” where prompt optimization methods tend to favor concise, generic instructions over comprehensive, detailed ones. This can undermine performance in complex domains.

The second, more severe issue is "context collapse." When an LLM is tasked with repeatedly rewriting its entire accumulated context, it can suffer from a kind of digital amnesia.

“What we call ‘context collapse’ happens when an AI tries to rewrite or compress everything it has learned into a single new version of its prompt or memory,” the researchers said in written comments to VentureBeat. “Over time, that rewriting process erases important details—like overwriting a document so many times that key notes disappear. In customer-facing systems, this could mean a support agent suddenly losing awareness of past interactions... causing erratic or inconsistent behavior.”

The researchers argue that “contexts should function not as concise summaries, but as comprehensive, evolving playbooks—detailed, inclusive, and rich with domain insights.” This approach leans into the strength of modern LLMs, which can effectively distill relevance from long and detailed contexts.

How Agentic Context Engineering (ACE) works

ACE is a framework for comprehensive context adaptation designed for both offline tasks, like system prompt optimization, and online scenarios, such as real-time memory updates for agents. Rather than compressing information, ACE treats the context like a dynamic playbook that gathers and organizes strategies over time.

The framework divides the labor across three specialized roles: a Generator, a Reflector, and a Curator. This modular design is inspired by “how humans learn—experimenting, reflecting, and consolidating—while avoiding the bottleneck of overloading a single model with all responsibilities,” according to the paper.

The workflow starts with the Generator, which produces reasoning paths for input prompts, highlighting both effective strategies and common mistakes. The Reflector then analyzes these paths to extract key lessons. Finally, the Curator synthesizes these lessons into compact updates and merges them into the existing playbook.
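The three-role workflow can be sketched as a simple loop. In the real framework each role is an LLM call; the stand-in functions, their return shapes, and the lesson wording here are assumptions for illustration only.

```python
# Minimal sketch of ACE's Generator -> Reflector -> Curator division of
# labor. Each role would be an LLM call in practice; these are stand-ins.

def generator(task: str, playbook: list[str]) -> dict:
    """Produce a reasoning trace for the task, guided by the current playbook."""
    return {"task": task, "trace": f"Attempted '{task}' using {len(playbook)} known strategies."}

def reflector(trace: dict) -> str:
    """Extract a concrete lesson from the reasoning trace."""
    return f"Lesson from '{trace['task']}': verify tool output before answering."

def curator(playbook: list[str], lesson: str) -> list[str]:
    """Merge the lesson into the playbook as a compact incremental update."""
    if lesson not in playbook:
        playbook.append(lesson)
    return playbook

playbook: list[str] = []
for task in ["look up stock price", "file expense report"]:
    trace = generator(task, playbook)
    lesson = reflector(trace)
    playbook = curator(playbook, lesson)

print(len(playbook))  # one entry per distinct lesson
```

The point of the split is that no single model has to generate, critique, and consolidate at once; each stage consumes only the previous stage's output.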

To prevent context collapse and brevity bias, ACE incorporates two key design principles. First, it uses incremental updates. The context is represented as a collection of structured, itemized bullets instead of a single block of text. This allows ACE to make granular changes and retrieve the most relevant information without rewriting the entire context.

Second, ACE uses a “grow-and-refine” mechanism. As new experiences are gathered, new bullets are appended to the playbook and existing ones are updated. A de-duplication step regularly removes redundant entries, ensuring the context remains comprehensive yet relevant and compact over time.
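The two design principles above can be sketched together: the context is a list of structured entries that only grows by appending, and a periodic refine pass prunes it. The entry fields (including the helpful/harmful counters) and the pruning rule are assumptions for illustration, not the paper's exact schema.

```python
# Sketch of ACE's two design principles: context as itemized entries with
# incremental updates, plus a grow-and-refine pass that de-duplicates.
from dataclasses import dataclass, field

@dataclass
class PlaybookEntry:
    text: str
    helpful: int = 0   # times this entry contributed to a success (assumed metadata)
    harmful: int = 0   # times it was associated with a failure (assumed metadata)

@dataclass
class Playbook:
    entries: list = field(default_factory=list)

    def grow(self, text: str) -> None:
        """Append a new entry instead of rewriting the whole context."""
        self.entries.append(PlaybookEntry(text))

    def refine(self) -> None:
        """Drop exact duplicates and entries that have proven more harmful than helpful."""
        seen, kept = set(), []
        for e in self.entries:
            if e.text in seen or e.harmful > e.helpful:
                continue
            seen.add(e.text)
            kept.append(e)
        self.entries = kept

pb = Playbook()
pb.grow("Check API pagination before assuming results are complete.")
pb.grow("Check API pagination before assuming results are complete.")  # duplicate
pb.grow("Retry on HTTP 429 with exponential backoff.")
pb.refine()
print(len(pb.entries))
```

Because updates touch individual entries rather than the whole text, no rewrite step can silently erase unrelated lessons, which is what guards against the collapse failure mode described earlier.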

ACE in action

The researchers evaluated ACE on two types of tasks that benefit from evolving context: agent benchmarks requiring multi-turn reasoning and tool use, and domain-specific financial analysis benchmarks demanding specialized knowledge. For high-stakes industries like finance, the benefits extend beyond pure performance. As the researchers said, the framework is “far more transparent: a compliance officer can literally read what the AI learned, since it’s stored in human-readable text rather than hidden in billions of parameters.”

The results showed that ACE consistently outperformed strong baselines such as GEPA and classic in-context learning, achieving average performance gains of 10.6% on agent tasks and 8.6% on domain-specific benchmarks in both offline and online settings.

Critically, ACE can build effective contexts by analyzing the feedback from its actions and environment instead of requiring manually labeled data. The researchers note that this ability is a "key ingredient for self-improving LLMs and agents." On the public AppWorld benchmark, designed to evaluate agentic systems, an agent using ACE with a smaller open-source model (DeepSeek-V3.1) matched the performance of the top-ranked, GPT-4.1-powered agent on average and surpassed it on the more difficult test set.

The takeaway for businesses is significant. “This means companies don’t have to depend on massive proprietary models to stay competitive,” the research team said. “They can deploy local models, protect sensitive data, and still get top-tier results by continuously refining context instead of retraining weights.”

Beyond accuracy, ACE proved to be highly efficient. It adapts to new tasks with an average 86.9% lower latency than existing methods and requires fewer steps and tokens. The researchers point out that this efficiency demonstrates that “scalable self-improvement can be achieved with both higher accuracy and lower overhead.”

For enterprises concerned about inference costs, the researchers point out that the longer contexts produced by ACE do not translate to proportionally higher costs. Modern serving infrastructures are increasingly optimized for long-context workloads with techniques like KV cache reuse, compression, and offloading, which amortize the cost of handling extensive context.

Ultimately, ACE points toward a future where AI systems are dynamic and continuously improving. "Today, only AI engineers can update models, but context engineering opens the door for domain experts—lawyers, analysts, doctors—to directly shape what the AI knows by editing its contextual playbook," the researchers said. This also makes governance more practical. "Selective unlearning becomes much more tractable: if a piece of information is outdated or legally sensitive, it can simply be removed or replaced in the context, without retraining the model.”
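Selective unlearning on a human-readable context amounts to filtering entries, with no retraining step. The matching rule and entry texts in this sketch are illustrative.

```python
# Sketch of "selective unlearning": outdated or sensitive entries are
# simply filtered out of the context. Matching rule is illustrative.

def unlearn(playbook: list[str], banned_phrase: str) -> list[str]:
    """Return the playbook with any entry mentioning the banned phrase removed."""
    return [entry for entry in playbook if banned_phrase.lower() not in entry.lower()]

playbook = [
    "Quote prices from the 2023 fee schedule.",
    "Escalate contract disputes to the legal team.",
]
playbook = unlearn(playbook, "2023 fee schedule")
print(playbook)
```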
