MarkTechPost@AI · October 10, 19:39
The ACE Framework: Improving Large Language Model Performance via Context Engineering

Researchers from Stanford University, SambaNova Systems, and UC Berkeley introduce the ACE framework, which improves large language model (LLM) performance by editing and growing the input context rather than updating model weights. The framework treats context as a living "playbook" maintained by three roles (Generator, Reflector, and Curator) and merges small delta items incrementally to avoid brevity bias and context collapse. Reported results: a 10.6% gain on AppWorld agent tasks, an 8.6% gain on financial reasoning, and roughly 86.9% lower latency than strong context-adaptation baselines. ACE positions "context engineering" as a method on par with parameter updates, accumulating and organizing domain-specific tactics to raise context density for agentic tasks, especially those involving tools, multi-turn state, and failure modes.

💡 ACE's core innovation is treating "context engineering" as a route to better performance that is independent of weight updates. Rather than relying on fine-tuning or parameter adjustment, it optimizes the model's behavior by dynamically editing and growing the input context. Context is treated as a continually evolving "playbook" of task-relevant tactics and knowledge, allowing the model to handle complex scenarios more effectively.

🔄 ACE manages context through three cooperating roles: a Generator, a Reflector, and a Curator. The Generator executes tasks and produces action trajectories; the Reflector distills concrete lessons from them; and the Curator turns those lessons into structured "delta items" that are merged deterministically into the context playbook, with deduplication and pruning to keep the context lean and effective.

🚀 Through incremental updates and a "grow-and-refine" design, ACE preserves important historical information and avoids the "context collapse" caused by monolithic rewrites. The context can thus accumulate and improve over time, significantly boosting performance and robustness on agentic tasks involving tool use, multi-turn dialogue state, and recovery from failure modes.

📊 ACE shows clear advantages on benchmarks. On AppWorld agent tasks, ReAct+ACE improves on strong baselines by 10.6% on average, and on the September 20, 2025 leaderboard it performs close to IBM CUGA (which uses GPT-4.1) while relying on a smaller open-source model. On financial reasoning tasks (FiNER and XBRL Formula), ACE delivers an average gain of 8.6%.

💰 ACE is also cost-effective. With non-LLM merges and localized updates, it sharply reduces adaptation cost: in the offline setting, latency drops by 82.3% and rollouts by 75.1%; in the online setting, latency drops by 91.5% and token cost by 83.6%. ACE thus achieves effective adaptation at far lower computational overhead.

TL;DR: A team of researchers from Stanford University, SambaNova Systems, and UC Berkeley introduces the ACE framework, which improves LLM performance by editing and growing the input context instead of updating model weights. Context is treated as a living "playbook" maintained by three roles (Generator, Reflector, Curator), with small delta items merged incrementally to avoid brevity bias and context collapse. Reported gains: +10.6% on AppWorld agent tasks, +8.6% on finance reasoning, and ~86.9% average latency reduction vs strong context-adaptation baselines. On the AppWorld leaderboard snapshot (Sept 20, 2025), ReAct+ACE (59.4%) roughly matches IBM CUGA (60.3%, GPT-4.1) while using DeepSeek-V3.1.

What does ACE change?

ACE positions “context engineering” as a first-class alternative to parameter updates. Instead of compressing instructions into short prompts, ACE accumulates and organizes domain-specific tactics over time, arguing that higher context density improves agentic tasks where tools, multi-turn state, and failure modes matter.
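To make the idea concrete, here is a minimal sketch (not the authors' code) of "context as a playbook": tactics accumulate in a persistent structure and are rendered into the prompt on every call, with model weights untouched. The `Playbook` class and the example tactics are illustrative assumptions.

```python
# A minimal sketch (not the authors' code): a persistent "playbook" of
# tactics rendered into the prompt on every call; weights are never touched.
from dataclasses import dataclass, field

@dataclass
class Playbook:
    tactics: list[str] = field(default_factory=list)

    def add(self, tactic: str) -> None:
        # Grow the context with a new domain-specific tactic (dedup on exact text).
        if tactic not in self.tactics:
            self.tactics.append(tactic)

    def render(self) -> str:
        # Serialize the accumulated tactics into a context block for the LLM.
        return "\n".join(f"- {t}" for t in self.tactics)

playbook = Playbook()
# Invented example tactics for illustration only.
playbook.add("List available apps before guessing which API a task needs.")
playbook.add("On a failed API call, re-read the error message and retry at most twice.")

task = "Reply to the most recent unread message in the messaging app."
prompt = f"Playbook:\n{playbook.render()}\n\nTask: {task}"
```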

Method: Generator → Reflector → Curator

Two design choices—incremental delta updates and grow-and-refine—preserve useful history and prevent “context collapse” from monolithic rewrites. To isolate context effects, the research team fixes the same base LLM (non-thinking DeepSeek-V3.1) across all three roles.
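The loop below is a hedged sketch of how the three roles could compose, assuming only a generic `llm(prompt) -> str` callable; the `DeltaItem` schema, the dedup keys, and the score-based pruning are illustrative stand-ins, not the paper's exact design. What it demonstrates is that the Curator's merge is deterministic dict manipulation, not another LLM call.

```python
# Hedged sketch of the Generator -> Reflector -> Curator loop. Assumes a
# generic llm(prompt) -> str callable; schema and pruning are illustrative.
from dataclasses import dataclass

@dataclass
class DeltaItem:
    key: str      # stable identifier used for deduplication
    text: str     # one concrete lesson distilled from a trajectory
    score: float  # usefulness estimate used for pruning

def generate(llm, playbook: dict[str, DeltaItem], task: str) -> str:
    # Generator: attempt the task with the current playbook in context.
    ctx = "\n".join(item.text for item in playbook.values())
    return llm(f"Playbook:\n{ctx}\n\nTask: {task}\nAct and record a trajectory.")

def reflect(llm, trajectory: str) -> list[DeltaItem]:
    # Reflector: distill concrete lessons from the trajectory, one per line.
    lessons = llm(f"Extract one lesson per line from:\n{trajectory}").splitlines()
    return [DeltaItem(key=l.lower()[:40], text=l, score=1.0) for l in lessons if l]

def curate(playbook: dict[str, DeltaItem], deltas: list[DeltaItem],
           max_items: int = 200) -> None:
    # Curator: a deterministic, non-LLM merge. Dedup by key, reinforce
    # repeated lessons, then prune lowest-scoring items past the size cap.
    for d in deltas:
        if d.key in playbook:
            playbook[d.key].score += d.score
        else:
            playbook[d.key] = d
    if len(playbook) > max_items:
        kept = sorted(playbook.values(), key=lambda i: i.score, reverse=True)[:max_items]
        playbook.clear()
        playbook.update({i.key: i for i in kept})
```

Because `curate` never calls the model and each delta touches only its own entry, the playbook grows and refines without the monolithic rewrites that cause context collapse.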

Benchmarks

AppWorld (agents): Built on the official ReAct baseline, ReAct+ACE outperforms strong baselines (ICL, GEPA, Dynamic Cheatsheet), with +10.6% average over selected baselines and ~+7.6% over Dynamic Cheatsheet in online adaptation. On the Sept 20, 2025 leaderboard, ReAct+ACE 59.4% vs IBM CUGA 60.3% (GPT-4.1); ACE surpasses CUGA on the harder test-challenge split, while using a smaller open-source base model.

Finance (XBRL): On FiNER token tagging and XBRL Formula numerical reasoning, ACE reports +8.6% average over baselines with ground-truth labels for offline adaptation; it also works with execution-only feedback, though quality of signals matters.

Cost and latency

ACE’s non-LLM merges plus localized updates reduce adaptation overhead substantially:

- Offline adaptation: latency down 82.3% and rollouts down 75.1% versus strong baselines.
- Online adaptation: latency down 91.5% and token cost down 83.6%.
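The contrast driving those numbers can be sketched in a few lines (interfaces assumed for illustration, not taken from the paper): a reflective-rewrite baseline pays an LLM call whose cost scales with the whole context, while an ACE-style localized update touches only the affected entry.

```python
# Hypothetical interfaces for illustration only.
def rewrite_baseline(llm, context: str, lesson: str) -> str:
    # Monolithic rewrite: the entire context is shipped back through the
    # model, so latency and token cost grow with context length.
    return llm(f"Rewrite this context to incorporate: {lesson}\n\n{context}")

def delta_update(playbook: dict[str, str], key: str, lesson: str) -> None:
    # Localized, non-LLM update: only the affected entry changes; no model
    # call, so cost is independent of how large the playbook has grown.
    playbook[key] = lesson
```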

Key Takeaways

- ACE improves LLMs by editing and growing a persistent, curated context "playbook" instead of updating model weights.
- Three roles (Generator, Reflector, Curator) produce small delta items merged deterministically, avoiding brevity bias and context collapse.
- Reported gains: +10.6% on AppWorld agent tasks and +8.6% on finance reasoning (FiNER, XBRL Formula).
- ReAct+ACE (59.4%) roughly matches IBM CUGA (60.3%, GPT-4.1) on the Sept 20, 2025 AppWorld leaderboard while using the smaller open-source DeepSeek-V3.1.
- Non-LLM merges and localized updates cut adaptation latency by 82.3% offline and 91.5% online, with 83.6% lower token cost online.

Conclusion

ACE positions context engineering as a first-class alternative to weight updates: maintain a persistent, curated playbook that accumulates task-specific tactics, yielding measurable gains on AppWorld and finance reasoning while cutting adaptation latency and token rollouts versus reflective-rewrite baselines. The approach is practical—deterministic merges, delta items, and long-context–aware serving—and its limits are clear: outcomes track feedback quality and task complexity. If adopted, agent stacks may “self-tune” primarily through evolving context rather than new checkpoints.


