VentureBeat, October 4
High-Quality Demonstration Data Is the Key to Training AI Agents

A new study shows that training large language models (LLMs) to perform complex autonomous tasks does not require massive datasets. The LIMI framework (Less Is More for Intelligent Agency), proposed by Shanghai Jiao Tong University and the SII Generative AI Research Lab, finds that machine autonomy stems not from data abundance but from the strategic curation of high-quality demonstration data. Using just 78 carefully chosen examples, the researchers trained LLMs that far outperformed models trained on thousands of examples on key industry benchmarks. This finding has important implications for enterprise applications where data is scarce or expensive to collect, and it underscores that data quality, not quantity, is what cultivates agentic AI.

💡 **Data quality, not quantity, drives AI autonomy**: The study argues that an AI system's autonomy, meaning its ability to discover problems, formulate hypotheses, and execute solutions, does not depend on massive data. The core finding of the LIMI framework is that strategically selecting and curating high-quality demonstration data can effectively train LLMs with strong agentic capability, overturning the conventional assumption that more data is always better.

🚀 **LIMI's novel approach**: The LIMI framework builds its dataset through an innovative method in which each demonstration consists of a natural-language query and a detailed execution trajectory. The trajectory records the complete steps the AI takes to finish the task, including internal reasoning, tool calls, and environment feedback. This ensures the model learns not only successful outcomes but the entire problem-solving process, including how to adapt strategies and recover from failures.

📊 **Experimental validation and striking results**: On AgencyBench and other benchmarks, a model trained with the LIMI framework on just 78 demonstrations significantly outperformed baseline models trained on thousands or even ten thousand examples, delivering superior performance with 128 times less data. This demonstrates LIMI's potential for data efficiency and model performance, especially in scenarios where data is costly to acquire.

🏢 **Major potential for enterprise applications**: The LIMI framework offers enterprises a practical path toward developing highly specialized AI agents. Companies can leverage in-house expertise to create small, high-quality datasets for specific agentic tasks, lowering the barrier to AI development and building custom AI solutions for critical business processes that provide a competitive edge.

A new study by Shanghai Jiao Tong University and SII Generative AI Research Lab (GAIR) shows that training large language models (LLMs) for complex, autonomous tasks does not require massive datasets.

Their framework, LIMI (Less Is More for Intelligent Agency), builds on similar work in other areas of LLM research and finds that “machine autonomy emerges not from data abundance but from strategic curation of high-quality agentic demonstrations.” 

In other words, it's data quality, not quantity, that matters.

In experiments, the researchers found that with a small, but carefully curated, dataset of just 78 examples, they could train LLMs to outperform models trained on thousands of examples by a considerable margin on key industry benchmarks.

This discovery could have important implications for enterprise applications where data is scarce or expensive to collect.

The challenge of building agents that work

The researchers define agency as “the emergent capacity of AI systems to function as autonomous agents–actively discovering problems, formulating hypotheses, and executing solutions through self-directed engagement with environments and tools.” In other words, these are AI systems that “don’t just think, but work.” 

The problem is that current training frameworks assume that higher agentic intelligence requires a lot of data, as has been shown in the classic scaling laws of language modeling. The researchers argue that this approach leads to increasingly complex training pipelines and substantial resource requirements. Moreover, in many areas, data is scarce, hard to obtain, and expensive to curate.

However, research in other domains suggests that you don’t necessarily need more data to achieve LLM training objectives.

For example, LIMA, a 2023 paper, showed a model could be effectively aligned with just 1,000 curated examples. More recently, LIMO demonstrated that complex mathematical reasoning could emerge from only 817 training samples.

With LIMI, the researchers sought to apply the same “less is more” principle to the complex world of AI agents.

How LIMI works

The LIMI framework demonstrates that sophisticated agentic intelligence can emerge from minimal but strategically curated demonstrations of autonomous behavior. Key to the framework is a pipeline for collecting high-quality demonstrations of agentic tasks. 

Each demonstration consists of two parts: a query and a trajectory. A query is a natural language request from a user, such as a software development requirement or a scientific research goal.

The trajectory is the series of steps the AI takes to address the query, including its internal reasoning, its calls to external tools like a code interpreter, and the observations it receives from the environment. For example, a query might be "build a simple chat application," and the trajectory would include the agent’s internal reasoning and action plan, the code it writes and executes, and the resulting output or errors.

The trajectory could include multiple iterations of planning, execution, and reflection until it achieves the desired objective.
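As a rough illustration, a demonstration pairing a query with its multi-step trajectory could be modeled as follows. The field names and schema here are hypothetical, not taken from the LIMI paper:

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical schema for a LIMI-style demonstration; the actual
# format used by the researchers may differ.
@dataclass
class Step:
    reasoning: str                      # the agent's internal reasoning
    tool_call: Optional[str] = None     # e.g. code sent to an interpreter
    observation: Optional[str] = None   # output or errors from the environment

@dataclass
class Demonstration:
    query: str                          # natural-language request from a user
    trajectory: list = field(default_factory=list)

demo = Demonstration(
    query="build a simple chat application",
    trajectory=[
        Step(reasoning="Plan: scaffold a minimal server first.",
             tool_call="print('server.py created')",
             observation="server.py created"),
        Step(reasoning="Server runs; task complete."),
    ],
)
print(len(demo.trajectory))  # number of plan-execute-observe steps
```

Because the trajectory is a list of steps rather than a single answer, failed attempts and corrections are captured alongside the final success.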

To build their dataset, the researchers started with 60 queries from real-world scenarios faced by professional developers and researchers. They then expanded this pool by using GPT-5 to synthesize additional queries from GitHub Pull Requests.

They employed a team of four computer science PhD students to vet the quality of these queries and selected 18 of the synthesized queries, which, combined with the original 60, formed a high-quality set of 78 queries focused on software development and research workflows.

To generate the trajectories, the same PhD students collaborated with a CLI coding agent powered by GPT-5 to complete the 78 tasks.

They followed an iterative process, collecting the entire interaction sequence until each task was successfully completed, capturing the full arc of realistic human-AI collaboration, including back-and-forth communication and iterative refinement. For the more complex queries, the collected trajectories could extend to more than 152,000 tokens.

“This approach guarantees that our models learn not only from successful outcomes but also from the complete problem-solving process, including how to adapt strategies and recover from failures during collaborative execution,” the researchers write.
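A minimal sketch of that iterative collection loop, with a stand-in agent in place of the GPT-5-powered CLI agent (all names here are illustrative, not from the paper):

```python
# Illustrative collection loop: keep interacting with an agent until the
# task succeeds, recording every step so the full trajectory, including
# failed attempts and recoveries, is preserved for training.
def collect_trajectory(agent, query, max_rounds=10):
    trajectory = []
    for _ in range(max_rounds):
        step = agent(query, trajectory)  # plan and act given the history
        trajectory.append(step)          # keep failures as well as successes
        if step["success"]:
            break
    return trajectory

# Stand-in agent that fails on its first attempt, then succeeds.
def toy_agent(query, history):
    return {"action": f"attempt {len(history) + 1}",
            "success": len(history) >= 1}

traj = collect_trajectory(toy_agent, "build a chat app")
print(len(traj))  # 2: one failed attempt, then a successful recovery
```

The point of keeping the whole sequence, rather than only the final successful step, is that the model sees how the collaboration recovered from failure.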

LIMI in action

To test their framework, the team evaluated models on AgencyBench, a benchmark designed for measuring agentic skills, as well as other established benchmarks for tool use and coding.

They fine-tuned GLM-4.5, a powerful open-source model, using their 78-sample dataset and compared its performance against several frontier models, including the base GLM-4.5, Kimi-K2-Instruct, and DeepSeek-V3.1. The LIMI-trained model achieved an average score of 73.5% on AgencyBench, significantly outperforming all baseline models, the best of which (GLM-4.5) scored 45.1%.

This superiority extended to other benchmarks covering tool use, coding, and scientific computing, where LIMI also outperformed all baselines.

More importantly, the study showed that the model trained on just 78 examples outperformed models trained with 10,000 samples from another dataset, delivering superior performance with 128 times less data. 
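The 128x figure follows directly from the dataset sizes reported above, 10,000 baseline samples versus LIMI's 78:

```python
# Data-efficiency ratio implied by the reported dataset sizes.
baseline_samples = 10_000
limi_samples = 78
print(round(baseline_samples / limi_samples))  # ≈ 128
```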

“This discovery fundamentally reshapes how we develop autonomous AI systems, suggesting that mastering agency requires understanding its essence, not scaling training data,” the researchers write. “As industries transition from thinking AI to working AI, LIMI provides a paradigm for sustainable cultivation of truly agentic intelligence.”

The researchers have released the code for data synthesis and training, along with the model weights. For the enterprise, this approach offers a practical path toward developing highly specialized AI agents.

Instead of undertaking massive data collection projects, organizations can leverage their in-house talent and subject matter experts to create small, high-quality datasets for bespoke agentic tasks. This lowers the barrier to entry and enables businesses to build custom AI agents that can provide a competitive edge on the workflows that matter most to them.
