dify blog 09月19日
Dify Agent Strategy新功能详解
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

Dify正式推出新的插件类型—Agent Strategy,将固定流程中的部分步骤交由LLM自主决策。Agent Node作为工作流的“决策中心”,负责资源分配和状态管理,而Agent Strategy则是一个可扩展的模板,定义了输入输出格式。目前提供ReAct(“思考-行动-观察”链)和Function Calling两种经典策略,开发者可通过CLI工具快速创建策略插件,自定义配置表单和可视化组件,甚至集成Tree-of-Thoughts等前沿算法。Agent Strategy的核心功能包括:处理用户查询、选择工具、运行工具、处理结果、判断任务完成条件。非技术用户可通过拖拽三步配置(选择策略、链接工具/模型、设置提示模板)完成设置,并通过内置日志机制实时查看执行路径和Token使用情况。

💡 Agent Strategy是Dify推出的新插件类型,将工作流中固定的工具调用和流程步骤交由LLM自主决策,使工作流更具灵活性。

🤖 Agent Node作为工作流的“决策中心”,负责资源分配、状态管理和日志记录,而Agent Strategy则是定义LLM如何处理查询、选择工具、执行工具、处理结果及判断任务完成条件的可扩展模板。

🔧 目前Dify提供ReAct(“思考-行动-观察”链)和Function Calling(精确函数调用)两种经典Agent Strategy,开发者可通过CLI工具创建自定义策略,集成前沿算法如Tree-of-Thoughts,并自定义配置表单和可视化组件。

📊 非技术用户可通过拖拽三步配置(选择策略、链接工具/模型、设置提示模板)完成Agent Strategy设置,并通过内置日志机制实时查看执行路径和Token使用情况,实现透明化推理。

⚙️ Agent Strategy的执行分为初始化、迭代循环和最终响应三个阶段:初始化阶段设置所有参数、工具和上下文;迭代循环阶段准备包含当前上下文的提示,调用LLM并解析其响应,若需调用工具则执行并更新上下文,直至任务完成或达到预设迭代上限;最终阶段返回最终答案或结果。

In traditional automated processes, each tool call is a pre-orchestrated, fixed action. However, when facing complex problems, this rigid structure is like forcing a pianist to mechanically stick to the score. While a workflow is mainly used to constrain how tasks are carried out, the growing reasoning power of LLMs means that parts of the workflow can gradually be entrusted to the LLM. Recently, Dify officially introduced a new plugin type—Agent Strategy—which we’ll explore below.

Core Concept: The Relationship Between Agent Node and Strategy

In a Dify Workflow, the Agent Node takes certain steps out of the fixed flow and tool pattern, handing them over to the LLM for autonomous decisions and judgments. An Agent Strategy is an extensible template that defines the standardized input and output formats. By developing custom interfaces for these strategies, you can implement various solutions such as CoT (Chain-of-Thought), ToT (Tree-of-Thought), GoT (Graph-of-Thought), BoT (Pillars-of-Thought), and even more advanced strategies like semantic kernels.

In Dify, the Agent Node embeds the Agent Strategy and connects with upstream and downstream nodes. Like an LLM node, it tackles a specific task and returns a final response to the next node.

  • Agent Node (execution unit)

The “decision center” of a workflow. It allocates resources, manages states, and logs the entire reasoning process.

  • Agent Strategy (decision logic)

A pluggable reasoning algorithm module that defines how tools are used and how problems are solved.

This decoupled design is like separating a car’s engine from its control system—developers can upgrade the “powertrain” without affecting the overall vehicle architecture. We currently provide two classic Agent Strategies:

  • ReAct: A classic chain of “Think–Act–Observe”

  • Function Calling: Precise function-based calling

You can download both strategies directly from the Marketplace. More importantly, we’ve released an open standard for strategy development. In Dify, any developer can:

  • Quickly create strategy plugins with the CLI tool

  • Customize configuration forms and visualization components

  • Integrate cutting-edge academic algorithms (e.g., Tree-of-Thoughts)

This effectively turns Dify into an “innovation testbed” for AI reasoning strategies, allowing every user to benefit from community-driven advancements.

Feature Overview

Within a Workflow, the Agent Node enables autonomous thinking for multi-step tool reasoning. A minimal Agent Strategy must at least define how to use the LLM API and how to call tools.

For Non-Technical Users

  1. Drag-and-Drop Setup

Simply drag an Agent Node from the tool panel and configure it in three steps:

  • Choose a reasoning strategy

  • Link the tool/model

  • Set the prompt template

  1. Transparent Reasoning

One of Dify’s powerful features is its built-in logging mechanism, which creates a tree-like structure of the agent’s thought process. This structure enables you to:

Visualize the agent’s execution path for debugging complex multi-step reasoning

View in real time:

  • Total time and token usage

  • Each round of reasoning

  • Tool invocation traces

For Developers

Defining an Agent Strategy involves specifying how the language model will:

  1. Handle user queries

  2. Select the right tools

  3. Use the correct parameters to run those tools

  4. Process the results

  5. Decide when the task is complete

We provide a Standardized Development Kit, which includes a strategy configuration component library (e.g., Model Selector/Tool Editor), a structured logging interface, and a sandbox testing environment.

Specifically, a strategy definition covers its identity and metadata, the required parameters (model, tools, query, etc.), parameter types and constraints, and the location of the source code.

An agent executes in three main stages: initialization, iterative looping, and final response. During initialization, the system sets up all necessary parameters, tools, and contexts. Then, in the iterative loop, the system prepares a prompt with the current context and calls the LLM with information about the available tools. It parses the LLM’s response to see if a tool call is needed or if a final answer has already been reached. If a tool call is required, the system executes that tool and updates the context with its output. This loop continues until the task is complete or the preset iteration limit is reached. Finally, in the last phase, the system returns the final answer or result.

For example, a function_calling.yaml file might look like this:

parameters:  - name: model    type: model-selector          scope: tool-call&llm      - name: tools    type: array[tools]        - name: max_iterations    type: number                default: 5extra:  python:    source: function_calling.py

Thanks to this declarative architecture, configuring a strategy feels like filling out a simple form, with support for:

  • Dynamic parameter validation (type/range/dependencies)

  • Automatic multilingual label rendering

For more details, read our documentation.

Future Outlook: Expanding Features

We plan to iterate further and add more developer-friendly components, such as:

  • Knowledge base integration

  • Memory support in Chatflow

  • Error handling and retries

  • Additional official Agent Strategies

Users can freely download different strategies from the community and load them into different nodes to tackle various tasks. To understand the Agent Node’s capabilities, you can start with a simple three-node Chatflow to see how it works and experience something akin to an Agent app. For more complex tasks, consider a routing and handoff approach, treating the Agent Node as an extension of the LLM Node to break down problem-solving into steps.

For instance, the Agent Node can enable OpenAI ChatGPT-4o with Task capabilities (image courtesy of community contributor Pascal).

We look forward to incorporating even more features in Version 1.0.0, and we welcome everyone to contribute additional Agent Strategies!

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

Dify Agent Strategy LLM 自动化流程 决策中心 透明化推理 ReAct Function Calling Tree-of-Thoughts
相关文章