Dify blog, September 19
Dify 0.6.11 Release: New Web Data Source and Collaboration Features

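Dify 0.6.11 adds a web data source built in collaboration with Firecrawl. It converts websites into structured data suited to LLMs, which makes it easier to build RAG applications. A new 'Sync from website' option lets users connect a Firecrawl API key, which comes with free credits, and set the crawl depth and scope. The data embedding flow has also been improved: crawled data can be preprocessed and then stored in the vector database. In addition, this release improves workflow collaboration, with notes on the orchestration page that are preserved when a DSL file is shared, and it adds new model providers and vector databases, better dependency management, and a new user role.

🔗 New Firecrawl web data source: crawls websites and converts them into structured Markdown or data for building RAG applications, available in both a cloud edition and an open-source edition.

📊 Data sync and settings: the new 'Sync from website' option sits on the Knowledge dashboard. Users get 500 free pages of credit, can set the crawl depth (for example, 0 or 1) and a sub-page limit, and can filter paths with Exclude/Include.

📈 Data embedding and storage: Firecrawl crawls pages in parallel for speed. Once crawling finishes, the data can be selected and preprocessed directly in Dify, then embedded into the vector database as a new knowledge base.

🤝 Better collaboration: the Workflow orchestration page supports notes, which are preserved when the DSL file is shared, making team communication easier. The new 'Editor' user role can manage apps in the workspace.

🚀 Technical upgrades: three new model providers, updated models for four (including Jina-CLIP-v1), and TiDB, Chroma, and Tencent Cloud vector database integrations. Dependency management moves from pip to poetry, with faster downloads and better conflict resolution.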

Hey, it’s Leilei from Dify! I’ve got some exciting news to share with you in our latest version 0.6.11. Our team collaborated with Firecrawl to add a new web data source to our knowledge base, and it worked really well. I’m excited to fill you in on the details and some other cool integrations we completed.

Bridge Between Web Data and Your RAG App

Firecrawl turns websites into LLM-ready data by crawling them and converting the pages into clean markdown or structured data. This data is well suited to building your RAG application on Dify.

Getting Started

The new 'Sync from website' option is located on the Knowledge dashboard. To get started, obtain an API key from Firecrawl and enter it in the settings. You'll receive 500 free credits (pages), which is enough to explore their cloud service.

Firecrawl also offers an open-source software (OSS) edition that you can host on your own server for unlimited crawling and scraping. The OSS edition works with Dify as well. In this introduction, however, I will focus on the cloud edition.

Easy Setup

Firecrawl crawls all accessible subpages, even without a sitemap, and returns clean, structured markdown. In Dify, we provide a few options that you can adjust to your specific needs.

The 'Crawl sub-pages' option allows you to:

  1. Set the limit for the total number of sub-pages

  2. Set the max depth relative to the entered URL (Depth 0 scrapes only the entered URL, while depth 1 scrapes the URL and its immediate sub-pages)

These two options cover most basic requirements. Additionally, we provide Exclude and Include path options so you can further narrow which pages are crawled.
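To make these options concrete, here is a minimal Python sketch of how a page limit, a max depth, and Include/Exclude paths could narrow a list of discovered URLs. It is not Firecrawl's or Dify's implementation. The function names `path_depth` and `select_pages` are hypothetical, depth is assumed to mean the number of path segments below the entered URL, and the path patterns are assumed to be shell-style globs.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional
from urllib.parse import urlparse


def path_depth(start_url: str, candidate_url: str) -> Optional[int]:
    """Return how many path segments candidate_url sits below start_url.

    Returns None when the candidate is on another host or outside the
    start URL's path. (Assumption: depth is measured in path segments.)
    """
    start, cand = urlparse(start_url), urlparse(candidate_url)
    if start.netloc != cand.netloc:
        return None
    start_parts = [p for p in start.path.split("/") if p]
    cand_parts = [p for p in cand.path.split("/") if p]
    if cand_parts[: len(start_parts)] != start_parts:
        return None
    return len(cand_parts) - len(start_parts)


def select_pages(
    start_url: str,
    discovered: Iterable[str],
    max_depth: int,
    limit: int,
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> List[str]:
    """Keep URLs within max_depth that pass the path filters, up to limit."""
    include, exclude = list(include), list(exclude)
    selected: List[str] = []
    for url in discovered:
        if len(selected) >= limit:
            break  # total sub-page limit reached
        depth = path_depth(start_url, url)
        if depth is None or depth > max_depth:
            continue  # outside the entered URL or too deep
        path = urlparse(url).path
        if include and not any(fnmatchcase(path, pat) for pat in include):
            continue  # include list is set and nothing matched
        if any(fnmatchcase(path, pat) for pat in exclude):
            continue  # explicitly excluded
        selected.append(url)
    return selected
```

With a depth of 0, only the entered URL survives. With a depth of 1 and an exclude pattern such as `*/changelog`, the entered URL and its immediate sub-pages are kept, minus the changelog page.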

Data Embedding with Dify

Firecrawl efficiently crawls web pages in parallel, delivering results quickly. Once the crawling process is complete, you can easily select the desired pages of web data directly on Dify. The selected web data is then ready for further text preprocessing and cleaning steps. After these steps are completed, the data will be embedded and stored in Dify's vector DB as a new knowledge base.
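As a rough illustration of the preprocessing step, the sketch below cleans crawled markdown and splits it into overlapping chunks before embedding. Treat it as an assumption-laden example, not Dify's actual pipeline: the helper names `clean_markdown` and `chunk_text`, the cleaning rules, and the chunk size and overlap values are all made up for illustration. The embedding call itself is left out because it depends on the model provider you choose.

```python
import re
from typing import List


def clean_markdown(text: str) -> str:
    """Apply simple, illustrative cleaning rules to crawled markdown."""
    # Drop inline images, which carry no text for retrieval.
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)
    # Collapse runs of spaces/tabs and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    text = "\n".join(lines)
    # Collapse three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def chunk_text(text: str, max_chars: int = 500, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = start + max_chars
        chunks.append(text[start:end])
        if end >= len(text):
            break  # the last chunk reached the end of the text
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks
```

Each chunk would then be passed to an embedding model and written to the vector database. The overlap is there so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring chunks.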

Ready for RAG Application

Now you can create a RAG app on Dify that uses web data as contextual knowledge! This up-to-date data is especially useful in business scenarios such as monitoring market trends, staying updated with news, and tracking competitor information.

Boosting Workflow Collaboration

In addition to integrating web data sources for Dify Knowledge, Dify v0.6.11 has also introduced improvements to facilitate collaboration when building your workflow.

You can now add notes at any point on the Workflow orchestration page, making it easier to share ideas and work together with your team. When you share the Workflow as a DSL file, these notes will be preserved, allowing you to effectively communicate your creative ideas with your team and the community.

Added Benefits

Besides those two major updates I've already shared above, here are some additional highlights you might be interested in:

  • Added three new model providers and updated models for four, including the Jina-CLIP-v1 embedding model.

  • Integrated new vector databases from TiDB, Chroma, and Tencent Cloud into your RAG engine.

  • Dependency Management: Switched from pip to poetry for better package management, with faster parallel dependency downloads and better conflict resolution.

  • New ‘Editor’ User Role in Dify Workspace: Editors can now add and edit apps within the workspace.

Join the community

We'd love to hear your thoughts on these updates! Feel free to share your perspective with us by mentioning @dify_ai or @DifyJapan on Twitter. We're always eager to learn from our users and improve our product. Our Discord channel is also open for you to engage with the community, share your ideas, and get the latest news.

For a comprehensive list of changes, please refer to the release log on GitHub.
