# Dynamically Generating Airflow DAGs with YAML

 

This article shows how to use the `dag-factory` library to dynamically generate Apache Airflow DAGs from YAML configuration files. The approach lets users create DAGs declaratively, without knowing Python or Airflow's native API, and avoids duplicated code. A single YAML file defines the DAG's structure, its tasks, and their dependencies; a short Python script then generates and deploys the DAGs automatically, significantly improving development efficiency.

💡 **Simplified DAG creation**: the `dag-factory` library generates Apache Airflow DAGs dynamically from YAML configuration files, so users can build DAGs without writing much Python code or learning Airflow's native API.

✅ **Declarative configuration**: a YAML file declaratively defines the DAG's structure, tasks, execution logic, and inter-task dependencies; in the example below, the `dependencies` key specifies the execution order of tasks.

🚀 **Higher development efficiency**: this approach avoids writing near-identical DAG code repeatedly. YAML configurations are managed centrally, written once and reused, which noticeably improves DAG development and maintenance.

🔧 **Easy integration**: installing `dag-factory` in the Airflow environment and adding a small Python script that loads and parses the YAML configuration is all it takes to generate DAGs automatically.
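The "factory" in dag-factory refers to the classic factory pattern: declarative configuration goes in, constructed objects come out. As a rough illustration of that idea, here is a plain-Python sketch with hypothetical `Task` and `Dag` stand-ins (these are simplified placeholders, not dag-factory's or Airflow's actual classes):

```python
from dataclasses import dataclass, field

# Hypothetical, simplified stand-ins for Airflow's operator and DAG classes,
# used only to illustrate the config-driven factory idea.
@dataclass
class Task:
    name: str
    bash_command: str
    dependencies: list = field(default_factory=list)

@dataclass
class Dag:
    dag_id: str
    tasks: dict

def build_dag(dag_id: str, spec: dict) -> Dag:
    """Build a Dag object from a declarative config mapping."""
    tasks = {
        name: Task(name, cfg["bash_command"], cfg.get("dependencies", []))
        for name, cfg in spec["tasks"].items()
    }
    return Dag(dag_id, tasks)

# A config mapping shaped like the YAML example later in this article.
spec = {
    "tasks": {
        "task_1": {"bash_command": "echo 1"},
        "task_2": {"bash_command": "echo 2", "dependencies": ["task_1"]},
    }
}
dag = build_dag("example_dag1", spec)
```

Because the factory turns any well-formed mapping into a DAG object, adding another similar DAG means adding another config entry rather than another block of code.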

Dynamically generate Apache Airflow DAGs from YAML configuration files. A declarative way of using Airflow.

# Benefits

- Construct DAGs without knowing Python
- Construct DAGs without learning Airflow primitives
- Avoid duplicative code
- Everything is done with YAML

# Quickstart Example


The following example demonstrates how to create a simple DAG using dag-factory. We will generate a DAG with three tasks, where `task_2` and `task_3` depend on `task_1`. The tasks use the `BashOperator` to execute simple bash commands.

To install dag-factory, run the following pip command in your Apache Airflow® environment:

```shell
pip install dag-factory
```
Create a YAML configuration file called `config_file.yml` and save it within your `dags` folder:

```yaml
example_dag1:
  default_args:
    owner: 'example_owner'
    retries: 1
    start_date: '2024-01-01'
  schedule_interval: '0 3 * * *'
  catchup: False
  description: 'this is an example dag!'
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: 'echo 1'
    task_2:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: 'echo 2'
      dependencies: [task_1]
    task_3:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: 'echo 3'
      dependencies: [task_1]
```

We set the execution order of the tasks by specifying the `dependencies` key.
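Conceptually, a `dependencies` mapping like the one above determines a valid execution order via topological sorting: every task runs only after all of its dependencies have run. A minimal stdlib sketch of that concept (an illustration only, not dag-factory's implementation):

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Task graph mirroring the YAML above: each task maps to its dependencies.
dependencies = {
    "task_1": [],
    "task_2": ["task_1"],
    "task_3": ["task_1"],
}

# static_order() yields tasks so that dependencies always come first.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # task_1 appears before both task_2 and task_3
```

Airflow's scheduler performs the same kind of dependency resolution when it decides which tasks are ready to run.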

In the same folder, create a Python file called `generate_dags.py`. This file generates the DAGs from the configuration file and is a one-time setup; you won't need to modify it unless you want to add more configuration files or change the configuration file name.
```python
from airflow import DAG  # by default, this import is needed for the DagBag to parse this file
import dagfactory
from pathlib import Path

config_file = Path.cwd() / "dags/config_file.yml"
dag_factory = dagfactory.DagFactory(config_file)

dag_factory.clean_dags(globals())
dag_factory.generate_dags(globals())
```

After a few moments, the DAG will be generated and ready to run in Airflow. Unpause the DAG in the Apache Airflow® UI and watch the tasks execute!
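Before deploying, it can be useful to sanity-check that the configuration parses and has the shape you expect. A small sketch using PyYAML (assuming it is installed; it is a common YAML parser for Python, and the inlined string here mirrors `config_file.yml` from above):

```python
import yaml  # PyYAML; assumed to be available in the environment

# Inlined copy of the config_file.yml example from this article.
CONFIG = """
example_dag1:
  schedule_interval: '0 3 * * *'
  tasks:
    task_1:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: 'echo 1'
    task_2:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: 'echo 2'
      dependencies: [task_1]
    task_3:
      operator: airflow.operators.bash_operator.BashOperator
      bash_command: 'echo 3'
      dependencies: [task_1]
"""

config = yaml.safe_load(CONFIG)
tasks = config["example_dag1"]["tasks"]

# Basic structural checks before handing the file to dag-factory.
assert set(tasks) == {"task_1", "task_2", "task_3"}
assert tasks["task_2"]["dependencies"] == ["task_1"]
```

A malformed file would fail here with a clear parse error instead of surfacing later as an Airflow import error.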

Read more on GitHub - astronomer/dag-factory: Dynamically generate Apache Airflow DAGs from YAML configuration files.

Origin: Anna Geller on LinkedIn
References: Factory Pattern
Created 2024-08-13
