Second Brain: Crafted, Curated, Connected, Compounded
Declarative Data Pipelines: What Is Declarative Programming?

 

This article takes a deep look at the core concepts of declarative data pipelines, emphasizing how they differ from imperative programming. The declarative approach focuses on describing "what" to do rather than "how" to do it, letting each task decide on its own when and how to run, which aligns closely with the ideas of Functional Data Engineering. The article also draws analogies to similar shifts in frontend development and DevOps, such as the move from jQuery to React and from scripts to Terraform, notes that the declarative trend is sweeping the modern data stack, and explores the advantages and possibilities of declarative programming in code, for example with Pulumi or CDK in Python.

✨ The core of declarative data pipelines is "description over instruction": instead of specifying an execution order, each task or step in the pipeline autonomously determines the best time and way to run. The approach focuses on defining the desired outcome rather than spelling out the implementation, which fits closely with Functional Data Engineering, a paradigm that likewise emphasizes immutability and side-effect-free functions.

🔄 The evolution and spread of declarative programming: the declarative trend has permeated many tools in the modern data stack, for example Kestra (a YAML data orchestrator), Rill Developer (BI as code), and dlt (data integration as code). The shift mirrors frontend engineering's move from jQuery (imperative) to React (declarative) and the DevOps evolution from imperative scripts to Terraform (declarative Infrastructure as Code), and it aims to improve stability and development efficiency.

💻 The advantages of declarative programming in code: beyond configuration files such as YAML, general-purpose languages (such as Python or TypeScript) can be used for "declarative programming in code." With frameworks such as Pulumi or CDK, developers can declaratively define data-stack resources, pipelines, and transformations in a familiar language. This is more expressive, easier to integrate with logic, tests, and external services, and benefits from IDE features such as code completion, type checking, and debugging.

🎯 A shift in the scope of concerns: the declarative approach also shifts attention from table-level inputs and outputs to finer-grained column-level inputs and outputs, a notion related to Software-Defined Assets that pushes data engineering toward more fine-grained management and automation.

A declarative data pipeline is characterized by not specifying the execution order. Instead, it allows each step or task to autonomously determine the optimal time and method for execution.

The essence of a declarative approach is describing what a program should accomplish rather than dictating the specific control flow. This approach is a hallmark of Functional Data Engineering, a declarative programming paradigm that contrasts with imperative programming paradigms.
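As a minimal sketch of this idea (the `task` decorator, `run_all`, and the task names are hypothetical, invented for this example rather than taken from any real orchestrator's API): each task declares only *what* it depends on, and the execution order is derived from those declarations instead of being written out imperatively.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Registry of declared tasks: name -> (function, upstream dependencies).
tasks = {}

def task(name, deps=()):
    """Register a task by declaring its name and upstream dependencies."""
    def register(fn):
        tasks[name] = (fn, tuple(deps))
        return fn
    return register

@task("raw_orders")
def extract():
    return "extracted"

@task("clean_orders", deps=["raw_orders"])
def clean():
    return "cleaned"

@task("orders_report", deps=["clean_orders"])
def report():
    return "reported"

def run_all():
    # The order is *derived* from the declared dependency graph,
    # never spelled out by the pipeline author.
    graph = {name: deps for name, (_, deps) in tasks.items()}
    order = TopologicalSorter(graph).static_order()
    return [tasks[name][0]() for name in order]

print(run_all())  # → ['extracted', 'cleaned', 'reported']
```

Note that reordering or adding `@task` declarations never requires touching `run_all`: the control flow is recomputed from the declared graph, which is the essence of the declarative contract.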

Read more in Use declarative pipelining instead of imperative and my recent article Rill | The Rise of the Declarative Data Stack.

# Functional Programming and Its Relation

Functional programming represents a specific subset of the declarative paradigm. Its key trait is idempotency: each function can be restarted without side effects.
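A toy illustration of that idempotency property (the `store` dict and `load_partition` function are invented for this sketch, standing in for a partitioned table and a pipeline task): the task computes its partition deterministically from its inputs and *overwrites* the target, so a restart or retry always leaves the same state.

```python
# Stand-in for a partitioned table: partition key -> rows.
store = {}

def load_partition(day, source_rows):
    """Deterministic transform with overwrite semantics: re-running it
    any number of times with the same inputs yields the same state."""
    store[day] = [r * 2 for r in source_rows]
    return store[day]

first = load_partition("2024-01-01", [1, 2, 3])
second = load_partition("2024-01-01", [1, 2, 3])  # safe to retry/backfill
assert first == second == [2, 4, 6]
assert store == {"2024-01-01": [2, 4, 6]}  # no duplicated rows
```

Contrast this with an append-based load, where every retry would duplicate rows and the final state would depend on how many times the task happened to run.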

As highlighted in Functional Data Engineering:

In the context of Data Orchestration Trends - The Shift From Data Pipelines to Data Products:

The role of abstractions in defining Data Products is pivotal. Utilizing higher-level abstractions leads to more explicit and declarative design.

In The Rise of the Data Engineer, Maxime Beauchemin articulates that a dedicated function is the optimal higher-level abstraction for software definition (automation, testability, well-defined practices, and openness). This approach leverages an inline declaration of upstream dependencies within an open, Pythonic API, underpinning assets with a Python function.

Conversely, an imperative pipeline delineates each procedural step, dictating how to proceed. In stark contrast, a declarative data pipeline refrains from dictating execution order, empowering each component to independently identify its most effective operational parameters.

# Declarative vs. Imperative

See more on Declarative vs Imperative.

# Some Examples

The declarative trend continues, with more tools from the modern data stack betting on it. For example, Kestra is a full-fledged YAML data orchestrator, Rill Developer is a BI tool as code, dlt is data integration as code, and many more introduce similar models; interestingly, many of them use DuckDB under the hood.

# History of Similar Shifts

Frontend engineering has moved from jQuery, an imperative library that makes it easy to manipulate webpages, to React, which allows writing JavaScript that declares the components you want on the screen.

This concept is closely linked with DevOps, where tools like Kubernetes and declarative YAML configs have revolutionized deployment practices under the banner of Infrastructure as Code. The field moved from imperative shell scripts that spin servers up and down to frameworks like Terraform, which let you simply declare the servers you want to exist.

Rill Developer is doing the same for Business Intelligence and dashboards, where all dashboards are simple YAML files.

A similar transformation is needed in data pipelines to enhance stability, accelerate development cycles, and ensure scalability.

Also, Markdown, HTML, and SQL are declarative, which raises the question: is that the reason they are so successful?

# Scope of Concerns

From Introduction to the DataForge Declarative Transformation Framework — DataForge

Instead of table in/out, they think in column in/out.

It goes on with Pure Functions (Functional Data Engineering).
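The column in/out idea can be sketched roughly like this (the `column_rules` registry and `derive` function are hypothetical illustrations, not DataForge's actual API): each output column declares exactly which input columns it reads and a pure function over them, giving lineage at column rather than table granularity.

```python
# Each *output column* declares its input columns and a pure function
# over those columns -- finer-grained than declaring table in/out.
column_rules = {
    # output column: (input columns, pure function of those columns)
    "full_name": (("first", "last"), lambda first, last: f"{first} {last}"),
    "total": (("price", "qty"), lambda price, qty: price * qty),
}

def derive(row):
    """Apply every column rule to a row; inputs are read, never mutated."""
    out = dict(row)
    for col, (inputs, fn) in column_rules.items():
        out[col] = fn(*(row[c] for c in inputs))
    return out

row = {"first": "Ada", "last": "Lovelace", "price": 3, "qty": 4}
print(derive(row)["total"])       # → 12
print(derive(row)["full_name"])   # → Ada Lovelace
```

Because every derived column names its exact inputs, a tool can compute column-level lineage and rebuild only the columns affected by an upstream change.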

Matthew Kosovec also made a YouTube video about it.

How does the column in/out approach compare to Software-Defined Assets?

# Declarative Programming in Code

Is it possible to use a general-purpose programming language instead of YAML or configuration files? This approach is often referred to as “declarative programming in code.” We could use a language like Python, TypeScript, or even a DSL to express the desired state and transformations of your data stack.

For example, in Python, you could define your data stack and infrastructure with a framework like Pulumi or CDK (Cloud Development Kit). These tools allow you to write code that declaratively specifies the resources, pipelines, and transformations while still giving you the flexibility of a full programming language.
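To make the idea concrete without depending on a real framework, here is a toy "desired state" engine in the spirit of Pulumi or CDK (the `Bucket` type and `plan` function are invented for this sketch, not Pulumi's actual API): you declare resources as plain values, and an engine diffs the desired state against the actual state to plan actions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bucket:
    """A declared resource: just desired state, no provisioning steps."""
    name: str
    versioned: bool = False

def plan(desired, actual):
    """Diff desired vs. actual resources into create/replace/delete actions."""
    actions = []
    actual_by_name = {b.name: b for b in actual}
    for b in desired:
        if b.name not in actual_by_name:
            actions.append(("create", b.name))
        elif actual_by_name[b.name] != b:
            actions.append(("replace", b.name))
    for name in actual_by_name:
        if name not in {b.name for b in desired}:
            actions.append(("delete", name))
    return actions

desired = [Bucket("raw-data", versioned=True), Bucket("reports")]
actual = [Bucket("raw-data"), Bucket("tmp")]
print(plan(desired, actual))
# → [('replace', 'raw-data'), ('create', 'reports'), ('delete', 'tmp')]
```

The point is the division of labor: the author only declares what should exist, while the engine owns the how (ordering, diffing, convergence), yet the declaration still lives in a full programming language with loops, tests, and types.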

Advantages:

- More expressive: leverage loops, conditionals, and modular code.
- Easier to integrate with logic, tests, and external services.
- Use familiar IDE features like linting, type checking, and debugging.

So, while YAML/configs are simpler and often preferred for pure declarative approaches, a programming language can offer more power and flexibility.

Origin: Benthos.

# More Resources

Questions

What is declarative programming, and how does it relate to declarative pipelines? And how does it relate to Functional Programming, or even Functional Data Engineering?


Origin:
References: Infrastructure as Code
Created 2023-01-25
