Second Brain: Crafted, Curated, Connected, Compounded on October 2
Ibis: A Universal Interface for Data Wrangling in Python

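Ibis is a Python library that provides a lightweight, universal interface for data wrangling, helping users explore and transform data of any size, stored anywhere. It includes a dataframe API, supports more than 15 query engines, and uses deferred execution to push computation to the backend, so workflows run at the speed of the backend rather than of the local machine. Ibis borrows from popular APIs such as pandas and dplyr, offers one consistent syntax across many backends, and lets users drop into SQL when necessary, which reduces code rewrites. It also provides an interactive mode for quickly diagnosing problems and exploring data. Ibis has moved to DuckDB as its default backend, which markedly improves performance. It can also shorten the path from prototype to production and speed up existing Python or pandas code.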

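⭐ **Universal data-wrangling interface**: Ibis provides a single Python interface for connecting to and working with datasets of any size stored in different places, such as databases and file systems. With support for more than 15 query engines, users can switch between backends such as pandas DataFrames, Parquet files, DuckDB, and BigQuery without learning multiple query languages or APIs.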

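🚀 **Performance and efficiency**: Ibis's core strength is deferred execution. It pushes code execution to the backend query engine and loads result data into memory only when necessary. Workflows are therefore not limited by the local machine, and on large datasets they can be significantly faster than in-memory frameworks such as pandas, while scaling easily to distributed systems or cloud SQL engines.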

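💡 **Simpler development and maintenance**: Ibis's API borrows from popular libraries such as pandas and dplyr, so it is easy to pick up. Its consistent syntax means users learn it once and use it across many backends, without having to learn different SQL dialects or framework-specific code. It also provides an interactive mode for quickly diagnosing problems, exploring data, and building workflows locally, which greatly reduces rewriting and maintenance costs.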

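🛠️ **Flexible backend support and integration**: Ibis supports a wide range of query engines and DataFrame APIs, the list keeps growing, and anyone can contribute a new backend. When Ibis does not support something directly, it provides a way to drop into native SQL, which adds flexibility. The recent upgrade also puts particular emphasis on DuckDB, which further improves performance.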

Ibis is a Python library that provides a lightweight, universal interface for data wrangling. It helps Python users explore and transform data of any size, stored anywhere.

Ibis has three primary components:

- A dataframe API for Python. Python users can write Ibis code to manipulate tabular data.
- Interfaces to 15+ query engines. Wherever data is stored, people can use Ibis as their API of choice to communicate with any of those query engines.
- Deferred execution. Ibis uses deferred execution, so execution of code is pushed to the query engine. Users can execute at the speed of their backend, not their local computer.
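To make the idea of deferred execution concrete, here is a minimal sketch using only the Python standard library. It is not Ibis's implementation: `LazyQuery` and its methods are hypothetical names invented for this illustration, and the three rows are taken from the `countries` preview in the tutorial part of this document. Each method only records an operation; SQL is produced when `compile()` is called, and the engine (here an in-memory SQLite database) does the work only on `execute()`.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyQuery:
    """Toy deferred query (hypothetical, not Ibis): methods only record operations."""

    table: str
    columns: Tuple[str, ...] = ("*",)
    filters: Tuple[Tuple[str, object], ...] = ()  # (column, value) equality filters
    order: Optional[str] = None
    n: Optional[int] = None

    def filter_eq(self, column: str, value: object) -> "LazyQuery":
        return replace(self, filters=self.filters + ((column, value),))

    def select(self, *columns: str) -> "LazyQuery":
        return replace(self, columns=columns)

    def order_by(self, column: str) -> "LazyQuery":
        return replace(self, order=column)

    def limit(self, n: int) -> "LazyQuery":
        return replace(self, n=n)

    def compile(self):
        """Translate the recorded operations into SQL text plus bound parameters."""
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in self.filters)
        if self.order is not None:
            sql += f" ORDER BY {self.order}"
        if self.n is not None:
            sql += f" LIMIT {self.n}"
        return sql, [value for _, value in self.filters]

    def execute(self, con: sqlite3.Connection):
        """Only at this point does any work reach the query engine."""
        sql, params = self.compile()
        return con.execute(sql, params).fetchall()


# A tiny in-memory backend with three rows from the countries preview.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE countries (name TEXT, population INTEGER, continent TEXT)")
con.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("Andorra", 84000, "EU"),
        ("United Arab Emirates", 4975593, "AS"),
        ("Afghanistan", 29121286, "AS"),
    ],
)

# Building the query touches no data; it is just a description of the work.
query = (
    LazyQuery("countries")
    .filter_eq("continent", "AS")
    .select("name", "population")
    .order_by("population")
    .limit(1)
)
sql, params = query.compile()
rows = query.execute(con)  # filtering and sorting happen inside SQLite
print(sql, params, rows)
```

Because the Python side holds only a description of the query, swapping the engine means swapping what `compile()` targets, and only the final result rows ever need to be loaded into local memory.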

Ibis aims to be a future-proof solution to interacting with data using Python and can accomplish this goal through its main features:

- Familiar API: Ibis’s API design borrows from popular APIs like pandas and dplyr that most users already know and like to use.
- Consistent syntax: Ibis aims to be a universal Python API for tabular data of any size, big or small.
- Deferred execution: Ibis pushes code execution to the query engine and only moves required data into memory when necessary. Analytics workflows are faster and more efficient.
- Interactive mode: Ibis provides an interactive mode in which users can quickly diagnose problems, explore data, and mock up workflows and pipelines locally.
- 10+ supported backends: Ibis supports multiple query engines and DataFrame APIs. Use one interface to transform your data wherever it lives: from DataFrames in pandas to Parquet files through DuckDB to tables in BigQuery.
- Minimize rewrites: Teams can often keep their Ibis code the same regardless of backend changes, like increasing or decreasing computing power, changing the number or size of their databases, or switching backends entirely.
- Flexibility when you need it: When Ibis doesn’t support something, it provides a way to jump directly into SQL.

Upgraded to DuckDB
“default DuckDB backend, and DuckDB is much more performant”: Farewell pandas, and thanks for all the fish. – Ibis

- Speed up prototype to production. Scale code written and tested locally to a distributed system or cloud SQL engine with minimal rewrites.
- Boost performance of existing Python or pandas code. For example, a general rule of thumb for pandas is “Have 5 to 10 times as much RAM as the size of your dataset”. When a dataset exceeds this rule, in-memory frameworks like pandas can be slow. Instead, using Ibis will significantly speed up your workflows because of its deferred execution. Ibis also lets you switch to a faster database engine without changing much of your code.
- Get rid of long, error-prone f-strings. Ibis provides one syntax for multiple query engines and dataframe APIs that lets you avoid learning new flavors of SQL or other framework-specific code. Learn the syntax once and use that syntax anywhere.
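As a quick worked example of the RAM rule of thumb quoted above, the sketch below applies the 5x to 10x factor to a dataset size. The function names and the sample figures (a 4 GB dataset on a 16 GB machine) are hypothetical and chosen only for illustration.

```python
from typing import Tuple


def pandas_ram_range_gb(dataset_gb: float) -> Tuple[float, float]:
    """Apply the quoted rule of thumb: 5 to 10 times the dataset size."""
    return 5 * dataset_gb, 10 * dataset_gb


def exceeds_rule(dataset_gb: float, available_ram_gb: float) -> bool:
    """True when even the lower (5x) bound does not fit in available RAM."""
    low, _ = pandas_ram_range_gb(dataset_gb)
    return available_ram_gb < low


low, high = pandas_ram_range_gb(4.0)
print(f"A 4 GB dataset suggests {low:.0f} to {high:.0f} GB of RAM")
print(exceeds_rule(4.0, available_ram_gb=16.0))  # 16 GB is below the 20 GB lower bound
```

In this hypothetical case the dataset already exceeds the rule on a 16 GB machine, which is the situation where pushing the computation to a backend engine is meant to help.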

Ibis acts as a universal frontend to the following systems:

The list of supported backends is continuously growing, and anyone can get involved in adding new ones. Learn more about contributing to Ibis in our contributing documentation at https://github.com/ibis-project/ibis/blob/master/docs/CONTRIBUTING.md

# Installation

Install Ibis from PyPI with:

    pip install 'ibis-framework[duckdb]'

We provide a number of tutorial and example notebooks in the ibis-examples repository. The easiest way to try these out is through the online interactive notebook environment.

You can also get started analyzing any dataset, anywhere with just a few lines of Ibis code. Here’s an example of how to use Ibis with a SQLite database.

Download the SQLite database from the ibis-tutorial-data GCS (Google Cloud Storage) bucket, then connect to it using ibis.

    curl -LsS -o geography.db 'https://storage.googleapis.com/ibis-tutorial-data/geography.db'

Connect to the database and show the available tables

    >>> import ibis
    >>> from ibis import _
    >>> ibis.options.interactive = True
    >>> con = ibis.sqlite.connect("geography.db")
    >>> con.tables
    Tables
    ------
    - countries
    - gdp
    - independence

Choose the countries table and preview its first few rows

    >>> countries = con.tables.countries
    >>> countries.head()
    ┏━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┓
    ┃ iso_alpha2 ┃ iso_alpha3 ┃ iso_numeric ┃ fips   ┃ name                 ┃ capital          ┃ area_km2 ┃ population ┃ continent ┃
    ┡━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━┩
    │ string     │ string     │ int32       │ string │ string               │ string           │ float64  │ int32      │ string    │
    ├────────────┼────────────┼─────────────┼────────┼──────────────────────┼──────────────────┼──────────┼────────────┼───────────┤
    │ AD         │ AND        │          20 │ AN     │ Andorra              │ Andorra la Vella │    468.0 │      84000 │ EU        │
    │ AE         │ ARE        │         784 │ AE     │ United Arab Emirates │ Abu Dhabi        │  82880.0 │    4975593 │ AS        │
    │ AF         │ AFG        │           4 │ AF     │ Afghanistan          │ Kabul            │ 647500.0 │   29121286 │ AS        │
    │ AG         │ ATG        │          28 │ AC     │ Antigua and Barbuda  │ St. Johns        │    443.0 │      86754 │ NA        │
    │ AI         │ AIA        │         660 │ AV     │ Anguilla             │ The Valley       │    102.0 │      13254 │ NA        │
    └────────────┴────────────┴─────────────┴────────┴──────────────────────┴──────────────────┴──────────┴────────────┴───────────┘

Show the 5 least populous countries in Asia

    >>> (
    ...     countries.filter(_.continent == "AS")
    ...     .select("name", "population")
    ...     .order_by(_.population)
    ...     .limit(5)
    ... )
    ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
    ┃ name                           ┃ population ┃
    ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
    │ string                         │ int32      │
    ├────────────────────────────────┼────────────┤
    │ Cocos [Keeling] Islands        │        628 │
    │ British Indian Ocean Territory │       4000 │
    │ Brunei                         │     395027 │
    │ Maldives                       │     395650 │
    │ Macao                          │     449198 │
    └────────────────────────────────┴────────────┘
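For comparison, the query above corresponds roughly to a plain SQL statement that the backend runs on Ibis's behalf. The sketch below writes that statement by hand and runs it with Python's standard `sqlite3` module. It makes two assumptions: instead of the downloaded `geography.db`, it uses a small in-memory table holding only rows that appear in the outputs above, and the SQL text is a hand-written equivalent, not the exact SQL Ibis generates.

```python
import sqlite3

# Rows copied from the two outputs above (name, population, continent).
sample_rows = [
    ("Andorra", 84000, "EU"),
    ("United Arab Emirates", 4975593, "AS"),
    ("Afghanistan", 29121286, "AS"),
    ("Cocos [Keeling] Islands", 628, "AS"),
    ("British Indian Ocean Territory", 4000, "AS"),
    ("Brunei", 395027, "AS"),
    ("Maldives", 395650, "AS"),
    ("Macao", 449198, "AS"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE countries (name TEXT, population INTEGER, continent TEXT)")
con.executemany("INSERT INTO countries VALUES (?, ?, ?)", sample_rows)

# Hand-written equivalent of filter -> select -> order_by -> limit.
sql = """
SELECT name, population
FROM countries
WHERE continent = 'AS'
ORDER BY population
LIMIT 5
"""
result = con.execute(sql).fetchall()
for name, population in result:
    print(f"{name:<32}{population:>10}")
```

On this sample the statement returns the same five countries as the Ibis expression. The difference is that the Ibis version stays the same when the backend changes, while hand-written SQL may need adjusting for each engine's dialect.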

Origin: Upcoming Data Engineering Tools : r/dataengineering
References:
Created 2023-09-25
