Databricks · September 29, 08:39
KPMG Uses Delta Sharing to Boost Data Analytics Efficiency in External Audits

In a recent engagement with a major UK energy supplier, KPMG successfully used Databricks' Delta Sharing technology to overcome the performance bottlenecks of data analytics in external audits, significantly improving efficiency and audit quality. Faced with massive volumes of transactional data and tight audit deadlines, traditional approaches to data access and analysis ran into challenges such as excessive data volumes, data transfer delays, degraded query performance, and resourcing constraints. Delta Sharing, an open data-sharing protocol, enabled secure and efficient cross-platform data exchange between KPMG and the audited entity without data replication. Compared with the previous PostgreSQL-based setup, Delta Sharing showed clear advantages in handling large-scale datasets, lowering costs, and providing flexibility. Through this work, KPMG not only cut query times dramatically but also improved the accuracy of its data analytics results and reduced storage and compute costs, giving future audit work a more efficient, secure, and controlled solution.

🔍 **Data analytics challenges and the Delta Sharing solution**: External audits face the daunting challenge of analyzing massive volumes of transactional data, and traditional approaches hit bottlenecks at petabyte scale. Working with a UK energy supplier, KPMG used Databricks' Delta Sharing to enable secure, efficient, copy-free data sharing, addressing excessive data volumes, transfer delays, and degraded query performance, and providing a viable technical path for large-scale data analysis.

🚀 **Delta Sharing's advantages and implementation**: As an open protocol, Delta Sharing offers clear advantages over traditional options such as PostgreSQL: it handles massive datasets, lowers costs by reducing data replication, and improves flexibility by supporting access from multiple tools. KPMG supplied a JSON list of the required tables and views, the audited entity used Lakeflow Jobs and Delta Sharing to provide the data directly in KPMG's Databricks environment, and integration with Unity Catalog delivered unified permission management and data governance, ensuring secure cross-cloud data exchange and a compliant audit process.

📊 **Measurable business impact and outlook**: After implementing Delta Sharing, some of KPMG's queries ran more than 80% faster (for example, from 14.5 hours down to 2.5 hours), data analytics results improved by 15 percentage points in some instances, and storage and compute costs fell substantially. The technology also sped up data preparation and simplified onboarding for new team members. KPMG plans to extend this success to other data sources such as SAP to further improve audit efficiency and quality.

💡 **Technical considerations and future development**: As an early adopter, KPMG also notes features that Delta Sharing has yet to round out (such as sharing materialized views) and a gap in Lakeflow Jobs around confirming that upstream jobs have completed. With the GA release and Databricks' strategic partnership with SAP, however, KPMG expects Delta Sharing to deliver stronger capabilities that help it build a more streamlined audit process, particularly for handling and analyzing complex enterprise data such as SAP.

🤝 **A secure, efficient, and trusted model for audit**: Delta Sharing lets KPMG securely access the latest, single-source-of-truth data across cloud platforms, without delays or manual data migration. This not only speeds up audits and delivers more reliable results for audited clients, but also keeps data access tightly controlled at every step. The technology is reshaping audit data analytics and laying the groundwork for more efficient, accurate audit processes.

Seamless and secure access to data has become one of the biggest challenges facing organizations. Nowhere is this more evident than in technology-led external audits, where analyzing 100% of transactional data is fast becoming the gold standard. These audits involve reviewing tens of billions of lines of financial and operational billing data.

To deliver meaningful insights at scale, analysis must not only be robust but also efficient — balancing cost, time, and quality to achieve the best outcomes in tight timeframes.

Recently, in collaboration with a major UK energy supplier, KPMG leveraged Delta Sharing in Databricks to overcome performance bottlenecks, improve efficiency, and enhance audit quality. This blog discusses our experience, the key benefits, and the measurable impact on our audit process from using Delta Sharing.

The Business Challenge

To meet public financial reporting deadlines, we needed to access and analyze tens of billions of lines of the audited entity's billing data within a short audit window.

Historically, we relied on the audited entity's analytics environment hosted in AWS PostgreSQL. As data volumes grew, the setup showed its limits:

• Data Volume: Our approach required looking beyond the audit period to analyze historical data that was essential for the routine. As this dataset has significantly grown year on year, it eventually exceeded AWS PostgreSQL limits. This forced us to split the data across two separate databases, introducing additional operational overhead and cost.
• Data Transfer: Moving and copying data from a production environment to a ‘ring-fenced’ analytics PostgreSQL database caused a delayed start and a lack of freshness and agility.
• Query Performance Degradation: While PostgreSQL does support parallelism, it does not leverage multiple CPU cores when executing a single query, leading to suboptimal performance.
• Resourcing: Because access to the entity’s analytics environment was limited to their assets, we faced challenges in making the best use of our people and quickly onboarding new team members.

Given these constraints, we needed a scalable, high-performance solution that would allow efficient access to and processing of data without compromising security or governance, enabling reduced ‘machine time’ for quicker outcomes.

Why Delta Sharing?

Delta Sharing, an open data-sharing protocol, provided the ideal solution by enabling secure and efficient cross-platform data exchange between KPMG and the audited entity without duplication.
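Because the protocol is open, a recipient does not even have to be on Databricks. As a minimal sketch of the protocol itself (the engagement described in this blog used Databricks-to-Databricks sharing through Unity Catalog instead), a recipient holding a provider-issued profile file could read a shared table with the open-source delta-sharing Python client. The profile path, share, schema, and table names below are purely illustrative:

```python
import delta_sharing

# Profile file issued by the data provider; path and names are illustrative.
profile = "/dbfs/shares/energy_provider.share"

# Discover what the provider has shared with us.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table, e.g. into pandas for a quick look.
table_url = f"{profile}#audit_fy24_share.billing.meter_readings"
sample = delta_sharing.load_as_pandas(table_url, limit=1000)

# On a Spark cluster, the same table can be loaded as a Spark DataFrame.
# spark_df = delta_sharing.load_as_spark(table_url)
```

The data itself is not copied up front; the client reads the provider's underlying Delta files directly when the table is queried.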

Compared to extending PostgreSQL, Databricks offered several distinct advantages:

• Handles Large Datasets: Delta Sharing is designed to handle petabyte-scale data, eliminating PostgreSQL's performance limitations.
• Lower Costs: Delta Sharing lowered storage and compute costs by reducing the need for large-scale data replication and transfers.
• Flexibility: Shared data could be accessed in Databricks using all of PySpark, SQL, and BI tools like Power BI, facilitating seamless integration into our audit deliverables.
• Delta Tables: We could “time travel” to past states of data. This was valuable for checking historical points that were previously lost in the client’s data model (see the sketch below).
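To make the flexibility and time-travel points concrete, here is a minimal sketch of how a shared table can be queried in Databricks once it is mounted through Unity Catalog. The catalog, schema, table, and column names are hypothetical, and time travel on a shared table assumes the provider shares the table's history alongside the current snapshot:

```python
# Hypothetical catalog, schema, table, and column names throughout.
# `spark` is the session provided on a Databricks cluster or notebook.
from pyspark.sql import functions as F

# A Delta Share mounted through Unity Catalog is queried like any other table.
readings = spark.table("energy_share.billing.meter_readings")

usage_by_account = (
    readings
    .where(F.col("reading_date") >= "2024-04-01")
    .groupBy("account_id")
    .agg(F.sum("consumption_kwh").alias("total_kwh"))
)

# Time travel to a prior state of the shared table; this assumes the
# provider shares table history along with the current snapshot.
as_of_year_end = spark.sql(
    """
    SELECT *
    FROM energy_share.billing.meter_readings
    TIMESTAMP AS OF '2024-03-31'
    """
)
```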

Implementation Approach

We introduced Delta Sharing in a way that did not disrupt ongoing audit work:

• Data Sharing: We gave the entity a list (in JSON format) of the tables and views we needed. They used Lakeflow Jobs and Delta Sharing to make these available to us directly in our Databricks environment. The audited entity provided access by sharing a key, granting us permission to secure these pre-agreed datasets with minimal effort between AWS and Azure. Delta Sharing handled this cross-cloud exchange securely, without copying or moving the data between platforms.
• Integration with Unity Catalog: Unity Catalog gave us a single place to manage permissions, apply governance policies, and maintain full visibility of who accessed what data (see the sketch after Figure 1).
• Scheduled Data Refreshes: During key audit cycles, data was refreshed to align with financial reporting timelines.
• Performance Optimization: Once inside Databricks, we reworked queries from PostgreSQL to Spark SQL and PySpark. With Delta Sharing providing governed, ready-to-use data, we focused on optimizing performance rather than managing data movement.
Figure 1: KPMG Implementation Approach
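As an illustration of the recipient-side steps above, the sketch below shows how a share might be mounted as a catalog in Unity Catalog, how access could be scoped to the audit team, and how a former PostgreSQL check might look once reworked as Spark SQL. The provider, share, catalog, group, and table names are all hypothetical:

```python
# Hypothetical provider, share, catalog, group, and table names throughout.
# `spark` is the session provided on a Databricks cluster or notebook.

# Mount the share the provider created for us as a local catalog.
spark.sql(
    """
    CREATE CATALOG IF NOT EXISTS energy_share
    USING SHARE energy_provider.audit_fy24_share
    """
)

# Unity Catalog then governs the shared data like any other catalog.
spark.sql("GRANT USE CATALOG ON CATALOG energy_share TO `audit-analytics-team`")
spark.sql("GRANT SELECT ON SCHEMA energy_share.billing TO `audit-analytics-team`")

# A PostgreSQL-era check reworked as Spark SQL against the shared data.
negative_readings = spark.sql(
    """
    SELECT account_id, COUNT(*) AS negative_reading_count
    FROM energy_share.billing.meter_readings
    WHERE consumption_kwh < 0
    GROUP BY account_id
    """
)
```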

Measurable Impact

We used Delta Sharing to access and analyze billions of meter readings across millions of customer accounts. We observed significant improvements across multiple KPIs:

• Faster Queries: Delta Sharing allowed us to use more computing power for big data tasks. Some of our most complex queries finished over 80% faster (for example, going from 14.5 hours to 2.5 hours) compared to our old PostgreSQL process.
• Improved Audit Quality: By spending less time waiting for machines, we had more time to focus on exceptions, unusual patterns, and complex edge cases. This improved our data analytics results by 15 percentage points in some instances and reduced the burden of any residual sampling.
• Cost Savings: By using Delta Sharing, we avoided making extra copies of the data. This meant we only stored and processed what was needed, which brought down both storage and compute costs.
• Quicker Access: Since the data was provisioned through Delta Sharing, there was less time wasted waiting for it to be ready, allowing us to start work sooner.
• Easier Team Onboarding: New team members could be onboarded seamlessly, and we could draw on a broader mix of coding skills across SQL and PySpark.

Using Delta Sharing has made a noticeable difference to our audit process. We can securely access data across cloud platforms, without delays or manual data movement, so our teams always work from the latest, single source of truth. This cross-cloud capability means faster audits, more reliable results for the audited clients we work with, and tight control over data access at every step. — Anna Barrell, Audit Partner, KPMG UK

Technical Considerations

A couple of technical considerations emerged from working with Databricks:

• Delta Sharing: As early adopters, we found that some features weren’t yet available (for example, sharing materialized views), though we’re excited that these have now been refined with the GA release, and we’ll be enhancing our Delta Sharing solutions with this functionality.

• Lakeflow Jobs: Currently, there is no mechanism to confirm whether an upstream job for a Delta Shared table has completed. One script was executed before the upstream refresh had finished and produced an incomplete output, though this was quickly identified through our completeness and accuracy procedures.
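One possible guard against this (not the mechanism KPMG used, and assuming the shared table carries a load timestamp column) is to check a simple watermark before any downstream script runs:

```python
# Hypothetical guard, not the mechanism KPMG used. Assumes the shared table
# carries a load timestamp column (`loaded_at`); all names are illustrative.
import datetime

from pyspark.sql import functions as F

expected_cutoff = datetime.date(2024, 3, 31)

latest_load_date = (
    spark.table("energy_share.billing.meter_readings")
    .agg(F.max(F.to_date("loaded_at")).alias("latest_load_date"))
    .collect()[0]["latest_load_date"]
)

# Fail fast if the upstream refresh has not yet reached the reporting cutoff.
if latest_load_date is None or latest_load_date < expected_cutoff:
    raise RuntimeError(
        f"Upstream refresh looks incomplete: latest load date {latest_load_date} "
        f"is before the expected cutoff {expected_cutoff}"
    )
```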

Looking to the Future

Delta Sharing has proven to be a game-changer for audit data analytics, enabling efficient, scalable, and secure collaboration. Our successful implementation with the energy supplier demonstrates the value of Delta Sharing for clients with diverse data sources across clouds and platforms.

We recognize that many organizations store a significant portion of their financial data in SAP. This presents an additional opportunity to apply the same principles of efficiency and quality at an even greater scale.

Through Databricks’ strategic partnership with SAP, announced in February of this year, we can now access SAP data via Delta Sharing. This joint solution, which has become one of SAP's fastest-selling products in a decade, allows us to tap into this data while preserving its context and syntax. By doing so, we can ensure the data remains fully governed under Unity Catalog and its total cost of ownership is optimized. As the entities we audit progress on their transformation journey, we at KPMG are looking to build on this traction, anticipating the additional benefits it will bring to a streamlined audit process.
