Setting Up a Public Datasite

OpenMined Blog, September 12

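This article explains how to create and deploy your own public datasite, so that data owners can share data securely while keeping full control and privacy. It walks through the steps using SyftBox and the Flower federated learning framework: deploying the datasite, managing data and jobs, and collaborating securely with other data scientists. Readers will learn how to participate as a data provider in a federated learning network.

🔹 The article first lays out the Data Owner’s goal: making data available for secure computation while keeping it private and under the owner’s control. This requires deploying a persistent, network-accessible datasite that other data scientists can submit jobs to.

🔸 The first step in deploying a public datasite is to set up the SyftBox client and set the LOCAL_TEST flag to False. This instructs SyftBox to deploy a persistent, network-accessible datasite on the network, with public mock data for discoverability and experimentation.

🔷 The Data Owner must run the datasite under the official, registered identity (DO_EMAIL) created in Part 1, Step 3. This is their unique, verifiable address on the network.

🔲 The Data Owner also needs to run a remote data science (rds) server in order to receive jobs from data scientists. This is done by running uv run syft-rds server in a terminal.

🔶 Managing data and jobs closely mirrors the Data Scientist’s workflow: the same steps apply whether the datasite is local or on the network. These include creating the dataset, monitoring incoming jobs, and reviewing and executing approved jobs.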

The Data Owner Role: Setting Up a Public Datasite

In Part 1 and Part 2, we simulated a Data Owner’s environment and then connected to a pre-existing remote one. In this final part of the series, we will learn how to create and deploy our own public datasite. This is the crucial step for any organization that wants to participate as a data provider in a larger federated learning network.

As a Data Owner, your goal is to make your data available for secure computation while maintaining full control and privacy. This means deploying a persistent, network-accessible datasite that other data scientists can submit jobs to.

If you are already a federated learning practitioner, consider our Federated Learning Co-Design Program. You will get direct support from the OpenMined team to build production-ready federated learning solutions.

Learn more and apply now →

Step 1: Deploying a Public Datasite

To deploy a public datasite instead of a local simulation, set the LOCAL_TEST flag to False. With your SyftBox client running, this instructs SyftBox to deploy a persistent, network-accessible datasite with public mock data for discoverability and experimentation.

For the DO_EMAIL, you must use the official, registered identity you created with SyftBox in Part 1, Step 3. This is your unique, verifiable address on the network.

# In your Data Owner setup notebook (e.g., do1.ipynb)
import syft_rds as sy
from syft_core import Client

DO_EMAIL = Client.load().email
print("DO email: ", DO_EMAIL)

do_client = sy.init_session(host=DO_EMAIL)

Once this command completes, your datasite is live and discoverable on the SyftBox network by other authenticated users.
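The effect of the LOCAL_TEST switch can be pictured with a small, self-contained sketch. It does not call SyftBox or syft_rds; DatasitePlan and plan_datasite are hypothetical names invented for illustration, and the assumption that only the public mode needs a running SyftBox client is a reading of the text above, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasitePlan:
    """Hypothetical summary of what a given LOCAL_TEST value implies."""
    mode: str                   # "local-simulation" or "public-network"
    host: str                   # the email identity the session binds to
    needs_syftbox_client: bool  # assumption: only the public mode needs it


def plan_datasite(local_test: bool, do_email: str) -> DatasitePlan:
    # Cheap sanity check only; real identity verification happens in SyftBox.
    if "@" not in do_email:
        raise ValueError(f"DO_EMAIL does not look like an email: {do_email!r}")
    if local_test:
        return DatasitePlan("local-simulation", do_email, False)
    return DatasitePlan("public-network", do_email, True)


LOCAL_TEST = False  # the flag named in the text
plan = plan_datasite(LOCAL_TEST, "owner@example.org")  # placeholder address
print(plan.mode)
```

In a real notebook the branch would wrap the actual session setup rather than return a description; the sketch only makes the two modes explicit.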

The Data Owner also has to run a remote data science (rds) server to receive jobs from data scientists. Next, run it in a terminal:

uv run syft-rds server
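If you prefer to start the server from a script rather than by hand, a minimal sketch could look like the following. The only thing taken from this article is the command itself; the helper names rds_server_command and start_rds_server are made up for illustration, and the PATH check is an added convenience.

```python
import shutil
import subprocess


def rds_server_command() -> list[str]:
    """The terminal command from the text, split into arguments."""
    return ["uv", "run", "syft-rds", "server"]


def start_rds_server() -> subprocess.Popen:
    """Launch the server as a child process (hypothetical helper)."""
    # Fail early with a readable message if uv is not installed.
    if shutil.which("uv") is None:
        raise RuntimeError("uv was not found on PATH; install uv first")
    return subprocess.Popen(rds_server_command())


# Usage (not executed here): server = start_rds_server()
# The server keeps running until you call server.terminate().
```

Running the command directly in a terminal, as shown above, remains the simplest option; a wrapper like this only helps when the datasite is started by another process.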

Step 2: Managing Data and Jobs

Just like the Data Scientist’s workflow, the Data Owner’s day-to-day management tasks remain exactly the same whether the datasite is local or on the network.

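You would follow the same steps as in Part 1 to:

Create your dataset, supplying the private path and the public mock_path.

Monitor for incoming jobs with do_client.jobs.get_all().

Review and execute approved jobs, using job.show_user_code() to inspect the submitted code and do_client.run_private(job) to run it.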
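For the dataset step, it can help to see how a private location and a mock location might be laid out side by side. The sketch below uses only the standard library and assumes that path and mock_path are directories; the column names, value ranges, and the write_mock_csv helper are hypothetical and are not taken from the series' actual dataset.

```python
import csv
import random
import tempfile
from pathlib import Path

# Hypothetical schema: column name -> (min, max) integer range.
SCHEMA = {"age": (21, 80), "bmi": (18, 45), "glucose": (60, 200), "outcome": (0, 1)}


def write_mock_csv(mock_path: Path, n_rows: int = 5, seed: int = 0) -> Path:
    """Write purely synthetic rows so no real record reaches the mock folder."""
    rng = random.Random(seed)
    mock_path.mkdir(parents=True, exist_ok=True)
    out_file = mock_path / "mock.csv"
    with out_file.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(SCHEMA.keys())
        for _ in range(n_rows):
            writer.writerow(rng.randint(lo, hi) for lo, hi in SCHEMA.values())
    return out_file


root = Path(tempfile.mkdtemp())
path = root / "private"    # real data would live here and is never shared
mock_path = root / "mock"  # synthetic stand-in that others can explore
path.mkdir()
mock_file = write_mock_csv(mock_path, n_rows=5)
print(mock_file.read_text())
```

Generating the mock file from a schema, rather than by sampling the private file, keeps real values out of the public copy.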

The syft_flwr framework is designed to provide a consistent and simple interface, abstracting away the underlying complexities of networking and deployment. This allows you, the Data Owner, to focus on what matters: data governance and secure collaboration.
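As one illustration of that governance focus, the monitor, review, and execute steps can be wrapped in a small loop. The sketch below runs against stub objects so that it is self-contained: StubJob, StubJobs, and StubClient are stand-ins that mimic only the three calls named in this article (jobs.get_all(), show_user_code(), run_private()), and the approval policy is a toy placeholder for a human review.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StubJob:
    name: str
    code: str

    def show_user_code(self) -> None:
        print(f"--- {self.name} ---\n{self.code}")


@dataclass
class StubJobs:
    pending: list

    def get_all(self) -> list:
        return list(self.pending)


class StubClient:
    """Stand-in for do_client; records which jobs were executed."""

    def __init__(self, jobs: list):
        self.jobs = StubJobs(jobs)
        self.executed = []

    def run_private(self, job: StubJob) -> None:
        self.executed.append(job.name)


def review_and_run(client, approve: Callable[[StubJob], bool]) -> list:
    """Show every job's code, then run only the jobs the reviewer approves."""
    ran = []
    for job in client.jobs.get_all():
        job.show_user_code()
        if approve(job):
            client.run_private(job)
            ran.append(job.name)
    return ran


client = StubClient([
    StubJob("fl-train", "print('train one round')"),
    StubJob("suspicious", "import os; os.system('ls')"),
])
# Toy policy: reject code that shells out. A real review is done by a person.
ran = review_and_run(client, lambda job: "os.system" not in job.code)
```

Keeping the approval decision in a separate callable makes it clear that nothing runs on private data until the Data Owner has explicitly said yes.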

Series Conclusion: Your Journey in Federated Learning

Congratulations! Over this three-part series, you have navigated a complete, end-to-end federated learning workflow. Let’s recap the journey:

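Part 1: you simulated a Data Owner’s environment locally.

Part 2: you connected to a pre-existing remote datasite over SyftBox.

Part 3: you created and deployed your own public datasite.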

You have successfully trained a machine learning model that learned from multiple, distributed private datasets without anyone ever having to share their raw data. This is the core promise of federated learning in action, made practical and accessible by OpenMined’s SyftBox and the Flower federated learning framework.

Ready to Start Building for Production?

We invite data scientists, researchers, and engineers working on production federated learning use cases to check out and apply to our Federated Learning Co-Design Program (no commitment required). You will get direct support from the OpenMined team to build production-ready federated learning solutions.

Apply to the Co-Design Program Now

Have questions or want to contribute?

Your journey doesn’t have to end here. The best way to learn is by doing and engaging with the community. If you have questions, run into issues, or want to share your experience, the OpenMined community is the place to go.

Join the OpenMined Slack Community and head to the #community-federated-learning channel.

The post Federated Learning in Practice: Training a Diabetes Prediction Model Across Distributed Datasites – Part 3 appeared first on OpenMined.
