Predicting Protein Subcellular Localization with AI and Federated Learning

Predicting where proteins are located inside a cell is critical in biology and drug discovery. This process is known as subcellular localization. The location of a protein is tightly linked to its function. Knowing whether a protein resides in the nucleus, cytoplasm, or cell membrane can unlock new insights into cellular processes and potential therapeutic targets.

This post explains how researchers can collaboratively train AI models to predict protein properties such as subcellular location—without moving sensitive data across institutions—using NVIDIA FLARE and NVIDIA BioNeMo Framework.

How to fine-tune a model for subcellular localization

A new NVIDIA FLARE tutorial demonstrates how to fine-tune an ESM-2nv model to classify proteins by their subcellular localization. The ESM-2nv model learns from embeddings of protein sequences, using the datasets introduced in the paper "Light Attention Predicts Protein Location from the Language of Life."

We focus on subcellular localization prediction. The data is formatted as FASTA files following the biotrainer standard, and each record includes the sequence, its training/validation split, and its location class (one of 10, for example Nucleus or Cell_membrane).

*Figure 1. Cross-section of an animal cell showing the location of various membrane-bound organelles that are targeted for protein property prediction*

A data sample in this FASTA format looks like this:

>Sequence1 TARGET=Cell_membrane SET=train VALIDATION=False
MMKTLSSGNCTLNVPAKNSYRMVVLGASRVGKSSIVSRFLNGRFEDQYTPTIEDFHRKVYNIHGDMYQLDILDTSGNHPFPAMRRLSILTGDVFILVFSLDSRESFDEVKRLQKQILEVKSCLKNKTKEAAELPMVICGNKNDHSELCRQVPAMEAELLVSGDENCAYFEVSAKKNTNVNEMFYVLFSMAKLPHEMSPALHHKISVQYGDAFHPRPFCMRRTKVAGAYGMVSPFARRPSVNSDLKYIKAKVLREGQARERDKCSIQ

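Where:

TARGET is the subcellular location class of the protein (Cell_membrane in this sample).

SET is the dataset split the sequence is assigned to (train in this sample).

VALIDATION marks whether the sequence is held out for validation (False in this sample).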

The dataset spans 10 location classes, making it an excellent real-world classification challenge.
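To make the record layout concrete, here is a minimal Python sketch of reading this kind of FASTA file. It is not the biotrainer or BioNeMo loader. The function name `parse_annotated_fasta` is hypothetical, the first sequence is shortened from the sample above, and the second record is a made-up toy entry added only to show a second class.

```python
from collections import Counter


def parse_annotated_fasta(text):
    """Parse FASTA text whose header lines carry KEY=VALUE attributes."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: the first token is the ID, the rest are KEY=VALUE pairs.
            tokens = line[1:].split() or [""]
            attrs = dict(tok.split("=", 1) for tok in tokens[1:] if "=" in tok)
            current = {
                "id": tokens[0],
                "target": attrs.get("TARGET"),
                "set": attrs.get("SET"),
                "validation": attrs.get("VALIDATION", "False") == "True",
                "sequence": "",
            }
            records.append(current)
        elif current is not None:
            # Sequence lines may wrap, so concatenate them.
            current["sequence"] += line
    return records


# Sequence1 is shortened from the sample above; Sequence2 is a toy record.
example = """>Sequence1 TARGET=Cell_membrane SET=train VALIDATION=False
MMKTLSSGNCTLNVPAKNSYRMVVLGASRVGKSS
IVSRFLNGRFEDQYTPTIEDFHRKVYNIHGDMYQL
>Sequence2 TARGET=Nucleus SET=train VALIDATION=True
MSEQ
"""

records = parse_annotated_fasta(example)
class_counts = Counter(r["target"] for r in records)
for r in records:
    print(r["id"], r["target"], r["set"], r["validation"], len(r["sequence"]))
```

Counting the `target` values, as `class_counts` does here, is a quick way to check how the location classes are distributed before training.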

How to use federated learning with BioNeMo protein language models

Running this example is straightforward. With BioNeMo Framework v2.5 in Docker, you can start a JupyterLab environment directly and run the Federated Protein Property Prediction with BioNeMo tutorial notebook in your browser.

On top of the BioNeMo framework, NVIDIA FLARE is used to bring in federated training. Instead of pooling datasets from multiple sites, each participant trains locally and contributes only model updates. With FedAvg, those updates are aggregated centrally to form a shared global model—privacy preserved, collaboration enabled.
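The aggregation step can be illustrated with a small, framework-free sketch. This is not the NVIDIA FLARE API. It assumes the common FedAvg formulation in which each site's parameters are weighted by its number of local training samples, and the function name `fed_avg`, the parameter name `classifier.weight`, and the sample counts are all made up for the example.

```python
def fed_avg(client_updates):
    """Average client parameters, weighted by local sample count.

    client_updates: list of (weights, num_samples) pairs, where weights
    maps a parameter name to a flat list of floats.
    """
    total = sum(n for _, n in client_updates)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    first_weights = client_updates[0][0]
    global_weights = {}
    for name, values in first_weights.items():
        aggregated = [0.0] * len(values)
        for weights, n in client_updates:
            # Each site contributes in proportion to its share of the data.
            for i, v in enumerate(weights[name]):
                aggregated[i] += v * n / total
        global_weights[name] = aggregated
    return global_weights


# Two hypothetical sites with different amounts of local data.
site_a = ({"classifier.weight": [1.0, 2.0]}, 100)
site_b = ({"classifier.weight": [3.0, 4.0]}, 300)

global_model = fed_avg([site_a, site_b])
print(global_model)
```

Only the parameter values and sample counts reach the aggregator in this sketch; the protein sequences themselves stay at each site.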

Training and visualization

For this demonstration, the team fine-tuned the 650-million-parameter ESM-2nv model, pretrained in BioNeMo. At this size, the model offers a strong balance between predictive accuracy and computational efficiency, making it well-suited for federated training scenarios.

Key steps in the workflow include:

Data splitting

Federated averaging (FedAvg)

Visualization with TensorBoard
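As an illustration of the data splitting step, the sketch below assigns record IDs to sites. The text here does not say how the tutorial distributes records, so this assumes a simple seeded random split of roughly equal size; a real setup may use a different, possibly heterogeneous, scheme. The function name `split_across_sites` and the `site-N` labels are hypothetical.

```python
import random


def split_across_sites(record_ids, num_sites, seed=0):
    """Shuffle record IDs and deal them out to sites in round-robin order."""
    if num_sites < 1:
        raise ValueError("num_sites must be at least 1")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = list(record_ids)
    rng.shuffle(shuffled)
    # Site i takes every num_sites-th record, starting at offset i.
    return {
        f"site-{i + 1}": shuffled[i::num_sites]
        for i in range(num_sites)
    }


ids = [f"Sequence{i}" for i in range(1, 11)]
partitions = split_across_sites(ids, num_sites=3)
for site, members in partitions.items():
    print(site, len(members))
```

Each site then fine-tunes on its own partition, and the resulting models can be compared against the federated model.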

*Figure 3. Federated training (FedAvg) yields higher accuracy at all sites compared to local models, demonstrating the benefit of collaborative learning*

Benefits of using BioNeMo and FLARE for protein prediction

The benefits of using BioNeMo and FLARE extend beyond predicting where proteins localize in a cell. This approach helps the community build AI for science together. With BioNeMo plus FLARE:

Federated learning strengthens protein property prediction

Collaboration benefits everyone

BioNeMo Framework accelerates discovery

Get started with federated protein prediction

Federated protein property prediction with NVIDIA BioNeMo and NVIDIA FLARE is part of a powerful new paradigm. Combining the language of life (protein sequences) with federated AI workflows can accelerate discoveries in drug development, healthcare, and biotech—all while respecting data privacy.

The future of life sciences AI isn’t siloed—it’s collaborative. And with FLARE and BioNeMo, that future is already here. Visit the NVIDIA/NVFlare GitHub repo to get started with Federated Protein Property Prediction with BioNeMo and to see more advanced examples.
