MarkTechPost@AI Oct 17, 15:45
C2S-Scale 27B: Translating Single-Cell Data into "Cell Sentences"

Researchers have released C2S-Scale 27B, a 27-billion-parameter foundation model for single-cell analysis built on Gemma-2. The model converts single-cell RNA sequencing (scRNA-seq) data into "cell sentences", ordered lists of gene symbols, so that language models can directly parse and reason about cellular states. Beyond strong benchmark performance, the model surfaced an experimentally validated, context-dependent pathway: a CK2 inhibitor (silmitasertib/CX-4945) combined with low-dose interferon enhances antigen presentation, potentially making "cold" tumors more responsive to immunotherapy, with in-vitro experiments showing a ~50% increase in antigen presentation.

🔬 The C2S-Scale 27B model converts high-dimensional scRNA-seq data into "cell sentences", ordered sequences of gene symbols. This representation lets large language models (LLMs) directly understand and process single-cell data, opening new routes to tasks such as cell-type prediction, tissue classification, cluster captioning, perturbation prediction, and biological question answering, simply by phrasing these tasks as text prompts and completions.

💡 Through a dual-context virtual screen over more than 4,000 drugs, the model identified a compound that enhances antigen presentation (the MHC-I pathway). Crucially, the enhancement appears only in patient samples from immune-"cold" tumors (low interferon levels), while the effect is negligible in immune-neutral cell-line data. The model predicted that the CK2 inhibitor silmitasertib, combined with low-dose interferon, significantly upregulates MHC-I, a finding experimentally validated in human neuroendocrine models not seen during training, yielding a ~50% increase in antigen presentation.

🚀 C2S-Scale 27B was trained on more than 800 public scRNA-seq datasets covering over 57 million human and mouse cells. By pretraining on genomic data together with biological text in a unified multimodal corpus, the model gains stronger generalization and a deeper grasp of biological processes. It has been released with open weights on Hugging Face, in both 27B and 2B Gemma variants, for research use.

🔬 Experimental validation shows that combining silmitasertib (a CK2 inhibitor) with low-dose interferons (e.g., IFN-β and IFN-γ) significantly boosts antigen presentation in human neuroendocrine models; the mechanism lowers the response threshold to interferon rather than inducing antigen presentation de novo. Flow-cytometry analysis shows upregulated HLA-A, B, C expression under the combined treatment, with significant increases in MFI (mean fluorescence intensity) across different silmitasertib doses, demonstrating the combination's effectiveness.

A team of researchers from Google Research, Google DeepMind, and Yale released C2S-Scale 27B, a 27-billion-parameter foundation model for single-cell analysis built on Gemma-2. The model formalizes single-cell RNA-seq (scRNA-seq) profiles as "cell sentences"—ordered lists of gene symbols—so that a language model can natively parse and reason over cellular states. Beyond benchmarking gains, the research team reports an experimentally validated, context-dependent pathway: CK2 inhibition (silmitasertib/CX-4945) combined with low-dose interferon amplifies antigen presentation, a mechanism that could make "cold" tumors more responsive to immunotherapy. The result is a ~50% increase in antigen presentation in vitro under the combined condition.

Understanding the model

C2S-Scale converts a high-dimensional expression vector into text by rank-ordering genes and emitting the top-K symbols as a gene-name sequence. This representation aligns single-cell data with standard LLM toolchains and allows tasks such as cell-type prediction, tissue classification, cluster captioning, perturbation prediction, and biological QA to be phrased as text prompts and completions.

https://github.com/vandijklab/cell2sentence
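The rank-order transform described above can be sketched in a few lines. This is a minimal illustration, not the cell2sentence library's actual API: the `top_k` cutoff, gene list, and toy expression values are all placeholders.

```python
import numpy as np

def cell_sentence(expression, gene_names, top_k=100):
    """Rank genes by expression (descending) and emit the top-K gene symbols
    as a space-separated 'cell sentence' a language model can consume."""
    order = np.argsort(expression)[::-1][:top_k]
    return " ".join(gene_names[i] for i in order)

# Toy example: five genes, one cell's expression vector.
genes = ["CD3D", "MS4A1", "NKG7", "LYZ", "GNLY"]
expr = [5.2, 0.0, 8.1, 1.3, 7.4]
print(cell_sentence(expr, genes, top_k=3))  # NKG7 GNLY CD3D
```

Because the sentence is plain text, downstream tasks (cell-type prediction, perturbation prediction, QA) reduce to wrapping such sentences in a natural-language prompt.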

Training data, stack, and release

C2S-Scale-Gemma-2-27B is built on Gemma-2 27B (decoder-only Transformer), trained on Google TPU v5, and released under CC-BY-4.0. The training corpus aggregates >800 public scRNA-seq datasets spanning >57M cells (human and mouse) with associated metadata and textual context; pretraining unifies transcriptomic tokens and biological text into a single multimodal corpus.

The key result: an interferon-conditional amplifier

The research team constructed a dual-context virtual screen over >4,000 drugs to find compounds that boost antigen presentation (MHC-I program) only in immune-context-positive settings—i.e., primary patient samples with low interferon tone—while having negligible effect in immune-context-neutral cell-line data. The model predicted a striking context split for silmitasertib (CK2 inhibitor): strong MHC-I upregulation with low-dose interferon, little to none without interferon. The research team reports in-lab validation in human neuroendocrine models unseen in training, with the combination (silmitasertib + low-dose interferon) producing a marked, synergistic increase in antigen presentation (≈50% in their assays).
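The selection logic of such a dual-context screen can be sketched as follows. This is a hypothetical simplification of the workflow described above, assuming each compound gets a model-predicted MHC-I score in both contexts; the function name, threshold, and scores are illustrative, not the team's actual pipeline.

```python
def dual_context_screen(predictions, min_delta=0.5):
    """predictions: {drug: (score_in_positive_context, score_in_neutral_context)}.
    Keep compounds whose predicted MHC-I boost is specific to the
    immune-context-positive setting, ranked by the context gap."""
    hits = {}
    for drug, (pos, neu) in predictions.items():
        delta = pos - neu
        if delta >= min_delta:  # effect must be context-dependent, not uniform
            hits[drug] = delta
    return dict(sorted(hits.items(), key=lambda kv: -kv[1]))

# Toy scores: a context-split hit, a uniformly active drug, and an inactive one.
preds = {
    "silmitasertib": (0.9, 0.1),  # strong only with interferon context
    "drug_B": (0.8, 0.75),        # active in both contexts -> rejected
    "drug_C": (0.2, 0.1),         # weak everywhere -> rejected
}
print(dual_context_screen(preds))
```

The point of the gap criterion is that uniformly active compounds (like `drug_B` here) are filtered out even when their absolute scores are high; only context-conditional amplifiers survive.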

The amplifier lowers the response threshold to interferon rather than initiating antigen presentation de novo; flow-cytometry readouts show HLA-A, B, C upregulation only under combined treatment (with either IFN-β or IFN-γ), across two neuroendocrine models, with representative MFI gains (e.g., 13.6% at 10 nM and 34.9% at 1000 nM silmitasertib in one model).
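As a small arithmetic aside, the percentage MFI gains quoted above are relative increases over the interferon-only baseline. The numbers below are hypothetical, chosen only to mirror the reported 13.6% figure:

```python
def mfi_percent_gain(mfi_combo, mfi_baseline):
    """Percent increase in mean fluorescence intensity (MFI) of the
    combination treatment over the interferon-only baseline."""
    return (mfi_combo - mfi_baseline) / mfi_baseline * 100.0

# Hypothetical MFI readouts (arbitrary units), not the paper's raw data.
print(round(mfi_percent_gain(113.6, 100.0), 1))  # prints 13.6
```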

Editorial Comments

C2S-Scale 27B is a technically credible step for LLMs in biology: translating scRNA-seq into "cell sentences" lets a Gemma-2 model run programmatic queries over cell states and perturbations, and in practice it surfaced an interferon-conditional amplifier, silmitasertib (CK2 inhibition), that increases MHC-I antigen presentation only with low-dose IFN, a mechanism the team then validated in vitro. The value here isn't headline rhetoric but the workflow: text-native screening across >4,000 compounds under dual immune contexts to propose a context-dependent pathway that may help render immune-"cold" tumors visible to the immune system. That said, all evidence is preclinical and bench-scale; the right read is "hypothesis-generating AI" with open weights enabling replication and stress-testing, not a clinical claim.


Check out the Technical Paper, Model on HF, GitHub Page and Technical details.


