MarkTechPost@AI, October 21, 15:22
DeepSomatic: AI Aids Detection of Cancer Genetic Variants

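Researchers from Google Research and UC Santa Cruz have released DeepSomatic, an AI model that identifies genetic variants in cancer cells. In a collaboration with Children's Mercy, the model found 10 variants in pediatric leukemia cells that other tools had missed. DeepSomatic is designed for cancer genomes and handles Illumina short reads, PacBio HiFi long reads, and Oxford Nanopore long reads. It extends DeepVariant to detect single nucleotide variants and small insertions and deletions in whole-genome and whole-exome data, and it supports both tumor-normal and tumor-only workflows, including models for FFPE samples.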

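💡 **What is new in DeepSomatic**: The model identifies genetic variants in cancer cells and handles data from several sequencing technologies: Illumina short reads, PacBio HiFi long reads, and Oxford Nanopore long reads. In pediatric leukemia research it found variants that other tools had missed, which shows its potential for cancer genomics.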

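⚙️ **How it works and why it helps**: DeepSomatic converts aligned sequencing reads into image-like tensors that encode read pileups, base qualities, and alignment context. A convolutional neural network then classifies each candidate as a somatic or non-somatic variant. This design lets the model work across platforms and handle difficult samples such as glioblastoma and pediatric leukemia.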

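📊 **Training and test benchmark**: The team trained and evaluated the model on a dataset called CASTLE (Cancer Standards Long read Evaluation). It contains 6 matched tumor and normal cell line pairs, each whole-genome sequenced with three different sequencing technologies. The team also released the benchmark sets and accessions, filling a gap in training and testing resources for multi-technology somatic variant calling.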

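📈 **Performance**: DeepSomatic improves on widely used methods for both single nucleotide variants (SNVs) and insertions/deletions (indels). The gains are largest for indels: about 90% F1 on Illumina data and above 80% on PacBio data, well ahead of the baseline methods. The study identified 329,011 somatic variants across the reference cell lines and an additional sample.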

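🌍 **Generalization and real-world use**: DeepSomatic generalizes to real cancer samples outside its training set; for example, it recovered known driver genes in a glioblastoma sample. In tumor-only mode, without a matched normal sample, it identified known variants and found new ones in pediatric leukemia samples. This suggests the model remains reliable across disease contexts and when no normal control is available.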

A team of researchers from Google Research and UC Santa Cruz released DeepSomatic, an AI model that identifies genetic variants in cancer cells. In research with Children’s Mercy, it found 10 variants in pediatric leukemia cells that other tools had missed. DeepSomatic is a somatic small variant caller for cancer genomes that works across Illumina short reads, PacBio HiFi long reads, and Oxford Nanopore long reads. The method extends DeepVariant, detects single nucleotide variants and small insertions and deletions in whole-genome and whole-exome data, and supports tumor-normal and tumor-only workflows, including models for FFPE samples.

https://research.google/blog/using-ai-to-identify-genetic-variants-in-tumors-with-deepsomatic/?utm_source=twitter&utm_medium=social&utm_campaign=social_post&utm_content=gr-acct

How It Works

DeepSomatic converts aligned reads into image-like tensors that encode pileups, base qualities, and alignment context. A convolutional neural network classifies each candidate site as somatic or not, and the pipeline emits VCF or gVCF. The design is platform-agnostic because the tensor summarizes local haplotype and error patterns across technologies. Google researchers describe the approach as focused on distinguishing inherited from acquired variants, including in difficult samples such as glioblastoma and pediatric leukemia.
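
To make the image-like tensor idea concrete, here is a minimal sketch in plain Python. It is not DeepSomatic's actual encoding: the four channels, the `AlignedRead` type, the `encode_pileup` function, and the intensity values are all hypothetical choices for illustration, and the toy model ignores indels and the tumor/normal split.

```python
from dataclasses import dataclass
from typing import List

# Arbitrary illustrative intensities; real tools choose their own encoding.
BASE_TO_INTENSITY = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}


@dataclass
class AlignedRead:
    """Hypothetical, simplified read: gapless alignment only."""
    start: int        # 0-based reference position of the first base
    bases: str        # aligned bases
    quals: List[int]  # per-base Phred qualities
    is_reverse: bool  # strand flag


def encode_pileup(reads: List[AlignedRead], ref: str, center: int,
                  width: int = 5, max_reads: int = 4) -> List[List[List[float]]]:
    """Return a [channel][row][col] nested list around `center`.

    Channels (assumed layout): 0 = base identity, 1 = base quality,
    2 = strand, 3 = mismatch against the reference.
    Cells with no read coverage stay at 0.0 in every channel.
    """
    left = center - width // 2
    tensor = [[[0.0] * width for _ in range(max_reads)] for _ in range(4)]
    for row, read in enumerate(reads[:max_reads]):
        for col in range(width):
            pos = left + col
            offset = pos - read.start
            if pos < 0 or pos >= len(ref):
                continue  # window extends past the reference
            if offset < 0 or offset >= len(read.bases):
                continue  # read does not cover this column
            base = read.bases[offset]
            tensor[0][row][col] = BASE_TO_INTENSITY.get(base, 0.0)
            tensor[1][row][col] = min(read.quals[offset], 40) / 40.0
            tensor[2][row][col] = 1.0 if read.is_reverse else 0.5
            tensor[3][row][col] = 1.0 if base != ref[pos] else 0.0
    return tensor


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    reads = [
        AlignedRead(start=2, bases="GTACG", quals=[30] * 5, is_reverse=False),
        AlignedRead(start=3, bases="TTCGT", quals=[20, 40, 10, 35, 35], is_reverse=True),
    ]
    tensor = encode_pileup(reads, ref, center=4)
    for row in tensor[3]:  # mismatch channel, one row per read slot
        print(row)
```

A classifier such as a convolutional network would take a tensor like this as input for each candidate site. Because the same channels can be filled from any aligned reads, the representation itself does not depend on the sequencing platform.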

Datasets and Benchmarking

Training and evaluation use CASTLE (Cancer Standards Long read Evaluation). CASTLE contains 6 matched tumor and normal cell line pairs that were whole-genome sequenced on Illumina, PacBio HiFi, and Oxford Nanopore. The research team releases the benchmark sets and accessions for reuse, which fills a gap in multi-technology resources for training and testing somatic variant callers.

Reported Results

The research team reports consistent gains over widely used methods on both single nucleotide variants and indels. On Illumina indels, the next best method reaches about 80 percent F1, while DeepSomatic reaches about 90 percent. On PacBio indels, the next best method is under 50 percent, while DeepSomatic is above 80 percent. Baselines include SomaticSniper, MuTect2, and Strelka2 for short reads and ClairS for long reads. The study reports 329,011 somatic variants across the reference lines and an additional preserved sample. Overall, the Google team reports that DeepSomatic outperforms current methods, with particular strength on indels.
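
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below shows how it could be computed by comparing a call set against a truth set. The `score_calls` function and the variants are hypothetical and are not taken from the study; exact tuple matching is also a simplification, since real benchmarking tools normalize variant representations first.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


def score_calls(truth: Set[Variant], calls: Set[Variant]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) using exact-match comparison."""
    tp = len(truth & calls)   # called and present in the truth set
    fp = len(calls - truth)   # called but absent from the truth set
    fn = len(truth - calls)   # in the truth set but not called
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical indel calls, for illustration only.
    truth = {("chr1", 100, "A", "AT"), ("chr1", 250, "GC", "G"), ("chr2", 40, "T", "TAA")}
    calls = {("chr1", 100, "A", "AT"), ("chr1", 250, "GC", "G"), ("chr3", 7, "C", "CA")}
    precision, recall, f1 = score_calls(truth, calls)
    print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Because F1 penalizes both missed variants and false calls, a gain from about 80 to about 90 percent means fewer errors of at least one kind without a matching loss in the other.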

Generalization to Real Samples

The research team evaluates transfer to cancers beyond the training set. In a glioblastoma sample, DeepSomatic recovers known drivers. Pediatric leukemia samples test the tumor-only mode, where a clean normal is not available; there the tool recovers known calls and reports additional variants in the cohort. These studies indicate that the representation and training scheme generalize to new disease contexts and to settings without matched normals.

Key Takeaways

Editorial Comments

DeepSomatic is a pragmatic step for somatic variant calling across sequencing platforms. The model keeps DeepVariant’s image tensor representation and a convolutional neural network, so the same architecture scales from Illumina to PacBio HiFi to Oxford Nanopore with consistent preprocessing and outputs. The CASTLE dataset is the right move: it supplies matched tumor and normal cell lines across 3 technologies, which strengthens training and benchmarking and aids reproducibility. Reported results emphasize indel accuracy, with about 90% F1 on Illumina and more than 80% on PacBio against lower baselines, which addresses a long-running weakness in indel detection. The pipeline supports WGS and WES, tumor-normal and tumor-only workflows, and FFPE samples, which matches real laboratory constraints.


The post Google AI Research Releases DeepSomatic: A New AI Model that Identifies Cancer Cell Genetic Variants appeared first on MarkTechPost.
