Benchmarking Pretrained Molecular Embedding Models For Molecular Representation Learning

cs.AI updates on arXiv.org, August 11

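This paper compares 25 pretrained neural network models for small-molecule drug design and finds that most offer negligible improvement over the baseline ECFP molecular fingerprint; only the CLAMP model performs significantly better than the others. The study questions the rigor of existing evaluation methods and proposes improvements.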

arXiv:2508.06199v1 (announce type: cross)

Abstract: Pretrained neural networks have attracted significant interest in chemistry and small molecule drug design. Embeddings from these models are widely used for molecular property prediction, virtual screening, and small data learning in molecular chemistry. This study presents the most extensive comparison of such models to date, evaluating 25 models across 25 datasets. Under a fair comparison framework, we assess models spanning various modalities, architectures, and pretraining strategies. Using a dedicated hierarchical Bayesian statistical testing model, we arrive at a surprising result: nearly all neural models show negligible or no improvement over the baseline ECFP molecular fingerprint. Only the CLAMP model, which is also based on molecular fingerprints, performs statistically significantly better than the alternatives. These findings raise concerns about the evaluation rigor in existing studies. We discuss potential causes, propose solutions, and offer practical recommendations.
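To make the idea of comparing a model against the ECFP baseline across many datasets concrete, here is a minimal sketch. It is not the hierarchical Bayesian test described in the abstract; it swaps in a much simpler paired percentile bootstrap over per-dataset score differences. The function name `paired_bootstrap_ci` and the toy scores are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def paired_bootstrap_ci(candidate, baseline, n_resamples=2000, alpha=0.05, seed=0):
    """Mean per-dataset score difference with a percentile bootstrap interval.

    A simple stand-in for illustration only, NOT the paper's hierarchical
    Bayesian model. Scores must be paired: index i is the same dataset.
    """
    if len(candidate) != len(baseline):
        raise ValueError("scores must be paired per dataset")
    diffs = [c - b for c, b in zip(candidate, baseline)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(diffs)
    # Resample datasets with replacement and record the mean difference.
    boot_means = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_resamples))
    lo = boot_means[int((alpha / 2) * n_resamples)]
    hi = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(diffs), lo, hi


# Hypothetical toy scores (assumed for illustration, NOT results from the paper).
toy_embedding = [0.81, 0.74, 0.90, 0.66, 0.78]
toy_ecfp = [0.80, 0.75, 0.89, 0.67, 0.78]

mean_diff, lo, hi = paired_bootstrap_ci(toy_embedding, toy_ecfp)
includes_zero = lo <= 0.0 <= hi
print(f"mean difference: {mean_diff:+.4f}, 95% interval: [{lo:+.4f}, {hi:+.4f}]")
print("interval includes zero:", includes_zero)
```

When the interval straddles zero, as it does for these toy numbers, the per-dataset gains and losses cancel out and there is no evidence that the candidate beats the baseline. A hierarchical Bayesian model would additionally account for variance within each dataset, which this sketch ignores.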


Related tags

Neural networks, chemistry applications, drug design, model comparison, ECFP molecular fingerprint
