cs.AI updates on arXiv.org 10月08日
molFTP:高效、防泄露的分子片段频率向量化方法
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出molFTP,一种高效、防泄露的分子片段频率向量化方法。通过实现dummy-masking程序和key-LOO技术,有效防止特征泄露,实现近似完全数据训练,同时保持模型性能的 unbiased 交叉验证估计。

arXiv:2510.06029v1 Announce Type: cross Abstract: We introduce molFTP (molecular fragment-target prevalence), a compact representation that delivers strong predictive performance. To prevent feature leakage across cross-validation folds, we implement a dummy-masking procedure that removes information about fragments present in the held-out molecules. We further show that key leave-one-out (key-loo) closely approximates true molecule-level leave-one-out (LOO), with deviation below 8% on our datasets. This enables near full data training while preserving unbiased cross-validation estimates of model performance. Overall, molFTP provides a fast, leakage-resistant fragment-target prevalence vectorization with practical safeguards (dummy masking or key-LOO) that approximate LOO at a fraction of its cost.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

molFTP 分子片段频率向量化 特征泄露 交叉验证
相关文章