cs.AI updates on arXiv.org, October 6, 12:28
How Design Choices Affect Compositional Generalization in Visual Generative Models

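This paper systematically studies how design choices affect compositional generalization in image and video generation. It identifies two key factors: whether the training objective operates on a discrete or a continuous distribution, and how much information the conditioning provides about the constituent concepts. Experiments show that relaxing the MaskGIT discrete loss with an auxiliary continuous JEPA objective can improve compositional performance.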

arXiv:2510.03075v1 Announce Type: cross

Abstract: Compositional generalization, the ability to generate novel combinations of known concepts, is a key ingredient for visual generative models. Yet, not all mechanisms that enable or inhibit it are fully understood. In this work, we conduct a systematic study of how various design choices influence compositional generalization in image and video generation in a positive or negative way. Through controlled experiments, we identify two key factors: (i) whether the training objective operates on a discrete or continuous distribution, and (ii) to what extent conditioning provides information about the constituent concepts during training. Building on these insights, we show that relaxing the MaskGIT discrete loss with an auxiliary continuous JEPA-based objective can improve compositional performance in discrete models like MaskGIT.
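The abstract's central intervention adds a continuous latent-prediction term to MaskGIT's discrete masked-token loss. The toy sketch below shows what such a combined objective could look like for a single sequence. It is not the authors' implementation. The plain weighted sum, the squared-L2 latent loss, the weight `jepa_weight`, and all function names are hypothetical choices for illustration. It works on plain Python lists, with no model and no gradients.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one position's vocabulary logits.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def masked_token_ce(logits, targets, mask) -> float:
    # Discrete, MaskGIT-style term: cross-entropy on masked positions only.
    losses = [-log_softmax(l)[t] for l, t, m in zip(logits, targets, mask) if m]
    return sum(losses) / max(len(losses), 1)


def jepa_latent_loss(pred_emb, target_emb, mask) -> float:
    # Continuous auxiliary term (hypothetical JEPA-style form): squared L2
    # distance between predicted and target embeddings at masked positions.
    losses = [
        sum((p - q) ** 2 for p, q in zip(pe, te))
        for pe, te, m in zip(pred_emb, target_emb, mask)
        if m
    ]
    return sum(losses) / max(len(losses), 1)


def combined_loss(logits, targets, pred_emb, target_emb, mask,
                  jepa_weight: float = 0.5) -> float:
    # Hypothetical combination: discrete CE plus a weighted continuous term.
    ce = masked_token_ce(logits, targets, mask)
    latent = jepa_latent_loss(pred_emb, target_emb, mask)
    return ce + jepa_weight * latent


# Usage: two positions with a 3-token vocabulary; only position 0 is masked.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
targets = [0, 1]
mask = [True, False]
pred_emb = [[1.0, 0.0], [0.0, 0.0]]
target_emb = [[0.0, 0.0], [5.0, 5.0]]  # position 1 is ignored because it is unmasked
loss = combined_loss(logits, targets, pred_emb, target_emb, mask)
```

In a real training setup, `target_emb` would typically come from a separate target encoder whose outputs are not back-propagated through, as in JEPA-style methods. Here it is simply a list of numbers.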

Related tags: visual generative models, compositional generalization, design choices, MaskGIT, JEPA