ToMCLIP: A Topological Alignment Framework for Multilingual CLIP

cs.AI updates on arXiv.org, October 14

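This paper proposes ToMCLIP, a multilingual CLIP framework based on topological alignment. It uses topology-preserving constraints to improve cross-modal alignment, increasing the structural coherence of multilingual representations and their zero-shot accuracy.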

arXiv:2510.10889v1 Announce Type: cross Abstract: Contrastive Vision-Language Models (VLMs) have demonstrated strong zero-shot capabilities. However, their cross-modal alignment remains biased toward English due to limited multilingual multimodal data. Recent multilingual extensions have narrowed this gap, but they enforce instance-level alignment while neglecting the global geometry of the shared embedding space. We address this problem by introducing ToMCLIP (Topological Alignment for Multilingual CLIP), a topology-aware framework that aligns embedding spaces under topology-preserving constraints. The proposed method applies persistent homology to define a topological alignment loss and approximates the persistence diagram, with theoretical error bounds, using a graph sparsification strategy. This work validates the proposed approach, showing enhanced structural coherence of multilingual representations, higher zero-shot accuracy on CIFAR-100, and stronger multilingual retrieval performance on xFlickr&CO. Beyond VLMs, the proposed approach provides a general method for incorporating topological alignment into representation learning.
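To make the idea of comparing embedding spaces through persistent homology more concrete, here is a minimal sketch. It is not the ToMCLIP loss: the abstract does not spell that out, and everything below is an assumption made for illustration. It relies on one standard fact: for a Vietoris-Rips filtration in which an edge enters at its length, the death times of the 0-dimensional features equal the edge weights of a minimum spanning tree. The function names (`pairwise_distances`, `h0_death_times`, `toy_topological_gap`) are hypothetical, the comparison rule (mean squared difference of sorted death times) is a simple stand-in, and the graph sparsification step mentioned in the abstract is not implemented.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix for the rows of x (shape: n x d)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def h0_death_times(dist):
    """Sorted death times of 0-dimensional persistent homology.

    For a Vietoris-Rips filtration these are the edge weights of a
    minimum spanning tree, computed here with Prim's algorithm on a
    dense distance matrix.
    """
    n = dist.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known edge from the tree to each vertex
    deaths = []
    for _ in range(n - 1):
        # Ignore vertices already in the tree when picking the next edge.
        candidates = np.where(in_tree, np.inf, best)
        j = int(np.argmin(candidates))
        deaths.append(candidates[j])
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.sort(np.array(deaths))


def toy_topological_gap(emb_a, emb_b):
    """Hypothetical discrepancy between two point clouds of equal size:
    mean squared difference of their sorted H0 death times.
    This is an illustrative stand-in, not the loss from the paper."""
    deaths_a = h0_death_times(pairwise_distances(emb_a))
    deaths_b = h0_death_times(pairwise_distances(emb_b))
    return float(np.mean((deaths_a - deaths_b) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    english = rng.normal(size=(32, 8))    # stand-in for English text embeddings
    other = 1.5 * english                 # stand-in for a rescaled, distorted space
    print(toy_topological_gap(english, english + 3.0))  # translation keeps distances
    print(toy_topological_gap(english, other))          # rescaling changes them
```

Because the quantity depends only on pairwise distances, translating or rotating one point cloud leaves it unchanged, while stretching the space increases it. That is the sense in which a topology-aware term looks at global geometry rather than at individual matched pairs. A trainable version would need a differentiable implementation in an autodiff framework, which this NumPy sketch does not attempt.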

Tags: multilingual CLIP, topological alignment, cross-modal alignment, representation learning, zero-shot accuracy
