Smart Routing for Multimodal Video Retrieval: When to Search What

cs.AI updates on arXiv.org, July 21

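ModaRoute performs multimodal video retrieval by dynamically selecting the best modalities for each query. It uses GPT-4.1 to route queries, cutting computational overhead while keeping retrieval effectiveness competitive.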

arXiv:2507.13374v1 Announce Type: cross

Abstract: We introduce ModaRoute, an LLM-based intelligent routing system that dynamically selects optimal modalities for multimodal video retrieval. While dense text captions can achieve 75.9% Recall@5, they require expensive offline processing and miss critical visual information present in 34% of clips with scene text not captured by ASR. By analyzing query intent and predicting information needs, ModaRoute reduces computational overhead by 41% while achieving 60.9% Recall@5. Our approach uses GPT-4.1 to route queries across ASR (speech), OCR (text), and visual indices, averaging 1.78 modalities per query versus exhaustive 3.0 modality search. Evaluation on 1.8M video clips demonstrates that intelligent routing provides a practical solution for scaling multimodal retrieval systems, reducing infrastructure costs while maintaining competitive effectiveness for real-world deployment.
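The routing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract says GPT-4.1 does the routing, whereas `keyword_router` below is a hypothetical keyword heuristic standing in for that call so the example runs offline. The index interface (a callable returning `(clip_id, score)` pairs), the max-score fusion in `routed_search`, and the fallback to exhaustive search are likewise assumptions. The helper `overhead_reduction` shows one plausible reading of the reported numbers: searching 1.78 of 3 indices on average skips about 41% of index lookups.

```python
from typing import Callable, Dict, List, Set, Tuple

# The three indices named in the abstract.
MODALITIES = ("asr", "ocr", "visual")

# Assumed interface: an index maps a query to (clip_id, score) pairs.
Index = Callable[[str], List[Tuple[str, float]]]


def keyword_router(query: str) -> Set[str]:
    """Hypothetical stand-in for the LLM router (GPT-4.1 in the paper)."""
    q = query.lower()
    chosen: Set[str] = set()
    if any(w in q for w in ("say", "said", "mention", "talk", "speech")):
        chosen.add("asr")
    if any(w in q for w in ("sign", "caption", "text", "written", "logo")):
        chosen.add("ocr")
    if any(w in q for w in ("show", "scene", "wearing", "color", "look")):
        chosen.add("visual")
    # Assumption: with no clear signal, fall back to exhaustive search.
    return chosen or set(MODALITIES)


def routed_search(
    query: str,
    indices: Dict[str, Index],
    router: Callable[[str], Set[str]] = keyword_router,
    k: int = 5,
) -> List[str]:
    """Query only the routed indices and merge hits by maximum score."""
    best: Dict[str, float] = {}
    for modality in sorted(router(query)):
        for clip_id, score in indices[modality](query):
            best[clip_id] = max(score, best.get(clip_id, float("-inf")))
    # Highest score first; ties broken by clip id for determinism.
    ranked = sorted(best, key=lambda c: (-best[c], c))
    return ranked[:k]


def overhead_reduction(avg_modalities: float, total: int = len(MODALITIES)) -> float:
    """Fraction of index lookups skipped relative to searching every index."""
    return 1.0 - avg_modalities / total


if __name__ == "__main__":
    # Toy indices with made-up clip ids and scores, for illustration only.
    toy: Dict[str, Index] = {
        "asr": lambda q: [("a", 0.9), ("b", 0.2)],
        "ocr": lambda q: [("b", 0.8), ("c", 0.5)],
        "visual": lambda q: [("c", 0.7), ("d", 0.1)],
    }
    print(routed_search("street sign text in the scene", toy))
    print(f"{overhead_reduction(1.78):.1%}")
```

In a real system the router would be replaced by a prompt to the LLM that returns the subset of modalities to search, and the fusion step would likely be tuned rather than a plain maximum.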

Related tags

ModaRoute, multimodal video retrieval, intelligent routing, GPT-4.1
