Multimodal Knowledge Fusion Improves 3D Hand Gesture Recognition

cs.AI updates on arXiv.org, October 13

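This paper proposes an efficient multimodal knowledge fusion method for training unimodal 3D convolutional neural networks for dynamic hand gesture recognition. Embedding multimodal knowledge in each network improves the performance of the unimodal networks, and a spatiotemporal semantic alignment loss together with a focal regularization parameter lets the networks integrate cross-modal information effectively. Experiments show that the method achieves state-of-the-art recognition accuracy on multiple datasets.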

arXiv:1812.06145v2 (announce type: cross)

Abstract: We present an efficient approach for leveraging the knowledge from multiple modalities in training unimodal 3D convolutional neural networks (3D-CNNs) for the task of dynamic hand gesture recognition. Instead of explicitly combining multimodal information, which is commonplace in many state-of-the-art methods, we propose a different framework in which we embed the knowledge of multiple modalities in individual networks so that each unimodal network can achieve an improved performance. In particular, we dedicate separate networks per available modality and enforce them to collaborate and learn to develop networks with common semantics and better representations. We introduce a "spatiotemporal semantic alignment" loss (SSA) to align the content of the features from different networks. In addition, we regularize this loss with our proposed "focal regularization parameter" to avoid negative knowledge transfer. Experimental results show that our framework improves the test time recognition accuracy of unimodal networks, and provides the state-of-the-art performance on various dynamic hand gesture recognition datasets.
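
The abstract names the SSA loss and the focal regularization parameter but gives no formulas, so the following is only a minimal sketch of how such a gated alignment loss could look, not the authors' implementation. It assumes that feature content is compared through cosine-similarity (correlation) matrices over spatiotemporal positions, and that the gate grows exponentially with the gap between the two networks' classification losses and is zero when the other network is not doing better. The function names, the `beta` parameter, and the toy feature shapes are all hypothetical.

```python
import math

def correlation_matrix(features):
    """Cosine-similarity matrix between feature vectors.

    `features` is a list of d vectors (one per spatiotemporal position),
    each a list of C channel values. Returns a d x d nested list.
    """
    normed = []
    for vec in features:
        # Guard against all-zero vectors by falling back to a norm of 1.
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        normed.append([v / norm for v in vec])
    return [[sum(a * b for a, b in zip(u, w)) for w in normed] for u in normed]

def focal_regularizer(loss_self, loss_other, beta=2.0):
    """Assumed gate: positive only when the other network classifies better."""
    gap = loss_self - loss_other
    return math.exp(beta * gap) - 1.0 if gap > 0 else 0.0

def ssa_loss(feat_self, feat_other, loss_self, loss_other, beta=2.0):
    """Gated squared Frobenius distance between the two correlation matrices."""
    corr_self = correlation_matrix(feat_self)
    corr_other = correlation_matrix(feat_other)  # treated as a fixed target
    dist = sum((a - b) ** 2
               for row_s, row_o in zip(corr_self, corr_other)
               for a, b in zip(row_s, row_o))
    return focal_regularizer(loss_self, loss_other, beta) * dist

# Toy features: two positions, two channels per modality (made-up values).
rgb = [[1.0, 0.0], [0.0, 1.0]]
depth = [[1.0, 0.0], [1.0, 0.0]]

# The RGB network has the higher classification loss, so it is pulled toward depth.
rgb_alignment = ssa_loss(rgb, depth, loss_self=1.0, loss_other=0.5)
# The depth network is already the better one, so its gate closes.
depth_alignment = ssa_loss(depth, rgb, loss_self=0.5, loss_other=1.0)
```

In this sketch the gate is what guards against negative knowledge transfer: a network is only pushed toward another modality's feature structure while that other modality currently has the lower classification loss.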
