Video Understanding_Fishai

Trending

Articles related to "Video Understanding"

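Peking University and ByteDance open-source the first spatio-temporal reasoning video model! Fully transparent thinking process, performance surpasses GPT-4o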

QbitAI 2025-11-05T09:58:45.000000Z

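Peking University and ByteDance open-source the first spatio-temporal reasoning video model, with a fully transparent thinking process and performance surpassing GPT-4o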

36kr-Tech 2025-11-05T09:13:38.000000Z

Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?

machinelearning apple 2025-10-27T17:00:29.000000Z

StreamingTOM: Streaming Token Compression for Efficient Video Understanding

cs.AI updates on arXiv.org 2025-10-22T04:21:28.000000Z

No more "lopsided skills": UniVid unifies video understanding and generation

Synced 2025-10-21T05:09:39.000000Z

MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues

cs.AI updates on arXiv.org 2025-10-21T04:28:51.000000Z

Xiaoice: Training-Free Video Understanding via Self-Supervised Spatio-Temporal Clustering of Semantic Features

cs.AI updates on arXiv.org 2025-10-21T04:27:10.000000Z

Apple to present multiple cutting-edge vision research results at ICCV 2025

oschina.net 2025-10-14T07:01:27.000000Z

video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory

cs.AI updates on arXiv.org 2025-10-14T04:19:50.000000Z

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

cs.AI updates on arXiv.org 2025-10-14T04:09:25.000000Z

After CoT, how does CoF turn inter-frame logic from "implicit alignment" into "explicit thinking"?

Synced 2025-10-13T16:13:05.000000Z

StreamingVLM: Real-Time Understanding for Infinite Video Streams

cs.AI updates on arXiv.org 2025-10-13T04:14:45.000000Z

RO-Bench: Large-scale robustness evaluation of MLLMs with text-driven counterfactual videos

cs.AI updates on arXiv.org 2025-10-13T04:13:41.000000Z

D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition

cs.AI updates on arXiv.org 2025-10-13T04:13:26.000000Z

Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback

cs.AI updates on arXiv.org 2025-10-06T04:27:13.000000Z

Copyright © 2019 FISHAI. All Rights Reserved