"Video Understanding" Related Articles
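Peking University and ByteDance open-source the first spatio-temporal reasoning video model! Fully transparent reasoning process, performance surpasses GPT-4o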
量子位
2025-11-05T09:58:45.000000Z
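Peking University and ByteDance open-source the first spatio-temporal reasoning video model, with a fully transparent reasoning process and performance surpassing GPT-4o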
36kr - Technology
2025-11-05T09:13:38.000000Z
Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding?
Apple Machine Learning Research
2025-10-27T17:00:29.000000Z
StreamingTOM: Streaming Token Compression for Efficient Video Understanding
cs.AI updates on arXiv.org
2025-10-22T04:21:28.000000Z
No more "lopsided skills": UniVid unifies video understanding and generation
机器之心
2025-10-21T05:09:39.000000Z
MT-Video-Bench: A Holistic Video Understanding Benchmark for Evaluating Multimodal LLMs in Multi-Turn Dialogues
cs.AI updates on arXiv.org
2025-10-21T04:28:51.000000Z
Xiaoice: Training-Free Video Understanding via Self-Supervised Spatio-Temporal Clustering of Semantic Features
cs.AI updates on arXiv.org
2025-10-21T04:27:10.000000Z
Apple to present several cutting-edge vision research results at ICCV 2025
oschina.net
2025-10-14T07:01:27.000000Z
video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory
cs.AI updates on arXiv.org
2025-10-14T04:19:50.000000Z
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
cs.AI updates on arXiv.org
2025-10-14T04:09:25.000000Z
After CoT, how does CoF turn inter-frame logic from "implicit alignment" into "explicit reasoning"?
机器之心
2025-10-13T16:13:05.000000Z
StreamingVLM: Real-Time Understanding for Infinite Video Streams
cs.AI updates on arXiv.org
2025-10-13T04:14:45.000000Z
RO-Bench: Large-scale robustness evaluation of MLLMs with text-driven counterfactual videos
cs.AI updates on arXiv.org
2025-10-13T04:13:41.000000Z
D-CoDe: Scaling Image-Pretrained VLMs to Video via Dynamic Compression and Question Decomposition
cs.AI updates on arXiv.org
2025-10-13T04:13:26.000000Z
Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback
cs.AI updates on arXiv.org
2025-10-06T04:27:13.000000Z