本期的 14 篇论文如下:
00:20 🖥 D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI(D2E:利用桌面数据规模化视觉-动作预训练以迁移至具身智能)
01:13 📷 Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation(基于相机的统一多模态理解与生成模型)
01:56 🎨 TAG:Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling(TAG:抑制幻觉的扩散采样切向放大引导)
02:31 🧠 Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs(多模态提示优化:为何不为多模态大模型释放全模态潜能)
03:05 🚀 AutoPR: Let's Automate Your Academic Promotion!(AutoPR:让学术晋升一键自动化!)
03:39 🧭 R-Horizon: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?(R-HORIZON:你的大推理模型在广度与深度上究竟能走多远?)
04:14 🚀 Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels(Webscale-RL:把强化学习数据扩展到预训练体量的自动化流水线)
04:56 🛰 SpaceVista: All-Scale Visual Spatial Reasoning from mm to km(SpaceVista:毫米到千米全尺度视觉空间推理)
05:37 🎥 StreamingVLM: Real-Time Understanding for Infinite Video Streams(StreamingVLM:面向无限视频流的实时理解框架)
06:19 🌐 KORMo: Korean Open Reasoning Model for Everyone(KORMo:人人可用的韩语开放推理模型)
06:42 ♻ Don't Waste Mistakes: Leveraging Negative RL-Groups via Confidence Reweighting(别浪费错误:通过置信度加权利用负RL组)
07:25 🧠 Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization(从推理到学习的桥梁:以复杂度分布外泛化揭穿幻觉)
08:16 ⚡ DISCO: Diversifying Sample Condensation for Efficient Model Evaluation(DISCO:以模型分歧为导向的样本浓缩加速评测)
08:56 🚗 Progressive Gaussian Transformer with Anisotropy-aware Sampling for Open Vocabulary Occupancy Prediction(面向开放词汇占用预测的各向异性采样渐进高斯Transformer)

【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递
