The 15 papers in this episode are:
00:21 🛡 OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
01:13 🧠 ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning
01:49 ⚔ INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats
02:38 🤖 $π_\texttt{RL}$: Online RL Fine-tuning for Flow-based Vision-Language-Action Models
03:26 🚀 Continuous Autoregressive Language Models
03:54 🧭 Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning
04:37 🎯 HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration
05:15 🎯 Defeating the Training-Inference Mismatch via FP16
05:52 🪜 Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals
06:28 🧭 Revisiting Multimodal Positional Encoding in Vision-Language Models
07:09 ⚡ Higher-order Linear Attention
07:55 🌐 Dual-Stream Diffusion for World-Model Augmented Vision-Language-Action Model
08:36 🔬 The Denario project: Deep knowledge AI agents for scientific discovery
09:14 🎯 Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
09:51 🏙 Mask-to-Height: A YOLOv11-Based Architecture for Joint Building Instance Segmentation and Height Classification from Satellite Imagery

【Follow us】
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (小红书): AI速递
