Related articles: "Training Framework"
DINO-MX: A Modular & Flexible Framework for Self-Supervised Learning
cs.AI updates on arXiv.org
2025-11-05T05:30:41.000000Z
When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails
cs.AI updates on arXiv.org
2025-10-27T06:18:21.000000Z
UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning
cs.AI updates on arXiv.org
2025-10-24T04:25:37.000000Z
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models
cs.AI updates on arXiv.org
2025-10-21T04:28:32.000000Z
DMRetriever: A Family of Models for Improved Text Retrieval in Disaster Management
cs.AI updates on arXiv.org
2025-10-20T04:11:45.000000Z
CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions
cs.AI updates on arXiv.org
2025-10-17T04:19:14.000000Z
RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning
cs.AI updates on arXiv.org
2025-10-17T04:19:04.000000Z
Multimodal Policy Internalization for Conversational Agents
cs.AI updates on arXiv.org
2025-10-13T04:14:36.000000Z
Localist LLMs -- A Mathematical Framework for Dynamic Locality Control
cs.AI updates on arXiv.org
2025-10-13T04:10:23.000000Z
Learning without Global Backpropagation via Synergistic Information Distillation
cs.AI updates on arXiv.org
2025-10-07T04:13:10.000000Z
ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference
cs.AI updates on arXiv.org
2025-10-06T04:24:57.000000Z
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
cs.AI updates on arXiv.org
2025-09-30T04:05:37.000000Z
Towards Strategic Persuasion with Language Models
cs.AI updates on arXiv.org
2025-09-30T04:00:44.000000Z
SABR: A Stable Adaptive Bitrate Framework Using Behavior Cloning Pretraining and Reinforcement Learning Fine-Tuning
cs.AI updates on arXiv.org
2025-09-16T05:02:01.000000Z
GraSP: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data for SFT and DPO
cs.AI updates on arXiv.org
2025-08-22T04:02:19.000000Z
Multi-Plasticity Synergy with Adaptive Mechanism Assignment for Training Spiking Neural Networks
cs.AI updates on arXiv.org
2025-08-20T04:17:44.000000Z
WeChat-YATT: A Simple, Scalable and Balanced RLHF Trainer
cs.AI updates on arXiv.org
2025-08-12T04:39:13.000000Z
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
cs.AI updates on arXiv.org
2025-08-05T11:10:23.000000Z