Training Frameworks_Fishai

Articles related to "Training Frameworks"

DINO-MX: A Modular & Flexible Framework for Self-Supervised Learning

cs.AI updates on arXiv.org 2025-11-05T05:30:41.000000Z

When Models Outthink Their Safety: Mitigating Self-Jailbreak in Large Reasoning Models with Chain-of-Guardrails

cs.AI updates on arXiv.org 2025-10-27T06:18:21.000000Z

UI-Ins: Enhancing GUI Grounding with Multi-Perspective Instruction-as-Reasoning

cs.AI updates on arXiv.org 2025-10-24T04:25:37.000000Z

MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models

cs.AI updates on arXiv.org 2025-10-21T04:28:32.000000Z

DMRetriever: A Family of Models for Improved Text Retrieval in Disaster Management

cs.AI updates on arXiv.org 2025-10-20T04:11:45.000000Z

CBF-RL: Safety Filtering Reinforcement Learning in Training with Control Barrier Functions

cs.AI updates on arXiv.org 2025-10-17T04:19:14.000000Z

RL-100: Performant Robotic Manipulation with Real-World Reinforcement Learning

cs.AI updates on arXiv.org 2025-10-17T04:19:04.000000Z

Multimodal Policy Internalization for Conversational Agents

cs.AI updates on arXiv.org 2025-10-13T04:14:36.000000Z

Localist LLMs -- A Mathematical Framework for Dynamic Locality Control

cs.AI updates on arXiv.org 2025-10-13T04:10:23.000000Z

Learning without Global Backpropagation via Synergistic Information Distillation

cs.AI updates on arXiv.org 2025-10-07T04:13:10.000000Z

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

cs.AI updates on arXiv.org 2025-10-06T04:24:57.000000Z

Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation

cs.AI updates on arXiv.org 2025-09-30T04:05:37.000000Z

Towards Strategic Persuasion with Language Models

cs.AI updates on arXiv.org 2025-09-30T04:00:44.000000Z

SABR: A Stable Adaptive Bitrate Framework Using Behavior Cloning Pretraining and Reinforcement Learning Fine-Tuning

cs.AI updates on arXiv.org 2025-09-16T05:02:01.000000Z

GraSP: A Unified Graph-Based Framework for Scalable Generation, Quality Tagging, and Management of Synthetic Data for SFT and DPO

cs.AI updates on arXiv.org 2025-08-22T04:02:19.000000Z

Multi-Plasticity Synergy with Adaptive Mechanism Assignment for Training Spiking Neural Networks

cs.AI updates on arXiv.org 2025-08-20T04:17:44.000000Z

WeChat-YATT: A Simple, Scalable and Balanced RLHF Trainer

cs.AI updates on arXiv.org 2025-08-12T04:39:13.000000Z

VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo

cs.AI updates on arXiv.org 2025-08-05T11:10:23.000000Z

Copyright © 2019 FISHAI. All Rights Reserved