Trending
Articles related to "Optimization Methods"
Linear Causal Discovery with Interventional Constraints
cs.AI updates on arXiv.org 2025-10-31T04:07:28.000000Z
Metadata-Driven Retrieval-Augmented Generation for Financial Question Answering
cs.AI updates on arXiv.org 2025-10-29T04:28:05.000000Z
Discovering the curriculum with AI: A proof-of-concept demonstration with an intelligent tutoring system for teaching project selection
cs.AI updates on arXiv.org 2025-10-22T04:26:16.000000Z
Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning
cs.AI updates on arXiv.org 2025-10-16T04:26:51.000000Z
Dual Perspectives on Non-Contrastive Self-Supervised Learning
cs.AI updates on arXiv.org 2025-10-15T05:13:46.000000Z
Margin Adaptive DPO: Leveraging Reward Model for Granular Control in Preference Optimization
cs.AI updates on arXiv.org 2025-10-08T04:10:49.000000Z
Improving Consistency in Retrieval-Augmented Systems with Group Similarity Rewards
cs.AI updates on arXiv.org 2025-10-07T04:17:00.000000Z
SPOGW: a Score-based Preference Optimization method via Group-Wise comparison for workflows
cs.AI updates on arXiv.org 2025-10-07T04:07:40.000000Z
DRIFT: Learning from Abundant User Dissatisfaction in Real-World Preference Learning
cs.AI updates on arXiv.org 2025-10-06T04:23:06.000000Z
Consolidating Reinforcement Learning for Multimodal Discrete Diffusion Models
cs.AI updates on arXiv.org 2025-10-06T04:20:23.000000Z
Rethinking KL Regularization in RLHF: From Value Estimation to Gradient Optimization
cs.AI updates on arXiv.org 2025-10-03T04:15:24.000000Z
Rethinking RoPE Scaling in Quantized LLM: Theory, Outlier, and Channel-Band Analysis with Weight Rescaling
cs.AI updates on arXiv.org 2025-10-02T04:16:43.000000Z
Spectral Logit Sculpting: Adaptive Low-Rank Logit Transformation for Controlled Text Generation
cs.AI updates on arXiv.org 2025-10-01T05:59:43.000000Z
[macOS] How to bring down DingTalk's CPU usage on Mac
V2EX 2025-09-30T08:33:08.000000Z
Thinking Machines Lab proposes a "modular manifolds" method for optimizing weight matrices
oschina.net 2025-09-30T03:57:23.000000Z
POLO: Preference-Guided Multi-Turn Reinforcement Learning for Lead Optimization
cs.AI updates on arXiv.org 2025-09-29T04:14:58.000000Z
Multi-Objective Reinforcement Learning for Large Language Model Optimization: Visionary Perspective
cs.AI updates on arXiv.org 2025-09-29T04:14:42.000000Z
Reinforcement Learning on Pre-Training Data
cs.AI updates on arXiv.org 2025-09-26T04:24:09.000000Z
Are We Scaling the Right Thing? A System Perspective on Test-Time Scaling
cs.AI updates on arXiv.org 2025-09-25T05:45:57.000000Z
Analysis of approximate linear programming solution to Markov decision problem with log barrier function
cs.AI updates on arXiv.org 2025-09-25T05:06:50.000000Z