"
训练动态
" 相关文章
On the Emergence of Induction Heads for In-Context Learning
cs.AI updates on arXiv.org
2025-11-05T05:14:45.000000Z
The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis
cs.AI updates on arXiv.org
2025-10-21T04:28:30.000000Z
Early-stopping for Transformer model training
cs.AI updates on arXiv.org
2025-10-21T04:20:00.000000Z
Revisiting Meta-Learning with Noisy Labels: Reweighting Dynamics and Theoretical Guarantees
cs.AI updates on arXiv.org
2025-10-15T04:58:27.000000Z
Stability of Transformers under Layer Normalization
cs.AI updates on arXiv.org
2025-10-14T04:16:25.000000Z
What Scales in Cross-Entropy Scaling Law?
cs.AI updates on arXiv.org
2025-10-07T04:16:18.000000Z
Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
cs.AI updates on arXiv.org
2025-08-12T04:39:22.000000Z
Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $\mu$P Parametrization
cs.AI updates on arXiv.org
2025-07-23T04:03:41.000000Z
Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization
cs.AI updates on arXiv.org
2025-07-08T05:54:02.000000Z
Are Large Models Outdated and Small Language Models (SLMs) the Future? Apple Is Researching This
智源社区
2024-11-01T14:54:03.000000Z