"
训练动态
" 相关文章
On the Emergence of Induction Heads for In-Context Learning
cs.AI updates on arXiv.org
2025-11-05T05:14:45.000000Z
The Graphon Limit Hypothesis: Understanding Neural Network Pruning via Infinite Width Analysis
cs.AI updates on arXiv.org
2025-10-21T04:28:30.000000Z
Early-stopping for Transformer model training
cs.AI updates on arXiv.org
2025-10-21T04:20:00.000000Z
Revisiting Meta-Learning with Noisy Labels: Reweighting Dynamics and Theoretical Guarantees
cs.AI updates on arXiv.org
2025-10-15T04:58:27.000000Z
Stability of Transformers under Layer Normalization
cs.AI updates on arXiv.org
2025-10-14T04:16:25.000000Z
What Scales in Cross-Entropy Scaling Law?
cs.AI updates on arXiv.org
2025-10-07T04:16:18.000000Z
Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
cs.AI updates on arXiv.org
2025-08-12T04:39:22.000000Z
Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $\mu$P Parametrization
cs.AI updates on arXiv.org
2025-07-23T04:03:41.000000Z
Improving LLM Reasoning for Vulnerability Detection via Group Relative Policy Optimization
cs.AI updates on arXiv.org
2025-07-08T05:54:02.000000Z
Are Large Models Outdated and Small Language Models (SLMs) the Future? Apple Is Researching This
智源社区
2024-11-01T14:54:03.000000Z