Trending
Articles related to "large-scale model training"
Fine-tune FLAN-T5 XL/XXL using DeepSpeed and Hugging Face Transformers
philschmid RSS feed 2025-09-30T11:12:50.000000Z
In awe at the scale of these tensors – a gentle introduction to Unit-Scaled Maximal Update Parametrization
Aleph Alpha 2025-09-28T15:41:42.000000Z
Dion: the distributed orthonormal update revolution is here
智源社区 2025-08-12T21:41:55.000000Z
Reduce ML training costs with Amazon SageMaker HyperPod
AWS Machine Learning Blog 2025-04-10T20:12:18.000000Z
Unveiling Critical Batch Size Dynamics: How Data and Model Scaling Impact Efficiency in Large-Scale Language Model Training with Innovative Optimization Techniques
MarkTechPost@AI 2024-11-25T22:04:47.000000Z