Articles related to "generalization loss"
Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
cs.AI updates on arXiv.org
2025-10-03T04:16:07.000000Z