Co$^4$ Surpasses GPT-2 and GPT-BERT in Machine Learning Efficiency

cs.AI updates on arXiv.org, October 10, 12:18

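This article introduces the Co$^4$ machine learning model, which surpasses GPT-2 and GPT-BERT in training efficiency, and reports its strong performance on SuperGLUE tasks.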

arXiv:2510.08404v1 Announce Type: cross Abstract: We show that a tiny Co$^4$ machine (Adeel, 2025) with a single layer, two heads, and 8M parameters, operating at an approximate cost of $O(N)$ (where $N$ is the number of input tokens), outpaces the BabyLM Challenge baselines GPT-2 (124M, 12 layers, $O(N^2)$) and GPT-BERT (30M, 12 layers, $O(N^2)$) in just two epochs, while both are trained for ten. Co$^4$ achieves orders-of-magnitude greater training efficiency on 10M tokens, demonstrating highly sample-efficient pretraining. Using the BabyLM Challenge evaluation pipeline across complex benchmarks, Co$^4$ exhibits strong zero-shot and fine-tuning performance on SuperGLUE tasks. Specifically, Co$^4$ outperforms GPT-2 on 5 out of 7 zero-shot metrics and 6 out of 7 fine-tuning tasks, and GPT-BERT on 4 out of 7 metrics in both cases. These results suggest the need to rethink prevailing deep learning paradigms and associated scaling laws.
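To put the abstract's figures side by side, the following is a rough back-of-the-envelope sketch, not the paper's own accounting. It takes the parameter counts, epoch counts, and 10M-token dataset size from the abstract and applies the common `6 * parameters * tokens` rule of thumb for training compute. That rule of thumb is an assumption here: it was devised for dense transformers, may not describe the Co$^4$ machine well, and ignores sequence-length-dependent costs, so it does not capture the $O(N)$ versus $O(N^2)$ difference at all. The names `MODELS`, `tokens_seen`, `approx_train_flops`, and `cost_ratio` are hypothetical helpers introduced for illustration.

```python
# Figures quoted in the abstract: parameter counts, epochs, dataset size.
DATASET_TOKENS = 10_000_000

MODELS = {
    "Co4": {"params": 8_000_000, "epochs": 2},
    "GPT-2": {"params": 124_000_000, "epochs": 10},
    "GPT-BERT": {"params": 30_000_000, "epochs": 10},
}


def tokens_seen(epochs, dataset_tokens=DATASET_TOKENS):
    """Total tokens processed during training (epochs x dataset size)."""
    return epochs * dataset_tokens


def approx_train_flops(params, epochs, dataset_tokens=DATASET_TOKENS):
    """Rough training compute via the 6 * params * tokens heuristic.

    ASSUMPTION: this heuristic is not taken from the paper and ignores
    attention cost, depth, and any Co4-specific computation.
    """
    return 6 * params * tokens_seen(epochs, dataset_tokens)


def cost_ratio(baseline, reference="Co4"):
    """How many times more approximate compute the baseline uses."""
    return approx_train_flops(**MODELS[baseline]) / approx_train_flops(**MODELS[reference])


if __name__ == "__main__":
    for name in ("GPT-2", "GPT-BERT"):
        print(f"{name}: ~{cost_ratio(name):.2f}x the approximate training compute of Co4")
```

Under these assumptions the ratios come out to about 77.5x for GPT-2 and 18.75x for GPT-BERT, driven by parameter count (124M or 30M versus 8M) and tokens processed (100M versus 20M). Because the estimate omits the per-token cost difference between $O(N)$ and $O(N^2)$ processing, it should be read as a crude illustration of the quoted figures rather than a measure of the efficiency gap the authors report.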

Related tags

Co$^4$, machine learning efficiency, GPT-2, GPT-BERT
