VentureBeat · November 13, 03:36
Weibo Releases VibeThinker-1.5B, Achieving High-Performance Reasoning at Low Cost

Chinese social media company Weibo has released its open-source large language model VibeThinker-1.5B. Fine-tuned from Alibaba's Qwen2.5-Math-1.5B, the 1.5-billion-parameter model demonstrates exceptional reasoning ability on math and code tasks, even surpassing far larger models. VibeThinker-1.5B uses an innovative Spectrum-to-Signal (SSP) training framework that decouples supervised fine-tuning (SFT) from reinforcement learning (RL) to maximize answer diversity and then optimize for correct paths, achieving high performance at low compute cost. The model is freely available on Hugging Face, GitHub, and other platforms, opening new possibilities for enterprises deploying on edge devices and building low-cost reasoning systems.

💡 **Breakthrough performance and cost efficiency**: Despite having only 1.5 billion parameters, VibeThinker-1.5B excels at math and code reasoning, even outperforming models hundreds of times its size, such as DeepSeek's R1 (671 billion parameters). More strikingly, its post-training compute bill was a mere $7,800, upending the industry assumption that high-performance models require massive investment and setting a new benchmark for cost-effective AI development.

🚀 **Innovative Spectrum-to-Signal (SSP) training framework**: VibeThinker-1.5B owes its strong results to a distinctive SSP training framework that separates the supervised fine-tuning (SFT) and reinforcement learning (RL) stages. The SFT stage focuses on maximizing the diversity of potentially correct answers (raising the Pass@K score), while the RL stage uses MaxEnt-Guided Policy Optimization (MGPO) to identify and amplify the most correct paths. This lets the model explore the reasoning space more effectively and achieve signal amplification without relying on a huge parameter count.

⚙️ **Potential and advantages for enterprise applications**: VibeThinker-1.5B's compact size allows deployment on edge devices such as phones and in-vehicle systems, with inference costs 20–70x lower than those of large models. This gives enterprises a viable path to cost-effective, locally deployable reasoning systems, reduces dependence on large-model APIs, and could play a significant role in automated workflows and reasoning-agent deployments.

📊 **Cross-domain reasoning ability and limitations**: On structured reasoning tasks such as math and code, VibeThinker-1.5B performs exceptionally well, surpassing many much larger models. On general-knowledge reasoning (e.g., GPQA), however, it still trails large models, suggesting that task-specific optimization may trade off against breadth of general knowledge; enterprises should weigh their specific use cases when choosing a model.

Another day in late 2025, another impressive result from a Chinese company in open source artificial intelligence.

Chinese social networking company Weibo's AI division recently released its open source VibeThinker-1.5B—a 1.5 billion parameter large language model (LLM) that is a fine-tuned variant of rival Chinese tech firm Alibaba's Qwen2.5-Math-1.5B.

It's available now for free download and usage by researchers and enterprise developers—even for commercial purposes—under a permissive MIT License on Hugging Face, GitHub and ModelScope, with a technical report on open access science publishing site arxiv.org.

And yet, despite its compact size, VibeThinker-1.5B achieves benchmark-topping reasoning performance on math and code tasks, rivaling or surpassing models hundreds of times its size, even outperforming Chinese rival DeepSeek's famed R1—the 671-billion-parameter model that went viral at the start of this year—on formal reasoning benchmarks.

It further eclipses Mistral AI's Magistral Medium and holds its own against Anthropic's Claude Opus 4 and OpenAI's gpt-oss-20B Medium, all while requiring a fraction of the infrastructure and investment.

It also does so having been post-trained on a budget of merely $7,800 in compute (3,900 GPU-hours on Nvidia H800s, or about $2 per GPU-hour) — far less than the tens, or even hundreds, of thousands of dollars typically required to fine-tune models of similar or larger scale.

Recall this is not the total cost of the model's development, however: LLMs are trained in stages. First comes pre-training, when the model learns basic language structure and general knowledge by predicting the next word across enormous amounts of text from the internet, books, and articles. This gives it fluency but not much sense of how to follow instructions or hold a conversation.

Post-training comes next, using much smaller, higher-quality datasets—typically collections of example questions, prompts, and expert-written answers—to teach the model how to respond helpfully, reason through problems, and align with human expectations. Still, Weibo's post-training cost effectiveness on VibeThinker-1.5B is noteworthy and should be commended.

The open-source release upends assumptions about parameter scale, compute intensity, and the minimum viable size for high-performance LLMs.

A Different Training Approach: Spectrum-to-Signal

VibeThinker-1.5B owes its performance not to scale, but to the training framework behind it: the Spectrum-to-Signal Principle (SSP).

Instead of optimizing a model purely for single-answer correctness (Pass@1), the SSP framework decouples supervised fine-tuning (SFT) and reinforcement learning (RL) into two distinct phases with different goals:

- **SFT phase (the "spectrum"):** maximize the diversity of potentially correct answers the model can produce, raising its Pass@K score rather than its single-shot accuracy.

- **RL phase (the "signal"):** use MaxEnt-Guided Policy Optimization (MGPO) to identify and amplify the most correct reasoning paths within that spectrum.

The authors argue this separation allows small models to explore reasoning space more effectively—achieving signal amplification without relying on massive parameter counts.
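
To make the SFT-stage target concrete, Pass@K measures the chance that at least one of K sampled answers is correct. Below is a minimal sketch of the standard unbiased Pass@K estimator (widely used in code-generation evaluation); the sample counts in the example are illustrative, not figures from the VibeThinker report:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimator: probability that at least one of k
    answers drawn from n generations (c of which are correct) passes.

    n: total generated samples per problem
    c: number of correct samples among them
    k: answers the model is allowed per problem
    """
    if n - c < k:
        return 1.0  # too few incorrect samples to fill all k draws
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 30 correct.
print(pass_at_k(200, 30, 1))   # Pass@1  = 0.15
print(pass_at_k(200, 30, 10))  # Pass@10 ≈ 0.81
```

Optimizing for Pass@K rather than Pass@1 rewards a model that keeps many distinct plausible solution paths alive—the "spectrum" that the RL stage later narrows into a "signal."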

VibeThinker-1.5B makes a compelling case that the industry’s reliance on parameter scaling as the only route to better reasoning performance may be outdated.

By adopting a diversity-first training pipeline, WeiboAI has shown that smaller, more accessible models can match and even outperform billion-dollar systems in logic-heavy tasks.

The low resource footprint is among the most significant aspects of VibeThinker-1.5B. At under $8,000, the post-training cost is 30–60x lower than models like DeepSeek R1 and MiniMax-M1, which cost between $294K and $535K to train.

Performance Across Domains

Despite its small size, VibeThinker-1.5B delivers cross-domain reasoning that outpaces many larger open-source and commercial models:

| Model | AIME25 | LiveCodeBench v6 | GPQA-Diamond |
| --- | --- | --- | --- |
| VibeThinker-1.5B | 74.4 | 51.1 | 46.7 |
| GPT-OSS-20B-Medium | 72.1 | 54.9 | 66.0 |
| Claude Opus 4 | 69.2 | 56.6 | 79.6 |
| MiniMax M1 (456B) | 74.6 | 62.3 | 69.2 |
| DeepSeek R1 (671B) | 70.0 | 65.9 | 71.5 |
| Kimi K2 (1.09T) | 49.5 | 53.7 | 75.1 |

VibeThinker was benchmarked against both reasoning-centric models (Magistral, Claude, OpenAI o3-mini) and non-reasoning LLMs (GPT-4.1, Kimi K2, DeepSeek V3). Across structured reasoning benchmarks, the model consistently outperformed non-reasoning models, regardless of their size.

This supports the authors’ claim that size is not the only path to reasoning capability—with proper training design, smaller models can reach or even exceed the performance of far larger systems in targeted tasks.

Notably, it achieves parity with models hundreds of times larger on math and code, though it lags behind in general knowledge reasoning (GPQA), where larger models maintain an edge.

This suggests a potential specialization trade-off: while VibeThinker excels at structured logical tasks, it has less capacity for wide-ranging encyclopedic recall, a known limitation of smaller architectures.

Guidance for Enterprise Adoption

The release includes recommended inference settings (temperature = 0.6, top_p = 0.95, max tokens = 40960).
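
For teams that want to try those settings, here is a minimal sketch using the Hugging Face transformers library. The repository ID below is an assumption based on the release and should be verified against the official Hugging Face listing:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo ID; check the official Hugging Face listing.
MODEL_ID = "WeiboAI/VibeThinker-1.5B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "Prove that the sum of two even integers is even."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Recommended inference settings from the release.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=40960,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```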

The model is small enough to be deployed on edge devices, including mobile phones and vehicle-embedded systems, while inference costs are estimated to be 20–70x cheaper than with large models.

This positions VibeThinker-1.5B not just as a research achievement, but as a potential foundation for cost-efficient, locally deployable reasoning systems.

Weibo’s Strategy and Market Position

Weibo, launched by Sina Corporation in 2009, remains a cornerstone of China’s social media ecosystem. Often described as China’s version of X (formerly Twitter), the platform blends microblogging, multimedia content, and trending-topic features with a regulatory environment shaped by tight government oversight.

Despite counting 600 million monthly active users (more than twice that of X), investors are not optimistic about its advertising revenue growth potential in the near term, and Weibo is navigating intensifying competition from video-first platforms like Douyin, which are drawing younger users and pulling time spent away from the platform.

In response, Weibo has leaned into creator-economy monetization, live-streaming, and vertical video—adding tools for influencer engagement, e-commerce integration, and richer analytics for brands.

The platform’s role as a digital public square also makes it a focus of regulatory scrutiny. Chinese authorities continue to apply pressure on issues ranging from content governance to data security. In September 2025, Weibo was among the platforms cited in official warnings, highlighting its ongoing exposure to policy risks.

Weibo’s push into AI R&D—exemplified by the release of VibeThinker-1.5B—signals a shift in ambition. Beyond being a media platform, Weibo is positioning itself as a player in the next phase of Chinese AI development, using its capital reserves, user behavior data, and in-house research capacity to pursue adjacent technical domains.

What It Means for Enterprise Technical Decision Makers

For engineering leaders and enterprise AI teams, VibeThinker’s release has practical implications for everything from orchestration pipelines to cost modeling.

A 1.5B-parameter model that outperforms 100x larger models on math and programming tasks doesn’t just save compute—it shifts the architectural balance. It enables LLM inference on constrained infrastructure, reduces latency at the edge, and lowers the barrier to entry for applications that otherwise would have required API access to closed, frontier-scale models.

That matters for enterprise ML leads trying to deploy reasoning-capable agents within existing systems, or for platform owners tasked with integrating LLMs into automated workflows.

It also speaks to those running reinforcement learning from human feedback (RLHF) pipelines or managing inference optimization across hybrid cloud environments.

The model’s post-training methodology—particularly its entropy-targeted reinforcement learning approach—offers a roadmap for teams looking to refine smaller checkpoints instead of relying on large-scale pretraining.
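
The core idea behind entropy-guided training is to concentrate reinforcement signal on problems where the model's success rate is most uncertain. The sketch below illustrates that intuition with a Bernoulli-entropy weighting; this specific weighting function is an illustrative assumption, not the exact MGPO formula from the technical report:

```python
import math

def bernoulli_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli success/failure outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def entropy_weight(pass_rate: float) -> float:
    """Illustrative entropy-guided weight (an assumption, not the
    paper's formula): problems the model solves about half the time
    (maximum uncertainty) receive the strongest training signal;
    already-solved or hopeless problems receive little."""
    return bernoulli_entropy(pass_rate) / math.log(2)  # normalize to [0, 1]

# Problems near a 50% pass rate dominate training:
for p in (0.05, 0.5, 0.95):
    print(f"pass rate {p:.2f} -> weight {entropy_weight(p):.2f}")
```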

VibeThinker’s benchmark transparency and data decontamination steps also address another emerging priority in enterprise AI: auditability. While its performance on general-knowledge tests still trails large frontier models, its task-specific reliability makes it an attractive candidate for controlled environments where correctness matters more than coverage.

In short, VibeThinker-1.5B isn't just a research milestone; it's a strong candidate for practical enterprise deployment and experimentation. It suggests that a new class of compact, reasoning-optimized models is viable for enterprise use cases that were previously the domain of far larger systems. For organizations trying to balance cost, latency, interpretability, and control, it's a welcome addition to the long and growing list of Chinese open-source offerings.
