TMTPOST: New Insights into Future Business and Life — October 21, 16:46
China's Vidu Q2 AI Video Model Launches, Challenging OpenAI's Sora and Google's Veo

Chinese startup ShengShu Technology has released Vidu Q2, its latest AI video generation model, built to compete with OpenAI's Sora 2 and Google's Veo 3.1. Vidu Q2 delivers marked improvements in consistency, narrative control, and creative flexibility, letting users upload and blend up to seven reference images into a single video while preserving the distinctive features of each visual element. CEO Luo Yihang said Vidu Q2 marks a new chapter in AI video creation, with the goal of an AI that not only creates videos but also acts, reacts, and tells stories alongside human creators. Vidu Q2 also introduces transition animations similar to Google's Veo 3.1 and ships with an API for integration by enterprises and studios. ShengShu stresses that Vidu Q2 matches Sora 2 and Veo 3.1 in visual quality at faster speed and lower cost, and that the Vidu line has accumulated 30 million users and generated more than 400 million videos.

🚀 ShengShu Technology has released Vidu Q2, a new AI video generation model that directly challenges industry leaders OpenAI's Sora 2 and Google's Veo 3.1, marking China's rapid progress in multimodal generative AI.

🖼️ Vidu Q2 introduces a "multi-entity consistency" feature that lets users upload up to seven reference images (faces, scenes, or props) and blend them seamlessly into a single video while preserving each element's distinctive characteristics, effectively reducing the distortions and blending errors common in existing models.

🎬 The model supports transition animations similar to Google's Veo 3.1: users upload only the first and last frames of a scene, and Vidu Q2 generates coherent motion in between, giving tighter control over narrative flow and pacing — a capability particularly valued in film and advertising production.

💰 ShengShu stresses that Vidu Q2 matches the visual quality of Sora 2 and Veo 3.1 while generating faster and at lower operating cost, thanks to its localized infrastructure and optimized compression algorithms, which could make high-quality generative video creation more widely accessible.

🌍 Since its founding in March 2023, ShengShu has grown rapidly: its Vidu models have accumulated 30 million users across more than 200 countries and regions and generated over 400 million videos, signaling strong potential in AI video generation.

AI-generated image

TMTPOST -- ShengShu Technology, one of China’s fastest-growing multimodal generative artificial intelligence startups, has unveiled a new version of its AI video generation model aimed squarely at challenging OpenAI’s Sora 2 and Google’s Veo 3.1, two of the world’s most advanced text-to-video systems.

The Beijing-based firm said on Tuesday that its new release, Vidu Q2, significantly improves consistency, narrative control, and creative flexibility, marking a step forward in the company’s ambition to compete globally in the emerging field of AI-driven video creation.

According to ShengShu, Vidu Q2 allows creators to upload and merge up to seven reference images—covering faces, scenes, or props—into a single coherent video. The model’s new “multi-entity consistency” feature blends these visual elements with text prompts while maintaining the unique characteristics of each reference, reducing the distortions and blending errors that often appear in existing models.

“Vidu Q2 marks a new chapter in AI video creation,” said Luo Yihang, ShengShu’s chief executive officer, during the product announcement. “We’re entering an era where AI doesn’t just create videos but acts, reacts, and tells stories alongside human creators. This launch goes beyond simple generation—it’s about teaching AI to perform and express emotion.”

Luo said the company’s goal is not to replace human creativity but to expand it. “With each release, we bring technology and imagination closer together,” he said. “Our aim is to make creativity more accessible—turning imagination into visible, emotional storytelling.”

The Vidu Q2 model introduces several new features that position it directly against Western rivals. Like Google’s Veo 3.1, Vidu Q2 supports transition animations that allow users to upload only the first and last frames of a scene, letting the model generate the in-between motion. This offers creators enhanced control over narrative flow and pacing—a capability particularly valued in film and advertising production.

The company also released a Vidu Q2 application programming interface (API), allowing enterprises and studios to integrate the model into their workflows for automated or customized content generation.
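To illustrate what such an integration might look like, here is a minimal sketch of a client-side request builder. The endpoint URL, field names, and validation logic are all assumptions for illustration — ShengShu has not published these details in this article — but the seven-reference-image limit reflects the capability described above.

```python
import json

# Placeholder endpoint: the real Vidu Q2 API URL and schema are not
# documented in this article and would come from ShengShu's API docs.
VIDU_API_URL = "https://api.example.com/v1/vidu-q2/generate"
MAX_REFERENCE_IMAGES = 7  # Vidu Q2 accepts up to seven reference images


def build_generation_request(prompt: str, reference_images: list[str]) -> str:
    """Build a JSON request body for a text-plus-references generation call.

    `reference_images` would typically be image URLs or base64 strings;
    the field names here are hypothetical.
    """
    if len(reference_images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"Vidu Q2 supports at most {MAX_REFERENCE_IMAGES} reference "
            f"images, got {len(reference_images)}"
        )
    payload = {
        "prompt": prompt,
        "reference_images": reference_images,
    }
    return json.dumps(payload)


# Example: three references (a face, a scene, a prop) plus a text prompt.
body = build_generation_request(
    "A battery module moving along a factory conveyor belt",
    ["face.png", "factory.png", "robot.png"],
)
```

A studio pipeline would POST this body to the API endpoint and poll for the finished video; enforcing the reference limit client-side avoids a round trip for requests the service would reject anyway.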

ShengShu emphasized that its new system delivers comparable visual quality to Sora 2 and Veo 3.1 at a faster speed and lower cost, potentially making high-quality generative video creation more accessible to independent creators and small businesses.

Industry insiders told Yicai Global that the pricing advantage could prove decisive. While U.S.-based models require extensive cloud resources and expensive compute credits, ShengShu’s localized infrastructure and optimized compression algorithms make Vidu Q2 considerably cheaper to operate.

In one scenario, Vidu Q2 was prompted to generate a video depicting a blade battery module moving along a conveyor belt inside a Chinese electric vehicle factory, being scanned by a yellow Siasun industrial robot, with a digital screen showing "99.92" alongside simplified Chinese characters.

The system successfully fused all visual elements—the battery, robotic arm, Siasun logo, and Chinese text—into a smooth, stable sequence. Observers said the video maintained high visual fidelity, especially in rendering Chinese characters accurately, demonstrating the strength of the multi-entity consistency feature.

In comparison, Google’s Veo 3.1, which supports up to three reference images, failed to reproduce the Chinese text correctly. OpenAI’s Sora 2 handled the text accurately but mistakenly changed the Siasun logo to that of Nissan Motor, showing the difficulty of managing multiple distinct references across frames.

A second test involved a short dialogue scene: a Chinese chairman angrily asking, “The battery caught fire, are you messing with me?” followed by an American CEO replying in English, “Not me, it’s them,” in a Shanghai boardroom setting.

Vidu Q2 generated the scene using reference images for the characters’ expressions. The video demonstrated accurate lip synchronization in both languages and convincing facial animation for anger and frustration. However, the emotional tone of the accompanying audio was relatively flat, lagging behind the natural expressiveness achieved by Veo 3.1.

Despite that, analysts said the results highlight ShengShu’s progress in cross-lingual emotional modeling and multimodal consistency—areas considered technically challenging even for global leaders.

Founded in March 2023 by researchers from Tsinghua University’s Institute for AI Industry Research, ShengShu has quickly risen to prominence in China’s fast-evolving generative AI industry. The startup launched Vidu 1.0 in April 2024 and has since accumulated 30 million users across more than 200 countries and regions, generating over 400 million videos to date.

Vidu’s early versions could produce five- to eight-second clips at 1080p resolution from text or image prompts in either Chinese or English. The Q2 update builds on that base with improved realism, narrative capability, and expanded creative control.

Analysts say the company’s trajectory mirrors China’s broader push to narrow the technological gap with U.S. AI developers. “China’s AI ecosystem is catching up fast,” said an industry expert at a Beijing venture capital firm. “ShengShu’s focus on multimodal integration—especially with localized features like Chinese text and cultural nuances—gives it an edge in domestic and Asian markets.”

Generative video has become one of the most competitive frontiers in AI development. Since OpenAI’s Sora first stunned the industry with its photorealistic videos in early 2024, companies worldwide have raced to build their own models capable of producing complex, cinematic sequences directly from text prompts.

Google’s Veo 3.1 and Anthropic’s experimental systems have set the bar high for quality and consistency, but Chinese players such as ShengShu, Kuaishou’s Kling, and Tencent’s Hunyuan Video are rapidly improving.

“The next phase of competition is not just about realism,” said Luo. “It’s about emotional intelligence—how well AI can understand and express human feelings through visual storytelling.”

With Vidu Q2, ShengShu aims to establish itself as a major global player in AI video, blending scientific precision with artistic expression. Luo summed it up: “We want to make imagination visible. This is where technology and emotion finally meet.”

