Recursal AI development blog 02月23日
Featherless.ai introduces Qwerky-72B: The Best Post-Transformer model yet
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

Featherless.ai推出Qwerky-72B,这是结合线性变压器和注意力机制的混合模型,降低GPU计算成本,提高效率。还介绍了Real-time Voice和Private Cloud解决方案,旨在让AI更普及。

🦾Qwerky-72B是革命性混合模型,降低GPU计算成本

🎤Real-time Voice是超低延迟AI语音解决方案

☁️Private Cloud为组织提供完全控制AI部署的方案

🌐Featherless.ai目标是让所有AI模型可进行无服务器推理

At Featherless.ai , we’re thrilled to announce the launch of Qwerky-72B , a revolutionary hybrid model combining the computational efficiency of linear transformers with the precision of attention mechanisms. This breakthrough architecture reduces GPU compute costs by over 50% compared to traditional transformers, making it one of the most cost-effective large language models available today, costing less than $100k to build.

Qwerky-72B sets a new standard for scalability and accessibility, enabling real-time applications across industries.

Why Qwerky-72B Matters

The introduction of Qwerky-72B marks a pivotal moment in AI development. By merging the strengths of linear transformers and attention transformers, Qwerky-72B achieves unparalleled efficiency without sacrificing performance.

Traditional transformer-based models require ~100GB of VRAM (excluding model weights) to handle a single 72B parameter model at a 16k context length. For each additional concurrent request, the VRAM demand increases significantly due to the attention mechanism's quadratic scaling with sequence length.

In contrast, Qwerky-72B achieves remarkable efficiency by leveraging its hybrid linear-transformer architecture. At 72B parameters, Qwerky-72B requires just 1GB of additional VRAM per request (excluding model weights). Regardless of context length. These innovations unlocks several critical benefits:

This launch aligns with our mission to make AI accessible to everyone regardless of language or nation. Qwerky-72B runs at a fraction of the inference cost of current models, especially at larger context length, by merging the computational efficiency of linear transformers with the precision of attention mechanisms. This is a key multiplier unlock for not only making AI accessible for the world but for the recent test-time compute style models.

Real-time Voice: Instant, Natural AI Conversations

Alongside Qwerky-72B, we’re proudly introducing Real-time Voice, an ultra-low latency AI speech solution designed for seamless human-computer interaction. Built on our serverless infrastructure, it delivers fast speech processing and generation time, minimizing delays for a more natural conversation experience. This allows individuals and businesses to build interactive voice applications that respond instantly and accurately. Whether for virtual assistants, global call centers, or interactive voice response systems, Real-time Voice ensures fast, reliable, and cost-efficient AI-powered speech applications.

Private-cloud beta

Featherless.ai now offers a Private Cloud solution for organizations that need full control over their AI deployments. Our dedicated, secure environments allow businesses to run open models with the ease of serverless infrastructure, zero maintenance, pay-per-use pricing, and complete data sovereignty. With Private Cloud, sensitive data stays protected while maintaining the scalability and flexibility required for modern AI applications. Whether you're an enterprise prioritizing compliance and security or a developer needing custom AI deployment, Featherless.ai’s Private Cloud delivers seamless, cost-efficient AI hosting with full control over where and how your data is processed.

We invite you to experience the future of AI with Qwerky-72B, our real-time voice capabilities and private cloud solutions. Together, let’s make AI more accessible to everyone regardless of language or nation. Visit our website to access the model via our API or download the model directly from HuggingFace:

Featherless.ai

Featherless.ai is a serverless inference platform. Our goal is to make all AI models available for serverless inference. We provide inference via API to a continually expanding library of open-weight models, including the most popular models for role-playing, creative writing, coding assistance, and more. Our solutions enable enterprises and individuals to harness the full potential of artificial intelligence without worrying about underlying infrastructure. Featherless.ai offers scalable, secure, and easy-to-use tools that empowers businesses and individuals alike to accelerate their AI initiatives. For more information, visit www.featherless.ai

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

Qwerky-72B Real-time Voice Private Cloud Featherless.ai
相关文章