MarkTechPost@AI · September 13
IBM Releases High-Performance Open-Source Embedding Models to Power Retrieval and RAG Systems

IBM recently released two open-source embedding models, granite-embedding-english-r2 and granite-embedding-small-english-r2, designed to power high-throughput retrieval and RAG (retrieval-augmented generation) systems. The models are compact and efficient, and their Apache 2.0 license makes them suitable for commercial deployment. Built on the ModernBERT architecture with optimized attention mechanisms and positional encodings, they support context lengths of up to 8192 tokens. The Granite R2 models perform strongly across benchmarks, particularly in long-document, table, and code retrieval. Their inference speed remains practical even on CPUs, making them a compelling choice for enterprises building production-grade retrieval solutions.

💡 **High-performance open-source embedding models**: IBM's granite-embedding-english-r2 and granite-embedding-small-english-r2 are open-source models designed specifically for high-throughput retrieval and RAG systems. Released under the Apache 2.0 license, they permit commercial use, reflecting IBM's investment in open-source AI and giving developers a powerful set of tools.

🚀 **Advanced ModernBERT architecture with long-context support**: Both models are built on the ModernBERT architecture, balancing efficiency and long-range dependencies through alternating global and local attention, optimized rotary positional embeddings (RoPE), and FlashAttention 2. Crucially, they support context lengths of up to 8192 tokens, which is essential for long documents and complex retrieval tasks and a significant improvement over the first-generation models.

📊 **Strong benchmark results and domain adaptability**: On retrieval benchmarks such as MTEB-v2 and BEIR, the Granite R2 models deliver leading accuracy, and they stand out in specialized tasks such as long-document retrieval (MLDR, LongEmbed), table retrieval (OTT-QA, FinQA, OpenWikiTables), and code retrieval (CoIR), demonstrating broad capability across application scenarios.

⚡ **Efficient inference and broad deployment potential**: The Granite R2 models are fast; even the smaller model encodes documents at very high speed. More importantly, they perform well on CPUs, so enterprises can deploy them without heavy GPU resources, greatly lowering the barrier to real-world use and making them a flexible, cost-effective option.

IBM has quietly built a strong presence in the open-source AI ecosystem, and its latest release shows why it shouldn’t be overlooked. The company has introduced two new embedding models—granite-embedding-english-r2 and granite-embedding-small-english-r2—designed specifically for high-performance retrieval and RAG (retrieval-augmented generation) systems. These models are not only compact and efficient but also licensed under Apache 2.0, making them ready for commercial deployment.

What Models Did IBM Release?

The two models target different compute budgets. The larger granite-embedding-english-r2 has 149 million parameters with an embedding size of 768, built on a 22-layer ModernBERT encoder. Its smaller counterpart, granite-embedding-small-english-r2, comes in at just 47 million parameters with an embedding size of 384, using a 12-layer ModernBERT encoder.

Despite their differences in size, both support a maximum context length of 8192 tokens, a major upgrade from the first-generation Granite embeddings. This long-context capability makes them highly suitable for enterprise workloads involving long documents and complex retrieval tasks.
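For a quick hands-on check of those specifications, the snippet below loads both checkpoints with the sentence-transformers library and prints the embedding dimensionality and context window. Note that the Hugging Face model IDs are assumptions based on IBM's granite-embedding naming convention and may differ:

```python
# Minimal sketch: load both Granite R2 checkpoints and inspect their footprint.
# The Hugging Face IDs below follow IBM's granite-embedding naming (assumed).
from sentence_transformers import SentenceTransformer

for model_id in (
    "ibm-granite/granite-embedding-english-r2",        # 149M params, 768-dim
    "ibm-granite/granite-embedding-small-english-r2",  # 47M params, 384-dim
):
    model = SentenceTransformer(model_id)
    emb = model.encode(["An 8192-token context window suits long documents."])
    print(model_id, "->", emb.shape, "| max tokens:", model.max_seq_length)
```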

Source: https://arxiv.org/abs/2508.21085

What’s Inside the Architecture?

Both models are built on the ModernBERT backbone, which introduces several optimizations:

- Alternating global and local attention, balancing efficiency with long-range dependencies
- Rotary positional embeddings (RoPE) optimized for long sequences, enabling the 8192-token context window
- FlashAttention 2 for lower memory use and higher throughput

IBM also trained these models with a multi-stage pipeline. The process started with masked language pretraining on a two-trillion-token dataset drawn from the web, Wikipedia, PubMed, BookCorpus, and internal IBM technical documents. This was followed by context extension from 1k to 8k tokens, contrastive learning with distillation from Mistral-7B, and domain-specific tuning for conversational, tabular, and code retrieval tasks.
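To make the contrastive stage concrete, here is a minimal sketch of an InfoNCE-style loss with in-batch negatives, a standard formulation of the contrastive objective named above. It is illustrative only: IBM's actual recipe also includes distillation from a Mistral-7B teacher, which is omitted here, and the temperature value is a placeholder.

```python
# Illustrative InfoNCE loss with in-batch negatives (not IBM's actual code).
import torch
import torch.nn.functional as F

def info_nce(query_emb: torch.Tensor, doc_emb: torch.Tensor,
             temperature: float = 0.02) -> torch.Tensor:
    """query_emb, doc_emb: (batch, dim); row i of each forms a positive pair."""
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature                      # (batch, batch) similarities
    labels = torch.arange(q.size(0), device=q.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)

# Smoke test with random embeddings at the 768-dim size of the larger model.
loss = info_nce(torch.randn(8, 768), torch.randn(8, 768))
print(loss.item())
```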

How Do They Perform on Benchmarks?

The Granite R2 models deliver strong results across widely used retrieval benchmarks. On MTEB-v2 and BEIR, the larger granite-embedding-english-r2 outperforms similarly sized models like BGE Base, E5, and Arctic Embed. The smaller model, granite-embedding-small-english-r2, achieves accuracy close to models two to three times larger, making it particularly attractive for latency-sensitive workloads.


Both models also perform well in specialized domains:

- Long-document retrieval (MLDR, LongEmbed)
- Table retrieval (OTT-QA, FinQA, OpenWikiTables)
- Code retrieval (CoIR)

Are They Fast Enough for Large-Scale Use?

Efficiency is one of the standout aspects of these models. On an Nvidia H100 GPU, the granite-embedding-small-english-r2 encodes nearly 200 documents per second, which is significantly faster than BGE Small and E5 Small. The larger granite-embedding-english-r2 also reaches 144 documents per second, outperforming many ModernBERT-based alternatives.

Crucially, these models remain practical even on CPUs, allowing enterprises to run them in less GPU-intensive environments. This balance of speed, compact size, and retrieval accuracy makes them highly adaptable for real-world deployment.
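If you want to verify throughput on your own hardware, a quick timing loop along these lines will do; the Hugging Face model ID is again an assumption, and absolute numbers will vary with batch size, document length, and machine:

```python
# Rough sanity check of encoding throughput on CPU (numbers are hardware-dependent).
import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "ibm-granite/granite-embedding-small-english-r2",  # assumed Hugging Face ID
    device="cpu",
)
docs = ["A sample passage for the throughput test."] * 512

start = time.perf_counter()
model.encode(docs, batch_size=32)
elapsed = time.perf_counter() - start
print(f"{len(docs) / elapsed:.1f} docs/sec on CPU")
```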

What Does This Mean for Retrieval in Practice?

IBM’s Granite Embedding R2 models demonstrate that embedding systems don’t need massive parameter counts to be effective. They combine long-context support, benchmark-leading accuracy, and high throughput in compact architectures. For companies building retrieval pipelines, knowledge management systems, or RAG workflows, Granite R2 provides a production-ready, commercially viable alternative to existing open-source options.
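As a sketch of how the models slot into such a pipeline, the snippet below embeds a toy corpus and ranks passages against a query by cosine similarity (assuming the ibm-granite Hugging Face ID); in a full RAG workflow, the top-ranked passages would then be passed to a generator as context.

```python
# Minimal retrieval sketch: embed a toy corpus, rank passages for a query.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-english-r2")  # assumed ID
corpus = [
    "Granite R2 supports a context length of 8192 tokens.",
    "FlashAttention 2 reduces memory use during attention.",
    "The small model has 47 million parameters.",
]
corpus_emb = model.encode(corpus, convert_to_tensor=True, normalize_embeddings=True)

query_emb = model.encode("How long a context do the models handle?",
                         convert_to_tensor=True, normalize_embeddings=True)
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {corpus[hit['corpus_id']]}")
```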


Summary

In short, IBM’s Granite Embedding R2 models strike an effective balance between compact design, long-context capability, and strong retrieval performance. With throughput optimized for both GPU and CPU environments, and an Apache 2.0 license that enables unrestricted commercial use, they present a practical alternative to bulkier open-source embeddings. For enterprises deploying RAG, search, or large-scale knowledge systems, Granite R2 stands out as an efficient and production-ready option.


Check out the Paper, granite-embedding-small-english-r2, and granite-embedding-english-r2.

The post IBM AI Research Releases Two English Granite Embedding Models, Both Based on the ModernBERT Architecture appeared first on MarkTechPost.
