MarkTechPost@AI · August 13
Meet LEANN: The Tiniest Vector Database that Democratizes Personal AI with Storage-Efficient Approximate Nearest Neighbor (ANN) Search Index

Embedding-based search far outperforms traditional keyword search at capturing semantic similarity, but its heavy storage overhead has limited its use on personal devices. To address this pain point, researchers from UC Berkeley and other institutions have developed LEANN, a storage-efficient ANN search index designed for resource-constrained devices. By combining a compact graph structure with an on-the-fly recomputation strategy, LEANN reduces storage requirements to under 5% of the original data while maintaining high retrieval accuracy and speed. It employs a two-level traversal algorithm and dynamic batching to improve GPU utilization, achieving substantial storage and latency reductions over existing methods and laying the groundwork for bringing AI to edge devices.

💡 **LEANN slashes storage overhead for efficient ANN search**: LEANN is a storage-efficient ANN search index designed for resource-constrained environments such as personal devices. By combining a compact graph structure with an on-the-fly recomputation strategy, it reduces index storage to under 5% of the original data size, up to 50x smaller than standard indexes, removing the storage bottleneck of traditional embedding-based search.

🚀 **LEANN preserves accuracy while optimizing retrieval speed**: LEANN lowers retrieval latency with a two-level graph traversal algorithm and dynamic batching, which aggregate embedding computations across search hops to improve GPU utilization. On real-world question-answering benchmarks it reaches 90% top-3 recall in under 2 seconds, balancing speed and accuracy.

📊 **Performance comparison and limitations**: Compared with methods such as EdgeRAG, LEANN shows clear advantages in storage and latency, with latency reductions of 21.17x to 200.60x. LEANN also performs better on downstream RAG tasks across most datasets, though its gains are limited on GPQA due to a distributional mismatch and on HotpotQA because the single-hop retrieval setup caps accuracy. In addition, LEANN exhibits high peak storage usage during index construction, which future work could optimize.

🛠️ **Technical architecture and core optimizations**: LEANN builds on the HNSW framework, and its core idea is to compute embeddings on demand rather than store them all in advance. It introduces two key techniques: two-level graph traversal with dynamic batching to reduce recomputation latency, and a high-degree-preserving graph pruning method to shrink metadata storage. This design lets LEANN handle large datasets efficiently and adapt to different hardware platforms.

Embedding-based search outperforms traditional keyword-based methods across various domains by capturing semantic similarity with dense vector representations and approximate nearest neighbor (ANN) search. However, the ANN data structure brings excessive storage overhead, often 1.5 to 7 times the size of the original raw data. This overhead is manageable in large-scale web applications but becomes impractical for personal devices or large datasets. Reducing storage to under 5% of the original data size is critical for edge deployment, yet existing solutions fall short: techniques like product quantization (PQ) can reduce storage, but they either degrade accuracy or increase search latency.
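To make the overhead concrete, here is a rough back-of-the-envelope sketch in Python; the chunk size, embedding dimension, and graph fan-out are illustrative assumptions rather than figures from the paper.

```python
# Illustrative storage estimate for a flat embedding index (assumed numbers,
# not from the LEANN paper): raw text chunks vs. float32 embeddings.
CHUNK_BYTES = 1024              # assume ~1 KB of raw text per chunk
EMBED_DIM = 1024                # assume a 1,024-dimensional embedding model
FLOAT_BYTES = 4                 # float32
GRAPH_BYTES_PER_NODE = 64 * 4   # assume ~64 int32 neighbor ids per node (HNSW-style)

num_chunks = 1_000_000
raw = num_chunks * CHUNK_BYTES
embeddings = num_chunks * EMBED_DIM * FLOAT_BYTES
graph = num_chunks * GRAPH_BYTES_PER_NODE

print(f"raw data:          {raw / 1e9:.1f} GB")
print(f"embeddings:        {embeddings / 1e9:.1f} GB  ({embeddings / raw:.1f}x raw)")
print(f"embeddings + graph: {(embeddings + graph) / 1e9:.1f} GB  ({(embeddings + graph) / raw:.1f}x raw)")
```

Under these assumptions the embeddings alone already cost about 4x the raw text, which is squarely inside the 1.5-7x range quoted above.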


Vector search methods rely on IVF and proximity graphs. Graph-based approaches like HNSW, NSG, and Vamana are considered state-of-the-art thanks to their balance of accuracy and efficiency. Efforts to reduce graph size, such as learned neighbor selection, face limitations due to high training costs and dependence on labeled data. For resource-constrained environments, DiskANN and Starling store data on disk, while FusionANNS optimizes hardware usage. Methods like AiSAQ and EdgeRAG attempt to minimize memory usage but still suffer from high storage overhead or performance degradation at scale. Embedding compression techniques like PQ and RabitQ provide quantization with theoretical error bounds, but struggle to maintain accuracy under tight storage budgets.
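As a point of reference for the PQ trade-off mentioned above, a minimal sketch using the FAISS library (assuming `faiss-cpu` is installed; the dimension and sub-quantizer settings are illustrative, and this is generic PQ rather than any of the systems cited) compresses each 3,072-byte float32 vector down to 96 bytes and then measures how much recall is lost relative to exact search:

```python
# Minimal product-quantization sketch with FAISS (illustrative parameters).
import numpy as np
import faiss

d, n = 768, 20_000                       # assumed embedding dim and corpus size
rng = np.random.default_rng(0)
xb = rng.standard_normal((n, d)).astype("float32")
xq = xb[:10] + 0.01 * rng.standard_normal((10, d)).astype("float32")

# 96 sub-vectors x 8 bits -> 96 bytes per vector, vs. 768 * 4 = 3072 bytes
# for the uncompressed float32 embedding (a 32x reduction).
index = faiss.IndexPQ(d, 96, 8)
index.train(xb)
index.add(xb)
_, approx_ids = index.search(xq, 10)

# Exact search for comparison; PQ recall drops as the byte budget shrinks.
exact = faiss.IndexFlatL2(d)
exact.add(xb)
_, exact_ids = exact.search(xq, 10)
recall = np.mean([len(set(a) & set(e)) / 10 for a, e in zip(approx_ids, exact_ids)])
print(f"top-10 recall vs. exact search: {recall:.2f}")
```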

Researchers from UC Berkeley, CUHK, Amazon Web Services, and UC Davis have developed LEANN, a storage-efficient ANN search index optimized for resource-limited personal devices. It integrates a compact graph-based structure with an on-the-fly recomputation strategy, enabling fast and accurate retrieval while minimizing storage overhead. By shrinking the index to under 5% of the original raw data, LEANN is up to 50 times smaller than standard indexes, and it maintains 90% top-3 recall in under 2 seconds on real-world question-answering benchmarks. To reduce latency, LEANN uses a two-level traversal algorithm and dynamic batching that combines embedding computations across search hops, improving GPU utilization.
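Conceptually, the recomputation idea can be sketched as a best-first graph search that embeds only the nodes it visits. The snippet below is a simplified illustration with a hypothetical `embed()` placeholder and a toy graph, not LEANN's actual implementation.

```python
# Sketch of best-first graph search with on-the-fly embedding recomputation
# (illustrative, not LEANN's code). No node embeddings are stored; they are
# recomputed in batches for each hop's unvisited neighbors.
import heapq
import numpy as np

def embed(texts):
    # Placeholder for a real embedding model (e.g., a small sentence encoder);
    # deterministic per text so repeated calls agree within one process.
    out = []
    for t in texts:
        rng = np.random.default_rng(abs(hash(t)) % 2**32)
        out.append(rng.standard_normal(384))
    return np.asarray(out, dtype="float32")

def search(query, graph, texts, entry=0, k=3, beam=16):
    q = embed([query])[0]

    def dist(v):
        return float(np.linalg.norm(v - q))

    d0 = dist(embed([texts[entry]])[0])
    visited = {entry}
    frontier = [(d0, entry)]           # min-heap of candidates to expand
    best = [(-d0, entry)]              # max-heap (negated) of the current beam
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= beam and d > -best[0][0]:
            break                      # nothing closer than the worst kept result
        neighbors = [n for n in graph[node] if n not in visited]
        if not neighbors:
            continue
        visited.update(neighbors)
        # Key trick: batch the embedding recomputation for the whole hop.
        for n, e in zip(neighbors, embed([texts[n] for n in neighbors])):
            dn = dist(e)
            heapq.heappush(frontier, (dn, n))
            heapq.heappush(best, (-dn, n))
            if len(best) > beam:
                heapq.heappop(best)    # drop the farthest kept node
    return [n for _, n in sorted((-nd, n) for nd, n in best)[:k]]

# Tiny toy usage: four text chunks connected in a small proximity graph.
texts = ["the cat sat", "vector search on device", "graph pruning", "edge deployment"]
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(search("approximate nearest neighbor", graph, texts, k=2))
```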

LEANN’s architecture combines graph-based recomputation with two main techniques and a straightforward system workflow. Built on the HNSW framework, it observes that each query needs embeddings for only a small subset of nodes, which motivates computing them on demand instead of pre-storing all embeddings. To make this practical, LEANN introduces two techniques: (a) a two-level graph traversal with dynamic batching to lower recomputation latency, and (b) a high-degree-preserving graph pruning method to reduce metadata storage. In the system workflow, LEANN first computes embeddings for all dataset items and constructs a vector index using an off-the-shelf graph-based indexing approach; since embeddings are recomputed at query time, only the pruned graph needs to be kept.
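The paper's exact pruning procedure is not reproduced here, but the general flavor of degree-aware pruning, keeping well-connected hub nodes intact while trimming everyone else to a small budget, can be sketched as follows (all thresholds are illustrative assumptions):

```python
# Rough sketch of high-degree-preserving graph pruning (illustrative, not the
# paper's algorithm): hub nodes keep more neighbors, everyone else is trimmed.
def prune_graph(graph, hub_fraction=0.02, hub_degree=32, base_degree=8):
    """graph: dict[node_id, list[neighbor_id]] -> pruned copy with the same keys."""
    # Treat the highest-degree nodes as hubs that keep more edges.
    by_degree = sorted(graph, key=lambda n: len(graph[n]), reverse=True)
    hubs = set(by_degree[: max(1, int(hub_fraction * len(graph)))])
    pruned = {}
    for node, neighbors in graph.items():
        budget = hub_degree if node in hubs else base_degree
        # Prefer keeping edges that point at hubs so the graph stays navigable.
        ranked = sorted(neighbors, key=lambda n: (n not in hubs, -len(graph.get(n, []))))
        pruned[node] = ranked[:budget]
    return pruned
```

The intent is that smaller adjacency lists shrink the on-disk metadata, while well-connected hubs keep search paths short.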

In terms of storage and latency, LEANN outperforms EdgeRAG, an IVF-based recomputation method, achieving latency reductions of 21.17 to 200.60 times across various datasets and hardware platforms. This advantage stems from LEANN's polylogarithmic recomputation complexity, which scales more efficiently than EdgeRAG's √N growth. In terms of accuracy on downstream RAG tasks, LEANN achieves higher performance across most datasets, except GPQA, where a distributional mismatch limits its effectiveness; similarly, on HotpotQA, the single-hop retrieval setup limits accuracy gains because the dataset demands multi-hop reasoning. Despite these limitations, LEANN shows strong performance across diverse benchmarks.
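To get a feel for why that complexity gap matters as the corpus grows, a quick back-of-the-envelope comparison (assuming recomputation work grows roughly like log²N versus √N and ignoring constant factors) is shown below:

```python
# Back-of-the-envelope growth comparison (constants ignored): polylogarithmic
# recomputation vs. sqrt(N) recomputation as the corpus grows.
import math

for n in (10**5, 10**6, 10**7, 10**8):
    polylog = math.log2(n) ** 2        # assumed ~log^2(N) growth
    sqrt_n = math.sqrt(n)
    print(f"N={n:>11,}  log^2(N)={polylog:8.0f}  sqrt(N)={sqrt_n:10.0f}  ratio={sqrt_n / polylog:6.1f}x")
```

The gap widens with N, which is consistent with the larger latency reductions observed at scale.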

In this paper, the researchers introduced LEANN, a storage-efficient neural retrieval system that combines graph-based recomputation with targeted optimizations. By integrating a two-level search algorithm and dynamic batching, it eliminates the need to store full embeddings, achieving significant reductions in storage overhead while maintaining high accuracy. Despite its strengths, LEANN has limitations, such as high peak storage usage during index construction, which could be addressed through pre-clustering or other techniques. Future work may focus on reducing latency and enhancing responsiveness, paving the way for broader adoption in resource-constrained environments.


Check out the Paper and GitHub Page for more details.

