Original · by 呆呆, 少晖 · 2025-07-30 19:21 · Shanghai
Dewu (得物) is a quality-lifestyle shopping community popular with young consumers, and it applies GenAI technology across a wide range of scenarios. As a core piece of GenAI infrastructure, a vector database uses quantized high-dimensional data structures to store, index, and run nearest-neighbor search over embedding vectors efficiently, supporting complex intelligent applications that include multimodal data representation.
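As a toy illustration of the nearest-neighbor search just described, here is a brute-force sketch over a hypothetical in-memory store (this is not how Milvus works internally; at scale, exact scans are replaced by ANN indexes such as HNSW or DiskANN):

```python
import math

# Toy "vector store": id -> embedding (real stores hold millions of rows)
store = {
    "a": [0.0, 0.0],
    "b": [1.0, 0.0],
    "c": [5.0, 5.0],
}

def l2(u, v):
    # Euclidean (L2) distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def knn(query, vectors, k=2):
    # Exact nearest-neighbor search: score every stored embedding and
    # return the ids of the k closest ones (O(n) per query, which is
    # exactly why ANN indexes exist).
    ranked = sorted(vectors.items(), key=lambda item: l2(query, item[1]))
    return [vid for vid, _ in ranked[:k]]

print(knn([0.9, 0.1], store, k=2))  # → ['b', 'a']
```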

💡 **Vector databases are core GenAI infrastructure**: the article argues that vector databases, through quantized high-dimensional data structures, provide efficient storage, indexing, and nearest-neighbor search for embedding vectors. They are the foundation for complex intelligent applications such as multimodal data representation and are critical to the industry shift driven by large models and GenAI.
🚀 **Dewu chose Milvus and is exploring Zilliz**: facing massive data volumes and demanding search performance, the Dewu tech team selected Milvus after a comparative evaluation and, as the business grew and cost-effectiveness became a factor, introduced Zilliz for core scenarios. The article details the reasons for choosing Milvus, including its strong technical community, a development stack that fits the team's own, and elastic scale-out/scale-in on Kubernetes.
🔧 **Milvus practice and operational challenges**: the article shares how Dewu's Milvus deployment architecture evolved from a dedicated resource pool to a shared one, and how index types (HNSW vs. DiskANN) were chosen. It also digs into operational pitfalls: more QueryNodes do not automatically mean faster queries, scalar indexes are easy to misuse, batched writes matter, and common errors have known remedies, offering practical reference points for real deployments.
📈 **Outlook and high-availability architecture**: looking ahead, the Dewu team aims to close the loop on data migration, automating the vectorization and ingestion of business data and strengthening data-accuracy checks. The article also details Milvus deployment options for high availability, including mixed deployment across data centers in the same city, same-city multi-zone multi-replica, and same-city multi-zone standalone deployments, with detailed guidance for keeping the system stable.
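The HNSW and DiskANN indexes mentioned above both answer queries by greedily walking a proximity graph. A minimal in-memory sketch of that traversal, with a toy hand-built graph and names of my own (the real implementations add hierarchy layers, disk I/O, and quantization):

```python
import heapq

def greedy_search(graph, vectors, query, entry, ef=3):
    """Best-first traversal of a proximity graph: the core loop shared by
    HNSW- and DiskANN-style indexes. Returns up to ef (distance, id)
    pairs, closest first."""
    def dist(v):
        # squared L2 distance to the query
        return sum((a - b) ** 2 for a, b in zip(vectors[v], query))

    visited = {entry}
    frontier = [(dist(entry), entry)]   # min-heap: closest unexplored node first
    best = [(-dist(entry), entry)]      # max-heap (negated): current top-ef results
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= ef and d > -best[0][0]:
            break                       # no frontier node can improve the top-ef
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            heapq.heappush(frontier, (dist(nb), nb))
            heapq.heappush(best, (-dist(nb), nb))
            if len(best) > ef:
                heapq.heappop(best)     # evict the current worst candidate
    return sorted((-d, v) for d, v in best)

# Hypothetical 2-D points and neighbor lists:
vectors = {"a": [0, 0], "b": [1, 0], "c": [0, 1], "d": [1, 1]}
graph = {"a": ["b", "c"], "b": ["a", "c", "d"],
         "c": ["a", "b", "d"], "d": ["b", "c"]}
print(greedy_search(graph, vectors, [1, 1], "a", ef=2))  # → [(0, 'd'), (1, 'c')]
```

The `ef` parameter here plays the same role as the `ef` search parameter seen in Milvus HNSW queries: a larger candidate set costs more work but improves recall.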
and use the newly retrieved nodes to obtain their next set of neighbors.
Step four (search on disk): repeat steps two and three until enough neighbors have been found.

Solution: upgrade the cluster to 2.2.16, and have the business batch its deletes and writes.

※ `find no available rootcoord, check rootcoord state`

Error:

```
pymilvus.exceptions.MilvusException: <MilvusException: (code=1, message=syncTimestamp Failed:err: find no available rootcoord, check rootcoord state,
/go/src/github.com/milvus-io/milvus/internal/util/trace/stack_trace.go:51 github.com/milvus-io/milvus/internal/util/trace.StackTrace
/go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:329 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/go/src/github.com/milvus-io/milvus/internal/distributed/rootcoord/client/client.go:421 github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).AllocTimestamp
/go/src/github.com/milvus-io/milvus/internal/proxy/timestamp.go:61 github.com/milvus-io/milvus/internal/proxy.(*timestampAllocator).alloc
/go/src/github.com/milvus-io/milvus/internal/proxy/timestamp.go:83 github.com/milvus-io/milvus/internal/proxy.(*timestampAllocator).AllocOne
/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:172 github.com/milvus-io/milvus/internal/proxy.(*baseTaskQueue).Enqueue
/go/src/github.com/milvus-io/milvus/internal/proxy/impl.go:2818 github.com/milvus-io/milvus/internal/proxy.(*Proxy).Search
/go/src/github.com/milvus-io/milvus/internal/distributed/proxy/service.go:680 github.com/milvus-io/milvus/internal/distributed/proxy.(*Server).Search
/go/pkg/mod/github.com/milvus-io/milvus-proto/go-api@v0.0.0-20230324025554-5bbe6698c2b0/milvuspb/milvus.pb.go:10560 github.com/milvus-io/milvus-proto/go-api/milvuspb._MilvusService_Search_Handler.func1
/go/src/github.com/milvus-io/milvus/internal/proxy/rate_limit_interceptor.go:47 github.com/milvus-io/milvus/internal/proxy.RateLimitInterceptor.func1)>
```
Problem: rootcoord had lost communication with the other pods. Solution: rebuild rootcoord first, then rebuild the related querynode, indexnode, querycoord, and indexcoord pods one by one.

※ Console search fails (`Search 372 failed, reason Timestamp lag too large lag`)

Error:

```
[2024/09/26 08:19:14.956 +00:00] [ERROR] [grpcclient/client.go:158] ["failed to get client address"] [error="find no available rootcoord, check rootcoord state"] [stack="
github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).connect
    /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:158
github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).GetGrpcClient
    /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:131
github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).callOnce
    /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:256
github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
    /go/src/github.com/milvus-io/milvus/internal/util/grpcclient/client.go:312
github.com/milvus-io/milvus/internal/distributed/rootcoord/client.(*Client).GetComponentStates
    /go/src/github.com/milvus-io/milvus/internal/distributed/rootcoord/client/client.go:129
github.com/milvus-io/milvus/internal/util/funcutil.WaitForComponentStates.func1
    /go/src/github.com/milvus-io/milvus/internal/util/funcutil/func.go:65
github.com/milvus-io/milvus/internal/util/retry.Do
    /go/src/github.com/milvus-io/milvus/internal/util/retry/retry.go:42
github.com/milvus-io/milvus/internal/util/funcutil.WaitForComponentStates
    /go/src/github.com/milvus-io/milvus/internal/util/funcutil/func.go:89
github.com/milvus-io/milvus/internal/util/funcutil.WaitForComponentHealthy
    /go/src/github.com/milvus-io/milvus/internal/util/funcutil/func.go:104
github.com/milvus-io/milvus/internal/distributed/datanode.(*Server).init
    /go/src/github.com/milvus-io/milvus/internal/distributed/datanode/service.go:275
github.com/milvus-io/milvus/internal/distributed/datanode.(*Server).Run
    /go/src/github.com/milvus-io/milvus/internal/distributed/datanode/service.go:172
github.com/milvus-io/milvus/cmd/components.(*DataNode).Run
    /go/src/github.com/milvus-io/milvus/cmd/components/data_node.go:51
github.com/milvus-io/milvus/cmd/roles.runComponent[...].func1
    /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:102"]
```
Problem: an issue with the pods backing the Pulsar component stopped message consumption. Solution: rebuild the Pulsar-related pods, watch the logs, and wait for the Pulsar backlog to be fully consumed.

※ QueryNode memory quota exhausted (`memory quota exhausted`)

Error:

```
[2024/09/26 09:14:13.063 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)"]
[2024/09/26 09:14:13.063 +00:00] [WARN] [proxy/task_search.go:529] ["QueryNode search result error"] [traceID=62505beaa974c903] [msgID=452812354979102723] [nodeID=372] [reason="Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)"]
[2024/09/26 09:14:13.063 +00:00] [WARN] [proxy/task_policies.go:132] ["failed to do query with node"] [traceID=62505beaa974c903] [nodeID=372] [dmlChannels="[by-dev-rootcoord-dml_6_442659379752037218v0,by-dev-rootcoord-dml_7_442659379752037218v1]"] [error="code: UnexpectedError, error: fail to Search, QueryNode ID=372, reason=Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)"]
[2024/09/26 09:14:13.063 +00:00] [WARN] [proxy/task_policies.go:159] ["retry another query node with round robin"] [traceID=62505beaa974c903] [Nexts="{\"by-dev-rootcoord-dml_6_442659379752037218v0\":-1,\"by-dev-rootcoord-dml_7_442659379752037218v1\":-1}"]
[2024/09/26 09:14:13.063 +00:00] [WARN] [proxy/task_policies.go:60] ["no shard leaders were available"] [traceID=62505beaa974c903] [channel=by-dev-rootcoord-dml_6_442659379752037218v0] [leaders="[<NodeID: 372>]"]
[2024/09/26 09:14:13.063 +00:00] [WARN] [proxy/task_policies.go:119] ["failed to search/query with round-robin policy"] [traceID=62505beaa974c903] [error="Channel: by-dev-rootcoord-dml_7_442659379752037218v1 returns err: code: UnexpectedError, error: fail to Search, QueryNode ID=372, reason=Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)Channel: by-dev-rootcoord-dml_6_442659379752037218v0 returns err: code: UnexpectedError, error: fail to Search, QueryNode ID=372, reason=Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)"]
[2024/09/26 09:14:13.063 +00:00] [WARN] [proxy/task_search.go:412] ["failed to do search"] [traceID=62505beaa974c903] [Shards="map[by-dev-rootcoord-dml_6_442659379752037218v0:[<NodeID: 372>] by-dev-rootcoord-dml_7_442659379752037218v1:[<NodeID: 372>]]"] [error="code: UnexpectedError, error: fail to Search, QueryNode ID=372, reason=Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)"]
[2024/09/26 09:14:13.063 +00:00] [WARN] [proxy/task_search.go:425] ["first search failed, updating shardleader caches and retry search"] [traceID=62505beaa974c903] [error="code: UnexpectedError, error: fail to Search, QueryNode ID=372, reason=Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)"]
[2024/09/26 09:14:13.063 +00:00] [INFO] [proxy/meta_cache.go:767] ["clearing shard cache for collection"] [collectionName=xxx]
[2024/09/26 09:14:13.063 +00:00] [WARN] [retry/retry.go:44] ["retry func failed"] ["retry time"=0] [error="code: UnexpectedError, error: fail to Search, QueryNode ID=372, reason=Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)"]
[2024/09/26 09:14:13.063 +00:00] [WARN] [proxy/task_scheduler.go:473] ["Failed to execute task: "] [error="fail to search on all shard leaders, err=All attempts results:\nattempt #1:code: UnexpectedError, error: fail to Search, QueryNode ID=372, reason=Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)\nattempt #2:context canceled\n"] [traceID=62505beaa974c903]
[2024/09/26 09:14:13.063 +00:00] [WARN] [proxy/impl.go:2861] ["Search failed to WaitToFinish"] [traceID=62505beaa974c903] [error="fail to search on all shard leaders, err=All attempts results:\nattempt #1:code: UnexpectedError, error: fail to Search, QueryNode ID=372, reason=Search 372 failed, reason Timestamp lag too large lag(28h44m48.341s) max(24h0m0s) err %!w(<nil>)\nattempt #2:context canceled\n"] [role=proxy] [msgID=452812354979102723] [db=] [collection=xxx] [partitions="[]"] [dsl=] [len(PlaceholderGroup)=4108] [OutputFields="[id,text,extra]"] [search_params="[{\"key\":\"params\",\"value\":\"{\\\"ef\\\":250}\"},{\"key\":\"anns_field\",\"value\":\"vector\"},{\"key\":\"topk\",\"value\":\"100\"},{\"key\":\"metric_type\",\"value\":\"L2\"},{\"key\":\"round_decimal\",\"value\":\"-1\"}]"] [travel_timestamp=0] [guarantee_timestamp=0]
```
The accompanying client-side exception:

```
<MilvusException: (code=53, message=deny to write, reason: memory quota exhausted, please allocate more resources, req: /milvus.proto.milvus.MilvusService/Insert)>
```

Cause: the QueryNode memory limit set in the configuration had been reached. Solution: increase the QueryNode memory configuration, or add more QueryNode replicas.

※ Underlying disk bottleneck causing etcd access timeouts

Solution: address it at the architecture level: isolate disks per cluster, so that each cluster uses its own dedicated disks.
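When "allocate more resources" comes up, it helps to estimate how much memory a collection actually needs before resizing QueryNodes. A back-of-the-envelope sketch, assuming float32 vectors and an HNSW index with M=16 plus roughly 30% runtime overhead (illustrative constants of my own, not official Milvus sizing formulas):

```python
def estimate_hnsw_memory_gb(num_vectors, dim, m=16, overhead=1.3):
    # Rough sizing sketch. Assumptions: float32 vectors; an HNSW graph
    # with about 2*M links per node at 8 bytes each; ~30% runtime
    # overhead. Validate against real segment sizes before relying on it.
    raw = num_vectors * dim * 4        # float32 payload, in bytes
    links = num_vectors * m * 2 * 8    # adjacency lists (approximate)
    return (raw + links) * overhead / 1024 ** 3

# e.g. 10M vectors at 768 dimensions:
print(round(estimate_hnsw_memory_gb(10_000_000, 768), 1))  # → 40.3
```

An estimate like this also makes the trade-off behind DiskANN concrete: when the float32 payload alone exceeds what QueryNodes can hold, pushing the full-precision vectors to disk is the alternative to simply adding memory.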