MarkTechPost@AI · 2 days ago, 10:26
Building an Enterprise-Grade AI Assistant: Combining Open-Source Models with Retrieval Augmentation

This tutorial shows how to build an enterprise-grade AI assistant that runs on Colab using open-source models. It integrates FAISS for document retrieval and FLAN-T5 for text generation, and embeds enterprise policies such as data redaction, access control, and PII protection so the system stays both intelligent and compliant. The article walks through environment setup, model loading, document chunking, index construction, PII rules and policy checks, and the retrieve-and-generate flow, then evaluates the system against realistic queries to demonstrate its safety and accuracy.

💡 **Environment and Model Setup**: The article begins with the open-source models needed to build the assistant: FLAN-T5 for text generation and Sentence Transformers (MiniLM) for text embeddings, both configured to use the GPU automatically when available. The required libraries are installed via pip, laying the groundwork for the rest of the build.

🗂️ **Document Processing and Index Construction**: To simulate an internal enterprise knowledge base, a document set containing policies and SOPs is created. The long texts are split into smaller, manageable chunks for embedding and retrieval. Sentence Transformers then converts the chunks into vectors, which are stored in a FAISS index for fast, efficient lookup.

🛡️ **Security and Compliance Safeguards**: Rules for identifying and redacting personally identifiable information (PII) are introduced, along with policy checks that block data-exfiltration and encryption-tampering requests. These measures ensure the assistant handles sensitive information in line with enterprise security and compliance requirements and prevents misuse.

❓ **Retrieval-Augmented Generation (RAG) Flow**: A retrieval function pulls the most relevant document fragments from the FAISS index for each user query. A structured prompt then combines that context with the user's question and is passed to FLAN-T5, which generates a precise, policy-compliant answer and is instructed to cite its sources.

🚀 **Evaluation and Deployment**: The system is evaluated against a series of simulated enterprise queries covering data encryption, RFP responses, and incident response. The results show the assistant performing retrieval-augmented reasoning securely and accurately, providing a baseline blueprint for scalable, auditable enterprise AI solutions.

In this tutorial, we explore how we can build a compact yet powerful Enterprise AI assistant that runs effortlessly on Colab. We start by integrating retrieval-augmented generation (RAG) using FAISS for document retrieval and FLAN-T5 for text generation, both fully open-source and free. As we progress, we embed enterprise policies such as data redaction, access control, and PII protection directly into the workflow, ensuring our system is intelligent and compliant.

```python
# Install open-source dependencies (Colab)
!pip -q install faiss-cpu transformers==4.44.2 accelerate sentence-transformers==3.0.1

from typing import List, Dict, Tuple
import re, textwrap, numpy as np, torch
from sentence_transformers import SentenceTransformer
import faiss
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM

GEN_MODEL = "google/flan-t5-base"                     # text generation
EMB_MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # text embeddings

# Load the generator and wrap it in a text2text pipeline
gen_tok = AutoTokenizer.from_pretrained(GEN_MODEL)
gen_model = AutoModelForSeq2SeqLM.from_pretrained(GEN_MODEL, device_map="auto")
generate = pipeline("text2text-generation", model=gen_model, tokenizer=gen_tok)

# Load the embedding model on GPU when available
emb_device = "cuda" if torch.cuda.is_available() else "cpu"
emb_model = SentenceTransformer(EMB_MODEL, device=emb_device)
```

We begin by setting up our environment and loading the required models. We initialize FLAN-T5 for text generation and MiniLM for embedding representations. We ensure both models are configured to automatically use the GPU when available, so our pipeline runs efficiently.
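As a quick smoke test (not part of the original listing), we can confirm both models respond before building any index. The prompt below is just an illustration; the generated text will vary, but `all-MiniLM-L6-v2` should report 384-dimensional embeddings:

```python
# Hypothetical sanity check: exercise the generator and the embedder once.
test_out = generate("Translate to German: Hello, how are you?", max_new_tokens=20)
print(test_out[0]["generated_text"])  # a short German translation

test_vec = emb_model.encode(["hello world"], normalize_embeddings=True, convert_to_numpy=True)
print(test_vec.shape)  # (1, 384) for all-MiniLM-L6-v2
```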

DOCS = [ {"id":"policy_sec_001","title":"Data Security Policy",  "text":"All customer data must be encrypted at rest (AES-256) and in transit (TLS 1.2+). Access is role-based (RBAC). Secrets are stored in a managed vault. Backups run nightly with 35-day retention. PII includes name, email, phone, address, PAN/Aadhaar."}, {"id":"policy_ai_002","title":"Responsible AI Guidelines",  "text":"Use internal models for confidential data. Retrieval sources must be logged. No customer decisioning without human-in-the-loop. Redact PII in prompts and outputs. All model prompts and outputs are stored for audit for 180 days."}, {"id":"runbook_inc_003","title":"Incident Response Runbook",  "text":"If a suspected breach occurs, page on-call SecOps. Rotate keys, isolate affected services, perform forensic capture, notify DPO within regulatory SLA. Communicate via the incident room only."}, {"id":"sop_sales_004","title":"Sales SOP - Enterprise Deals",  "text":"For RFPs, use the approved security questionnaire responses. Claims must match policy_sec_001. Custom clauses need Legal sign-off. Keep records in CRM with deal room links."}]def chunk(text:str, chunk_size=600, overlap=80):   w = text.split()   if len(w) <= chunk_size: return [text]   out=[]; i=0   while i < len(w):       j=min(i+chunk_size, len(w)); out.append(" ".join(w[i:j]))       if j==len(w): break       i = j - overlap   return outCORPUS=[]for d in DOCS:   for i,c in enumerate(chunk(d["text"])):       CORPUS.append({"doc_id":d["id"],"title":d["title"],"chunk_id":i,"text":c})

We create a small enterprise-style document set to simulate internal policies and procedures. We then break these long texts into manageable chunks so they can be embedded and retrieved effectively. This chunking helps our AI assistant handle contextual information with better precision.
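To see the sliding window in action, here is a small illustration with synthetic text (the sample documents above are short enough to fit in a single chunk, so they never trigger the split), assuming `chunk()` from the cell above:

```python
# Synthetic 1,500-word text: chunk() should produce three overlapping windows.
long_text = " ".join(f"w{i}" for i in range(1500))
pieces = chunk(long_text)  # chunk_size=600, overlap=80
print(len(pieces))                       # 3
print([len(p.split()) for p in pieces])  # [600, 600, 460]
```

The 80-word overlap means each window repeats the tail of the previous one, so a fact that straddles a chunk boundary remains retrievable from at least one chunk.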

```python
def build_index(chunks: List[Dict]) -> Tuple[faiss.IndexFlatIP, np.ndarray]:
    """Embed every chunk and store the normalized vectors in a FAISS inner-product index."""
    vecs = emb_model.encode([c["text"] for c in chunks],
                            normalize_embeddings=True, convert_to_numpy=True)
    index = faiss.IndexFlatIP(vecs.shape[1])  # inner product on unit vectors = cosine similarity
    index.add(vecs)
    return index, vecs

INDEX, VECS = build_index(CORPUS)

# Regex-based PII redaction: phone numbers, emails, 12-digit IDs, PAN
PII_PATTERNS = [
    (re.compile(r"\b\d{10}\b"), "<REDACTED_PHONE>"),
    (re.compile(r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b", re.I), "<REDACTED_EMAIL>"),
    (re.compile(r"\b\d{12}\b"), "<REDACTED_ID12>"),
    (re.compile(r"\b[A-Z]{5}\d{4}[A-Z]\b"), "<REDACTED_PAN>"),
]

def redact(t: str) -> str:
    for p, r in PII_PATTERNS:
        t = p.sub(r, t)
    return t

# Simple policy guardrails: refuse exfiltration and encryption-tampering requests
POLICY_DISALLOWED = [
    re.compile(r"\b(share|exfiltrate)\b.*\b(raw|all)\b.*\bdata\b", re.I),
    re.compile(r"\bdisable\b.*\bencryption\b", re.I),
]

def policy_check(q: str):
    for r in POLICY_DISALLOWED:
        if r.search(q):
            return False, "Request violates security policy (data exfiltration/encryption tampering)."
    return True, ""
```

We embed all chunks using Sentence Transformers and store them in a FAISS index for fast retrieval. We introduce PII redaction rules and policy checks to prevent misuse of data. By doing this, we ensure our assistant adheres to enterprise security and compliance guidelines.
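The guardrails can be exercised directly. Here is a minimal sketch (the phone number and email address below are made up purely for illustration):

```python
# Redaction: phone and email patterns are replaced before anything reaches the model.
print(redact("Reach me at jane.doe@example.com or 9876543210."))
# -> Reach me at <REDACTED_EMAIL> or <REDACTED_PHONE>.

# Policy check: tampering/exfiltration requests are refused before retrieval runs.
print(policy_check("Please disable encryption on the backups"))
# -> (False, 'Request violates security policy (data exfiltration/encryption tampering).')
print(policy_check("What is our backup retention period?"))
# -> (True, '')
```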

```python
def retrieve(query: str, k=4) -> List[Dict]:
    """Return the top-k corpus chunks most similar to the query."""
    qv = emb_model.encode([query], normalize_embeddings=True, convert_to_numpy=True)
    scores, idxs = INDEX.search(qv, k)
    return [{**CORPUS[i], "score": float(s)} for s, i in zip(scores[0], idxs[0])]

SYSTEM = ("You are an enterprise AI assistant.\n"
          "- Answer strictly from the provided CONTEXT.\n"
          "- If missing info, say what is unknown and suggest the correct policy/runbook.\n"
          "- Keep it concise and cite titles + doc_ids inline like [Title (doc_id:chunk)].")

def build_prompt(user_q: str, ctx_blocks: List[Dict]) -> str:
    """Assemble system rules, retrieved context, and the redacted question into one prompt."""
    ctx = "\n\n".join(
        f"[{i+1}] {b['title']} (doc:{b['doc_id']}:{b['chunk_id']})\n{b['text']}"
        for i, b in enumerate(ctx_blocks)
    )
    uq = redact(user_q)
    return (f"SYSTEM:\n{SYSTEM}\n\nCONTEXT:\n{ctx}\n\nUSER QUESTION:\n{uq}\n\n"
            "INSTRUCTIONS:\n- Cite sources inline.\n- Keep to 5-8 sentences.\n- Preserve redactions.")

def answer(user_q: str, k=4, max_new_tokens=220) -> Dict:
    """Run guardrails, retrieve context, and generate a grounded answer."""
    ok, msg = policy_check(user_q)
    if not ok:
        return {"answer": f" {msg}", "ctx": []}
    ctx = retrieve(user_q, k=k)
    prompt = build_prompt(user_q, ctx)
    out = generate(prompt, max_new_tokens=max_new_tokens, do_sample=False)[0]["generated_text"].strip()
    return {"answer": out, "ctx": ctx}
```

We design the retrieval function to fetch relevant document sections for each user query. We then construct a structured prompt combining context and questions for FLAN-T5 to generate precise answers. This step ensures that our assistant produces grounded, policy-compliant responses.
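Before the batch evaluation below, a single one-off call shows the shape of what `answer()` returns (the generated text will vary by model version; the question here is just an example built on the sample documents):

```python
# Single query against the pipeline defined above.
res = answer("How long are model prompts and outputs retained for audit?")
print(res["answer"])  # likely grounded in Responsible AI Guidelines (policy_ai_002)
for c in res["ctx"]:
    print(f"{c['title']} ({c['doc_id']}:{c['chunk_id']}) score={c['score']:.3f}")
```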

```python
def eval_query(user_q: str, ctx: List[Dict]) -> Dict:
    """Crude relevance check: fraction of query terms that appear in the retrieved context."""
    terms = [w.lower() for w in re.findall(r"[a-zA-Z]{4,}", user_q)]
    ctx_text = " ".join(c["text"].lower() for c in ctx)
    hits = sum(t in ctx_text for t in terms)
    return {"terms": len(terms), "hits": hits, "hit_rate": round(hits / max(1, len(terms)), 2)}

QUERIES = [
    "What encryption and backup rules do we follow for customer data?",
    "Can we auto-answer RFP security questionnaires? What should we cite?",
    "If there is a suspected breach, what are the first three steps?",
    "Is it allowed to share all raw customer data externally for testing?",
]

for q in QUERIES:
    res = answer(q, k=3)
    print("\n" + "=" * 100)
    print("Q:", q)
    print("\nA:", res["answer"])
    if res["ctx"]:
        ev = eval_query(q, res["ctx"])
        print("\nRetrieved Context (top 3):")
        for r in res["ctx"]:
            print(f"- {r['title']} [{r['doc_id']}:{r['chunk_id']}] score={r['score']:.3f}")
        print("Eval:", ev)
```

We evaluate our system using sample enterprise queries that test encryption, RFPs, and incident procedures. We display retrieved documents, answers, and simple hit-rate scores to check relevance. Through this demo, we observe our Enterprise AI assistant performing retrieval-augmented reasoning securely and accurately.

In conclusion, we successfully created a self-contained enterprise AI system that retrieves, analyzes, and responds to business queries while maintaining strong guardrails. We appreciate how seamlessly we can combine FAISS for retrieval, Sentence Transformers for embeddings, and FLAN-T5 for generation to simulate an internal enterprise knowledge engine. As we finish, we realize that this simple Colab-based implementation can serve as a blueprint for scalable, auditable, and compliant enterprise deployments.


