Groq Blog
Building Reliable LLM Applications: A Practical Engineering Guide

 

This article is a practical engineering guide to integrating open-source large language models (LLMs) into real products. The core idea is to build "invisible" AI features: users complete their tasks seamlessly without ever noticing the AI. The article argues that careful engineering choices, not raw AI "intelligence," are what make this work. It walks through a four-step loop: Read (fetch only the necessary context), Constrain (keep model behavior in bounds with system prompts and rules), Act (produce structured output or call tools), and Explain (show users the process to build trust). It also covers choosing an output mode (structured outputs, function calling, plain text), shipping safely (testing, monitoring, fallbacks), and avoiding common pitfalls.

💡 **The best AI features are "invisible"**: A successful AI feature blends into the user experience, letting users finish their tasks without being interrupted by the AI. When the AI works well, users don't particularly notice it; they stay focused on the task itself.

🔄 **The four-step loop for reliable AI**: 1. **Read**: Pull only the minimum app context the user input requires; too much context raises cost, slows responses, and lets the model drift. 2. **Constrain**: Control model behavior with a system prompt and explicit rules (require JSON output, handle missing information, respect privacy rules), and match the temperature to the task. 3. **Act**: Produce output the next step can consume directly, whether structured output (for UI updates, storage, validation), function calls (for live data or triggering actions), or plain text (narrative only). 4. **Explain**: Show users the tools, steps, and citations the AI used, to build trust and transparency.

🛠️ **Choose the right way to act**: Pick the output type based on downstream needs. **Structured outputs** suit data that will be processed programmatically (updating the UI, storing database fields) and provide clean, parsable data. **Function calls (tools)** let the model access live data or perform external actions, such as querying a database or calling an API. **Plain text** fits narrative-only results such as summaries or short answers.

🛡️ **Ship safely and monitor**: Test thoroughly before launch: write unit tests, build an eval set, and run in shadow mode. In production, continuously monitor latency, token usage, model versions, tool-call success rate, invalid-JSON rate, refusal rate, and user edit rate so problems surface early and can be fixed quickly.

⚠️ **Avoid common pitfalls**: Don't stuff in too much context, let the model touch production data directly, reach for chat for every job, tolerate verbose answers, or skip versioning. A "small model first, big model fallback" routing strategy, plus well-designed fallback mechanisms, markedly improves the stability and efficiency of an LLM application.

Building with LLMs has taught me one clear lesson: the best AI feature is often invisible.

When it works, the user doesn’t stop to think “that was AI.” They just click a button, get an answer quickly, and move on with their task.

When it doesn’t work, you notice right away: the spinner takes too long, or the answer sounds confident but is not true. I’ve hit both of these walls many times. And each time, the fix was less about “smarter AI” and more about careful engineering choices. Use only the context you need. Ask for structured output. Keep randomness low when accuracy is important. Allow the system to say “I do not know.”

This guide is not about big research ideas. It’s about practical steps any engineer can follow to bring open-source LLMs inside real products. Think of it as a field guide with simple patterns, copy-ready code, and habits that make AI features feel reliable, calm, and fast.

How It Works — The Four‑Step Loop

Every reliable AI feature follows the same loop. Keep it consistent. Boring is good.

1) Read

What: Take the user input and only the smallest slice of app context you need. More context means higher cost, slower responses, and more room for the model to drift.

Examples

    Support — “Where is my order?” → pass the user ID and the last order summary, not the entire order history.
    Extraction — “Pull names and dates from this email thread” → pass the thread text only, not unrelated attachments.
    Search — “Find refund policy” → pass top three snippets from your docs, not the whole knowledge base.
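
To make this concrete, here is a minimal sketch of the support example, assuming a Groq-hosted model; the user ID and order summary are hard-coded placeholders that your own data layer would supply in a real app.

from groq import Groq

client = Groq()

# In a real app these two values come from your own data layer.
# Passing only them (not the full order history) keeps the prompt small.
user_id = "u_1842"
last_order_summary = "Order #553: 2 items, shipped 2024-05-02, estimated delivery Friday"

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # any Groq-hosted model works here
    messages=[
        {"role": "system", "content": "You are a support assistant. Answer only from the context provided."},
        {
            "role": "user",
            "content": f"Context:\nuser_id: {user_id}\nlast_order: {last_order_summary}\n\nQuestion: Where is my order?",
        },
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)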

2) Constrain

What: Set rules so the model stays within your desired constraints.

Do this

    System prompt as a contract
      State what the assistant is and is not
      Require valid JSON that matches a schema
      If there is missing information, ask the user for a short follow‑up or answer “I don’t know.”
      Keep privacy rules explicit (do not log sensitive data)
      Version your prompts and test them
    Match temperature to the task (there is no one setting that fits all)
      Low (≈0.0–0.2): Extraction, classification, validation, RAG answers with citations, reliable tool choice
      Medium: Templated drafts and light tone variation
      High: Brainstorming and creative copy where variety matters

Keep context tight in all cases. If your stack supports it, use a seed in tests for repeatability.
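
A minimal sketch of the system-prompt contract with a low temperature for an extraction-style task; the prompt name, model, and seed value are illustrative, and seed is only a best-effort repeatability aid where the stack supports it.

from groq import Groq

client = Groq()

# The contract: what the assistant is and is not, the required output shape,
# what to do when information is missing, and the privacy rule.
# Versioned name so logs can record which prompt produced which output.
SYSTEM_PROMPT_V3 = """You extract invoice fields. You do not answer general questions.
Return only valid JSON with the keys: name, date, amount.
If a field is missing from the text, set it to null and list it under "missing".
Never include customer data in explanations or logs."""

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT_V3},
        {"role": "user", "content": "Invoice from Acme Corp dated 2024-06-01, total due $1,250."},
    ],
    temperature=0.1,  # low: this is extraction, not brainstorming
    seed=42,          # best-effort repeatability in tests, if supported by your stack
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)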

3) Act

What: Aim to produce LLM-generated outputs that the next part of your workflow can consume directly, with no further processing required.

When to use what:

    Structured Outputs when the next step is programmatic, e.g., for updating UI, storing fields, and running validation.
      Why: Such outputs are ready to be used as inputs in the next step of a workflow or application, as they are structured, parsable data that needs no further manual processing.
      Example: Extract {name, date, amount} from an invoice.

Code: Structured Outputs with Pydantic

from groq import Groq
from pydantic import BaseModel
from typing import Literal
import json

client = Groq()

class ProductReview(BaseModel):
    product_name: str
    rating: float
    sentiment: Literal["positive", "negative", "neutral"]
    key_features: list[str]

response = client.chat.completions.create(
    model="moonshotai/kimi-k2-instruct",
    messages=[
        {"role": "system", "content": "Extract product review information from the text."},
        {
            "role": "user",
            "content": "I bought the UltraSound Headphones last week and I'm really impressed! The noise cancellation is amazing and the battery lasts all day. Sound quality is crisp and clear. I'd give it 4.5 out of 5 stars.",
        },
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "product_review",
            "schema": ProductReview.model_json_schema(),
        },
    },
)

review = ProductReview.model_validate(json.loads(response.choices[0].message.content))
print(json.dumps(review.model_dump(), indent=2))

Learn more: Structured outputs docs

    Function calls (tools) when the model needs live data or to trigger an action that your code controls such as search, fetch, compute, notify, or connect to external systems.
      Why: Tools let the model use fresh information instead of relying only on what it learned during training. That means it can query a database, call an API, or look up the latest records without inventing answers. The model proposes the action, your code decides whether to run it, and you keep a clear audit trail.
      Example: The model calls search_docs() to find relevant text, then render_chart() to create a visualization, and finally explains the result back to the user.
    Plain text when the result is narrative only, such as a summary or a short answer.
      Why: Simplest path when nothing else needs to consume the output.

Code: function calling (tools)

import json
from groq import Groq

# Initialize Groq client
client = Groq()
model = "llama-3.3-70b-versatile"

# Define weather tools
def get_temperature(location: str):
    # This is a mock tool/function. In a real scenario, you would call a weather API.
    temperatures = {"New York": "22°C", "London": "18°C", "Tokyo": "26°C", "Sydney": "20°C"}
    return temperatures.get(location, "Temperature data not available")

def get_weather_condition(location: str):
    # This is a mock tool/function. In a real scenario, you would call a weather API.
    conditions = {"New York": "Sunny", "London": "Rainy", "Tokyo": "Cloudy", "Sydney": "Clear"}
    return conditions.get(location, "Weather condition data not available")

# Define system messages and tools
messages = [
    {"role": "system", "content": "You are a helpful weather assistant."},
    {"role": "user", "content": "What's the weather and temperature like in New York and London? Respond with one sentence for each city. Use tools to get the information."},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_temperature",
            "description": "Get the temperature for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The name of the city",
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_weather_condition",
            "description": "Get the weather condition for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The name of the city",
                    }
                },
                "required": ["location"],
            },
        },
    },
]

# Make the initial request
response = client.chat.completions.create(
    model=model, messages=messages, tools=tools, tool_choice="auto", max_completion_tokens=4096, temperature=0.5
)
response_message = response.choices[0].message
tool_calls = response_message.tool_calls

# Process tool calls
messages.append(response_message)
available_functions = {
    "get_temperature": get_temperature,
    "get_weather_condition": get_weather_condition,
}
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_to_call = available_functions[function_name]
    function_args = json.loads(tool_call.function.arguments)
    function_response = function_to_call(**function_args)
    messages.append(
        {
            "role": "tool",
            "content": str(function_response),
            "tool_call_id": tool_call.id,
        }
    )

# Make the final request with tool call results
final_response = client.chat.completions.create(
    model=model, messages=messages, tools=tools, tool_choice="auto", max_completion_tokens=4096
)
print(final_response.choices[0].message.content)

Learn more: Tool use docs

4) Explain

What: Show the user the steps, tools, and citations so they have more confidence in your app's AI-generated outputs.

Examples

    Append a short “What I used” note with source titles or IDs. Groq’s Compound model, for example, shows its answer with sources attached for clarity and trust.
    In extraction, show a small preview of the matched text.
    In tool flows, show which tools ran and in what order, then keep the logs server side.
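
A small sketch of rendering the “What I used” note; the source records and tool names are hypothetical stand-ins for whatever your retrieval and tool layer actually returned.

# Hypothetical shape: whatever your retrieval/tool layer returned for this answer.
sources = [
    {"id": "doc_112", "title": "Refund policy (2024)"},
    {"id": "doc_087", "title": "Shipping FAQ"},
]
tools_used = ["search_docs", "render_chart"]

def what_i_used(sources, tools_used):
    # Keep the user-facing note short; keep the full trace in server-side logs.
    source_line = ", ".join(f'{s["title"]} [{s["id"]}]' for s in sources)
    tool_line = " → ".join(tools_used)
    return f"What I used: {source_line}. Tools: {tool_line}."

print(what_i_used(sources, tools_used))
# What I used: Refund policy (2024) [doc_112], Shipping FAQ [doc_087]. Tools: search_docs → render_chart.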

Core patterns you’ll reuse

Domain‑Specific Language (DSL): A small language designed for a specific domain. In apps, this often means search filters, a sandboxed SQL query, a chart spec, or an email template.

| Pattern | What it Means | Example Request | Typical Output to Your App |
| --- | --- | --- | --- |
| Router | Classify and route to the right handler or model | “Is this billing or technical?” | {category: "billing"} |
| Extractor | Turn messy text into clean fields | “Grab names and dates from this email” | {names: [...], dates: [...]} |
| Translator | Convert intent to a safe DSL | “Show paid invoices this month per region” | Filters or SQL for a sandbox, or chart spec |
| Summarizer | Shorten or re-tone text | “Summarize the meeting for a new hire” | Short bullet list with optional citations |
| With Tools | Model proposes actions; app executes | “Search policy, then draft the reply” | Tool calls → tool results → short answer |
| Orchestrator | Chain steps while the app keeps control | “Verify doc, extract fields, request missing” | Plan → tool calls → JSON result + next steps |
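
As an example, here is a minimal sketch of the Router pattern from the table above, assuming a Groq-hosted model and a hypothetical three-category taxonomy; the app, not the model, owns the mapping from category to handler.

import json
from groq import Groq

client = Groq()

def route(ticket_text: str) -> str:
    """Classify a ticket into a category the app can switch on."""
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": 'Classify the ticket. Return JSON: {"category": "billing" | "technical" | "other"}.'},
            {"role": "user", "content": ticket_text},
        ],
        temperature=0.0,  # classification: keep randomness low
        response_format={"type": "json_object"},
    )
    category = json.loads(response.choices[0].message.content).get("category", "other")
    # Never trust the label blindly; fall back to a safe default.
    return category if category in {"billing", "technical", "other"} else "other"

# The app decides what happens next, not the model.
handlers = {"billing": "billing_queue", "technical": "tech_queue", "other": "human_review"}
print(handlers[route("I was charged twice for my subscription this month.")])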

Shipping Safely: Tests, Monitoring, and Fallbacks

Before launch:

    Write prompt unit tests that check the output format you expect. For JSON, assert required fields. For plain text, check for keywords, structure, style, or refusal phrases (see the sketch after this list).
    Build a small eval set from real questions. Include expected outcomes and allowed refusals.
    Run in shadow mode or behind a feature flag and log everything.
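
For example, a prompt unit test for a router prompt might look like this sketch; call_router_prompt is a hypothetical wrapper around your LLM call, and the labeled cases come from your own eval set.

import json

# Cases drawn from real user questions, with the outcome you expect.
EVAL_CASES = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I upload a PDF", "technical"),
]

def test_router_returns_valid_category():
    for text, expected in EVAL_CASES:
        raw = call_router_prompt(text)       # hypothetical wrapper around your LLM call
        data = json.loads(raw)               # invalid JSON fails the test immediately
        assert "category" in data            # required field is present
        assert data["category"] == expected  # matches the labeled outcome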

What to track in production:

    Latency p50 and p95
    Tokens in and out
    Model and prompt versions
    Tool call success and failure
    Invalid JSON rate
    Refusal rate
    User edit rate (compare model output to final user text)
    Citation correctness (check answer against cited sources)

You can monitor these signals in the Groq Console dashboard, which gives you logs, metrics, usage, and batch insights to see how your AI features behave in real workloads.
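
On your side, a thin wrapper is often enough to capture these signals per request; this sketch assumes the usage fields exposed by Groq's OpenAI-compatible responses, and the log field names are just one reasonable layout.

import json
import time
from groq import Groq

client = Groq()

def logged_completion(messages, model="llama-3.3-70b-versatile", prompt_id="support_v3"):
    start = time.monotonic()
    response = client.chat.completions.create(model=model, messages=messages, temperature=0.2)
    latency_ms = int((time.monotonic() - start) * 1000)
    # One structured log line per request: enough to chart latency and tokens,
    # and to tie every output back to a prompt and model version.
    print(json.dumps({
        "prompt_id": prompt_id,
        "model": model,
        "latency_ms": latency_ms,
        "tokens_in": response.usage.prompt_tokens,
        "tokens_out": response.usage.completion_tokens,
    }))
    return response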

Fallbacks that work

    If the task is unanswerable, return “I do not know” with a next step.
    If results look long or slow, stream partial results and keep the UI responsive.
    Use small-then-big model routing where it matters: start with a smaller, faster, and cheaper model for most requests. If the output is incomplete, uncertain, or flagged as too complex, escalate the same request to a larger model. This way you save cost and latency on routine tasks, while still handling difficult edge cases with more power (see the sketch below).
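
A minimal sketch of small-then-big routing; the two model IDs and the UNSURE sentinel are assumptions, and in practice you would pick an escalation signal that fits your task (confidence flags, missing fields, invalid JSON).

from groq import Groq

client = Groq()

SMALL_MODEL = "llama-3.1-8b-instant"    # fast and cheap: handles most requests
BIG_MODEL = "llama-3.3-70b-versatile"   # escalation target for hard cases

def answer(question: str) -> str:
    messages = [
        {"role": "system", "content": "Answer briefly. If you are not sure, reply exactly: UNSURE."},
        {"role": "user", "content": question},
    ]
    first = client.chat.completions.create(model=SMALL_MODEL, messages=messages, temperature=0.2)
    text = first.choices[0].message.content.strip()
    # Escalate only when the small model signals uncertainty or returns nothing useful.
    if not text or "UNSURE" in text:
        second = client.chat.completions.create(model=BIG_MODEL, messages=messages, temperature=0.2)
        return second.choices[0].message.content
    return text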

Common Pitfalls and Quick Fixes

    Too much context → Fetch only what you need and re‑rank.
    Letting the model touch prod data directly → Always use tools and a safe layer.
    Using chat for everything → Many jobs are better as a simple extractor or router.
    Verbose answers driving cost → Prefer concise styles and structured fields.
    No versioning → Store prompt IDs and model versions in every log line.

A Short Checklist You Can Use Today

    [ ] Write a clear system prompt and a strict JSON schema.
    [ ] Choose temperature for the task and keep context tight.
    [ ] Enforce JSON validation before UI or DB updates.
    [ ] Add one tool, log every call, and review failures weekly.
    [ ] Track latency, tokens, prompt and model versions, refusals, and invalid JSON.
    [ ] Launch with a feature flag and a simple fallback plan.

Don’t Forget

Boring AI features are reliable AI features that feel invisible to users - they just work. Read only what you need. Constrain with clear rules. Act with structured outputs and safe tools. Explain what happened. Start with the smallest useful feature. Use the patterns that fit your use case. Monitor everything. Improve based on real user behavior, not theoretical performance metrics. The goal isn’t to build impressive AI demos. It’s to ship features that users depend on every day.
