Groq Blog
Building Reliable LLM Applications: A Practical Engineering Guide

 

This article is a practical engineering guide to integrating open-source large language models (LLMs) into real products. The core idea is to build "invisible" AI features: users complete their tasks seamlessly without ever noticing the AI. The article argues that careful engineering choices, not raw AI "intelligence," are what make this work. It walks through a four-step loop: Read (fetch only the necessary context), Constrain (keep model behavior in bounds with system prompts and rules), Act (produce structured output or call tools), and Explain (show users the process to build trust). It also covers choosing an output mode (structured outputs, function calling, plain text), shipping safely (testing, monitoring, fallbacks), and avoiding common pitfalls.

💡 **The best AI features are "invisible"**: A successful AI feature blends into the user experience, letting users finish their tasks without being interrupted by the AI. When the AI works well, users don't particularly notice it; they stay focused on the task itself.

🔄 **The four-step loop for reliable AI**: 1. **Read**: Pull only the minimum app context the user input requires; too much context raises cost, slows responses, and lets the model drift. 2. **Constrain**: Control model behavior with a system prompt and explicit rules (require JSON output, handle missing information, respect privacy rules), and match the temperature to the task. 3. **Act**: Produce output the next step can consume directly, whether structured output (for UI updates, storage, validation), function calls (for live data or triggering actions), or plain text (narrative only). 4. **Explain**: Show users the tools, steps, and citations the AI used, to build trust and transparency.

🛠️ **Choose the right way to act**: Pick the output type based on downstream needs. **Structured outputs** suit data that will be processed programmatically (updating the UI, storing database fields) and provide clean, parsable data. **Function calls (tools)** let the model access live data or perform external actions, such as querying a database or calling an API. **Plain text** fits narrative-only results such as summaries or short answers.

🛡️ **Ship safely and monitor**: Test thoroughly before launch: write unit tests, build an eval set, and run in shadow mode. In production, continuously monitor latency, token usage, model versions, tool-call success rate, invalid-JSON rate, refusal rate, and user edit rate so problems surface early and can be fixed quickly.

⚠️ **Avoid common pitfalls**: Don't stuff in too much context, let the model touch production data directly, reach for chat for every job, tolerate verbose answers, or skip versioning. A "small model first, big model fallback" routing strategy, plus well-designed fallback mechanisms, markedly improves the stability and efficiency of an LLM application.

Building with LLMs has taught me one clear lesson: the best AI feature is often invisible.

When it works, the user doesn’t stop to think “that was AI.” They just click a button, get an answer quickly, and move on with their task.

When it doesn’t work, you notice right away: the spinner takes too long, or the answer sounds confident but is not true. I’ve hit both of these walls many times. And each time, the fix was less about “smarter AI” and more about careful engineering choices. Use only the context you need. Ask for structured output. Keep randomness low when accuracy is important. Allow the system to say “I do not know.”

This guide is not about big research ideas. It’s about practical steps any engineer can follow to bring open-source LLMs inside real products. Think of it as a field guide with simple patterns, copy-ready code, and habits that make AI features feel reliable, calm, and fast.

How It Works — The Four‑Step Loop

Every reliable AI feature follows the same loop. Keep it consistent. Boring is good.

1) Read

What: Take the user input and only the smallest slice of app context you need. More context means higher cost, slower responses, and more room for the model to drift.

Examples

    Support — “Where is my order?” → pass the user ID and the last order summary, not the entire order history.
    Extraction — “Pull names and dates from this email thread” → pass the thread text only, not unrelated attachments.
    Search — “Find refund policy” → pass top three snippets from your docs, not the whole knowledge base.
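
To make this concrete, here is a minimal sketch of the support example, assuming a Groq-hosted model; the user ID and order summary are hard-coded placeholders that your own data layer would supply in a real app.

from groq import Groq

client = Groq()

# In a real app these two values come from your own data layer.
# Passing only them (not the full order history) keeps the prompt small.
user_id = "u_1842"
last_order_summary = "Order #553: 2 items, shipped 2024-05-02, estimated delivery Friday"

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # any Groq-hosted model works here
    messages=[
        {"role": "system", "content": "You are a support assistant. Answer only from the context provided."},
        {
            "role": "user",
            "content": f"Context:\nuser_id: {user_id}\nlast_order: {last_order_summary}\n\nQuestion: Where is my order?",
        },
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)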

2) Constrain

What: Set rules so the model stays within your desired constraints.

Do this

    System prompt as a contract
      State what the assistant is and is not
      Require valid JSON that matches a schema
      If there is missing information, ask the user for a short follow‑up or answer “I don’t know.”
      Keep privacy rules explicit (do not log sensitive data)
      Version your prompts and test them
    Match temperature to the task (there is no one setting that fits all)
      Low (≈0.0–0.2): Extraction, classification, validation, RAG answers with citations, reliable tool choice
      Medium: Templated drafts and light tone variation
      High: Brainstorming and creative copy where variety matters

Keep context tight in all cases. If your stack supports it, use a seed in tests for repeatability.
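
A minimal sketch of the system-prompt contract with a low temperature for an extraction-style task; the prompt name, model, and seed value are illustrative, and seed is only a best-effort repeatability aid where the stack supports it.

from groq import Groq

client = Groq()

# The contract: what the assistant is and is not, the required output shape,
# what to do when information is missing, and the privacy rule.
# Versioned name so logs can record which prompt produced which output.
SYSTEM_PROMPT_V3 = """You extract invoice fields. You do not answer general questions.
Return only valid JSON with the keys: name, date, amount.
If a field is missing from the text, set it to null and list it under "missing".
Never include customer data in explanations or logs."""

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT_V3},
        {"role": "user", "content": "Invoice from Acme Corp dated 2024-06-01, total due $1,250."},
    ],
    temperature=0.1,  # low: this is extraction, not brainstorming
    seed=42,          # best-effort repeatability in tests, if supported by your stack
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)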

3) Act

What: Aim to produce LLM-generated outputs that the next part of your workflow can consume directly, with no further processing required.

When to use what:

    Structured Outputs when the next step is programmatic, e.g., for updating UI, storing fields, and running validation.
      Why: Such outputs are ready to be used as inputs in the next step of a workflow or application, as they are structured, parsable data that needs no further manual processing.
      Example: Extract {name, date, amount} from an invoice.

Code: Structured Outputs with Pydantic

from groq import Groq
from pydantic import BaseModel
from typing import Literal
import json

client = Groq()

class ProductReview(BaseModel):
    product_name: str
    rating: float
    sentiment: Literal["positive", "negative", "neutral"]
    key_features: list[str]

response = client.chat.completions.create(
    model="moonshotai/kimi-k2-instruct",
    messages=[
        {"role": "system", "content": "Extract product review information from the text."},
        {
            "role": "user",
            "content": "I bought the UltraSound Headphones last week and I'm really impressed! The noise cancellation is amazing and the battery lasts all day. Sound quality is crisp and clear. I'd give it 4.5 out of 5 stars.",
        },
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "product_review",
            "schema": ProductReview.model_json_schema(),
        },
    },
)

review = ProductReview.model_validate(json.loads(response.choices[0].message.content))
print(json.dumps(review.model_dump(), indent=2))

Learn more: Structured outputs docs

    Function calls (tools) when the model needs live data or to trigger an action that your code controls such as search, fetch, compute, notify, or connect to external systems.
      Why: Tools let the model use fresh information instead of relying only on what it learned during training. That means it can query a database, call an API, or look up the latest records without inventing answers. The model proposes the action, your code decides whether to run it, and you keep a clear audit trail.
      Example: The model calls search_docs() to find relevant text, then render_chart() to create a visualization, and finally explains the result back to the user.
    Plain text when the result is narrative only, such as a summary or a short answer.
      Why: Simplest path when nothing else needs to consume the output.

Code: function calling (tools)

import json
from groq import Groq

# Initialize Groq client
client = Groq()
model = "llama-3.3-70b-versatile"

# Define weather tools
def get_temperature(location: str):
    # This is a mock tool/function. In a real scenario, you would call a weather API.
    temperatures = {"New York": "22°C", "London": "18°C", "Tokyo": "26°C", "Sydney": "20°C"}
    return temperatures.get(location, "Temperature data not available")

def get_weather_condition(location: str):
    # This is a mock tool/function. In a real scenario, you would call a weather API.
    conditions = {"New York": "Sunny", "London": "Rainy", "Tokyo": "Cloudy", "Sydney": "Clear"}
    return conditions.get(location, "Weather condition data not available")

# Define system messages and tools
messages = [
    {"role": "system", "content": "You are a helpful weather assistant."},
    {"role": "user", "content": "What's the weather and temperature like in New York and London? Respond with one sentence for each city. Use tools to get the information."},
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_temperature",
            "description": "Get the temperature for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The name of the city",
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_weather_condition",
            "description": "Get the weather condition for a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The name of the city",
                    }
                },
                "required": ["location"],
            },
        },
    },
]

# Make the initial request
response = client.chat.completions.create(
    model=model, messages=messages, tools=tools, tool_choice="auto", max_completion_tokens=4096, temperature=0.5
)
response_message = response.choices[0].message
tool_calls = response_message.tool_calls

# Process tool calls
messages.append(response_message)
available_functions = {
    "get_temperature": get_temperature,
    "get_weather_condition": get_weather_condition,
}
for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_to_call = available_functions[function_name]
    function_args = json.loads(tool_call.function.arguments)
    function_response = function_to_call(**function_args)
    messages.append(
        {
            "role": "tool",
            "content": str(function_response),
            "tool_call_id": tool_call.id,
        }
    )

# Make the final request with tool call results
final_response = client.chat.completions.create(
    model=model, messages=messages, tools=tools, tool_choice="auto", max_completion_tokens=4096
)
print(final_response.choices[0].message.content)

Learn more: Tool use docs

4) Explain

What: Show the user the steps, tools, and citations so they have more confidence in your app's AI-generated outputs.

Examples

    Append a short “What I used” note with source titles or IDs. Groq’s Compound model, for example, shows its answer with sources attached for clarity and trust.
    In extraction, show a small preview of the matched text.
    In tool flows, show which tools ran and in what order, then keep the logs server side.
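
A small sketch of rendering the “What I used” note; the source records and tool names are hypothetical stand-ins for whatever your retrieval and tool layer actually returned.

# Hypothetical shape: whatever your retrieval/tool layer returned for this answer.
sources = [
    {"id": "doc_112", "title": "Refund policy (2024)"},
    {"id": "doc_087", "title": "Shipping FAQ"},
]
tools_used = ["search_docs", "render_chart"]

def what_i_used(sources, tools_used):
    # Keep the user-facing note short; keep the full trace in server-side logs.
    source_line = ", ".join(f'{s["title"]} [{s["id"]}]' for s in sources)
    tool_line = " → ".join(tools_used)
    return f"What I used: {source_line}. Tools: {tool_line}."

print(what_i_used(sources, tools_used))
# What I used: Refund policy (2024) [doc_112], Shipping FAQ [doc_087]. Tools: search_docs → render_chart.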

Core patterns you’ll reuse

Domain‑Specific Language (DSL): A small language designed for a specific domain. In apps, this often means search filters, a sandboxed SQL query, a chart spec, or an email template.

| Pattern | What it Means | Example Request | Typical Output to Your App |
| --- | --- | --- | --- |
| Router | Classify and route to the right handler or model | “Is this billing or technical?” | {category: "billing"} |
| Extractor | Turn messy text into clean fields | “Grab names and dates from this email” | {names: [...], dates: [...]} |
| Translator | Convert intent to a safe DSL | “Show paid invoices this month per region” | Filters or SQL for a sandbox, or chart spec |
| Summarizer | Shorten or re-tone text | “Summarize the meeting for a new hire” | Short bullet list with optional citations |
| With Tools | Model proposes actions; app executes | “Search policy, then draft the reply” | Tool calls → tool results → short answer |
| Orchestrator | Chain steps while the app keeps control | “Verify doc, extract fields, request missing” | Plan → tool calls → JSON result + next steps |
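
As an example, here is a minimal sketch of the Router pattern from the table above, assuming a Groq-hosted model and a hypothetical three-category taxonomy; the app, not the model, owns the mapping from category to handler.

import json
from groq import Groq

client = Groq()

def route(ticket_text: str) -> str:
    """Classify a ticket into a category the app can switch on."""
    response = client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": 'Classify the ticket. Return JSON: {"category": "billing" | "technical" | "other"}.'},
            {"role": "user", "content": ticket_text},
        ],
        temperature=0.0,  # classification: keep randomness low
        response_format={"type": "json_object"},
    )
    category = json.loads(response.choices[0].message.content).get("category", "other")
    # Never trust the label blindly; fall back to a safe default.
    return category if category in {"billing", "technical", "other"} else "other"

# The app decides what happens next, not the model.
handlers = {"billing": "billing_queue", "technical": "tech_queue", "other": "human_review"}
print(handlers[route("I was charged twice for my subscription this month.")])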

Shipping Safely: Tests, Monitoring, and Fallbacks

Before launch:

    Write prompt unit tests that check the output format you expect. For JSON, assert required fields. For plain text, check for keywords, structure, style, or refusal phrases (see the sketch after this list).
    Build a small eval set from real questions. Include expected outcomes and allowed refusals.
    Run in shadow mode or behind a feature flag and log everything.
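
For example, a prompt unit test for a router prompt might look like this sketch; call_router_prompt is a hypothetical wrapper around your LLM call, and the labeled cases come from your own eval set.

import json

# Cases drawn from real user questions, with the outcome you expect.
EVAL_CASES = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I upload a PDF", "technical"),
]

def test_router_returns_valid_category():
    for text, expected in EVAL_CASES:
        raw = call_router_prompt(text)       # hypothetical wrapper around your LLM call
        data = json.loads(raw)               # invalid JSON fails the test immediately
        assert "category" in data            # required field is present
        assert data["category"] == expected  # matches the labeled outcome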

What to track in production:

    Latency p50 and p95
    Tokens in and out
    Model and prompt versions
    Tool call success and failure
    Invalid JSON rate
    Refusal rate
    User edit rate (compare model output to final user text)
    Citation correctness (check answer against cited sources)

You can monitor these signals in the Groq Console dashboard, which gives you logs, metrics, usage, and batch insights to see how your AI features behave in real workloads.
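
On your side, a thin wrapper is often enough to capture these signals per request; this sketch assumes the usage fields exposed by Groq's OpenAI-compatible responses, and the log field names are just one reasonable layout.

import json
import time
from groq import Groq

client = Groq()

def logged_completion(messages, model="llama-3.3-70b-versatile", prompt_id="support_v3"):
    start = time.monotonic()
    response = client.chat.completions.create(model=model, messages=messages, temperature=0.2)
    latency_ms = int((time.monotonic() - start) * 1000)
    # One structured log line per request: enough to chart latency and tokens,
    # and to tie every output back to a prompt and model version.
    print(json.dumps({
        "prompt_id": prompt_id,
        "model": model,
        "latency_ms": latency_ms,
        "tokens_in": response.usage.prompt_tokens,
        "tokens_out": response.usage.completion_tokens,
    }))
    return response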

Fallbacks that work

    If the task is unanswerable, return “I do not know” with a next step.
    If results look long or slow, stream partial results and keep the UI responsive.
    Use small-then-big model routing where it matters: start with a smaller, faster, and cheaper model for most requests. If the output is incomplete, uncertain, or flagged as too complex, escalate the same request to a larger model. This way you save cost and latency on routine tasks, while still handling difficult edge cases with more power (see the sketch below).
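
A minimal sketch of small-then-big routing; the two model IDs and the UNSURE sentinel are assumptions, and in practice you would pick an escalation signal that fits your task (confidence flags, missing fields, invalid JSON).

from groq import Groq

client = Groq()

SMALL_MODEL = "llama-3.1-8b-instant"    # fast and cheap: handles most requests
BIG_MODEL = "llama-3.3-70b-versatile"   # escalation target for hard cases

def answer(question: str) -> str:
    messages = [
        {"role": "system", "content": "Answer briefly. If you are not sure, reply exactly: UNSURE."},
        {"role": "user", "content": question},
    ]
    first = client.chat.completions.create(model=SMALL_MODEL, messages=messages, temperature=0.2)
    text = first.choices[0].message.content.strip()
    # Escalate only when the small model signals uncertainty or returns nothing useful.
    if not text or "UNSURE" in text:
        second = client.chat.completions.create(model=BIG_MODEL, messages=messages, temperature=0.2)
        return second.choices[0].message.content
    return text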

Common Pitfalls and Quick Fixes

    Too much context → Fetch only what you need and re‑rank.
    Letting the model touch prod data directly → Always use tools and a safe layer.
    Using chat for everything → Many jobs are better as a simple extractor or router.
    Verbose answers driving cost → Prefer concise styles and structured fields.
    No versioning → Store prompt IDs and model versions in every log line.

A Short Checklist You Can Use Today

    [ ] Write a clear system prompt and a strict JSON schema.
    [ ] Choose temperature for the task and keep context tight.
    [ ] Enforce JSON validation before UI or DB updates.
    [ ] Add one tool, log every call, and review failures weekly.
    [ ] Track latency, tokens, prompt and model versions, refusals, and invalid JSON.
    [ ] Launch with a feature flag and a simple fallback plan.

Don’t Forget

Boring AI features are reliable AI features that feel invisible to users - they just work. Read only what you need. Constrain with clear rules. Act with structured outputs and safe tools. Explain what happened. Start with the smallest useful feature. Use the patterns that fit your use case. Monitor everything. Improve based on real user behavior, not theoretical performance metrics. The goal isn’t to build impressive AI demos. It’s to ship features that users depend on every day.
