Braintrust Releases a Java SDK, Bringing Observability and Evaluation to JVM LLM Applications

Braintrust has released an open-source Java SDK designed for Java developers, addressing the gap in observability and evaluation tooling for LLM applications in the JVM ecosystem. Built on OpenTelemetry, the SDK integrates with existing infrastructure and adds AI-specific capabilities, such as tracing LLM calls in production (inputs, outputs, latency, token usage, and cost) and an evaluation framework that runs in CI/CD with support for custom scorers. By wrapping OpenAI and Anthropic clients, developers get automatic instrumentation of their LLM interactions. The SDK can also fetch prompts from Braintrust, which simplifies iteration and A/B testing without redeploying code.

🚀 **Filling the JVM LLM tooling gap**: The Braintrust Java SDK fills the gap in AI observability and evaluation tooling in the JVM ecosystem, where Python and TypeScript tools dominate, giving Java developers a long-needed option. It supports Java 17+ and is built on OpenTelemetry, so it stays compatible with existing monitoring infrastructure instead of requiring a rebuild from scratch. This makes it easier for Java developers to build and maintain LLM-powered applications in banking, healthcare, and enterprise software.

🔍 **Comprehensive production LLM tracing**: The SDK provides a complete toolset for tracing LLM calls. In production, it automatically captures each call's inputs, outputs, latency, token usage, and cost. Through the OpenTelemetry integration, traces can be exported to backends such as Braintrust, Datadog, and Honeycomb, so when debugging production issues developers can filter traces by metadata, search prompts and responses, and see exactly what the model received and returned.

✅ **CI/CD integration and efficient evaluation**: The SDK ships with an evaluation framework that runs directly in CI/CD pipelines. Developers define test cases with expected outputs and write custom scoring functions to measure model performance. When a prompt changes or a model is swapped, the framework shows which test cases passed or failed along with aggregate scores, helping to prevent regressions. In addition, by fetching prompts dynamically from Braintrust, developers can iterate quickly and run A/B tests without frequent redeployments.

23 October 2025 · Andrew Kent

Java developers are building LLM applications across banking, healthcare, and enterprise software, but most AI observability and evaluation tools only target Python or TypeScript. The few JVM options either lack AI-specific features or require rebuilding your monitoring stack from scratch.

We built the Braintrust Java SDK to fix this. It's an open-source SDK for AI observability and evaluation that runs on Java 17+, built on OpenTelemetry so it fits into existing infrastructure.

If you're building LLM features in Java, you've likely run into these issues:

- Tracking LLM calls in production requires custom instrumentation to capture inputs, outputs, latency, token usage, and costs per request
- Testing prompt changes or model swaps means either manual QA or writing custom test harnesses that don't integrate with existing eval tools
- A/B testing prompts requires building feature flags, routing logic, and result tracking from scratch
- Most AI observability tools target Python/TypeScript and don't provide Java clients or JVM-compatible instrumentation

The SDK provides AI observability and evaluation for JVM applications. It requires Java 17 or higher and uses modern language features (records, pattern matching, and more) where appropriate.

What's included:

- OpenTelemetry-based tracing built on OpenTelemetry spans and traces, not proprietary instrumentation. Export traces to Braintrust, Datadog, Honeycomb, or any OTLP-compatible backend; the SDK fits alongside existing OpenTelemetry setups without conflicts by using standard OpenTelemetry semantic conventions for attributes (see the exporter sketch after this list).
- Wrappers for OpenAI and Anthropic clients that automatically instrument LLM calls. Instrumentation is opt-in per client, so your existing Java services don't change unless you explicitly wrap AI clients.
- An evaluation framework that runs in CI/CD with support for custom scorers
- Support for fetching prompts from Braintrust, managing datasets, and viewing traces in the UI
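Because the tracing is plain OpenTelemetry, it can share an export pipeline you already run. The sketch below wires up a standard OTLP exporter with the stock OpenTelemetry Java SDK; the endpoint and header values are placeholders, and whether a hand-built OpenTelemetry instance can be passed to the client wrappers in place of braintrust.openTelemetryCreate() is an assumption to verify against the README.

```java
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

// Standard OTLP export pipeline; endpoint and credentials are placeholders.
OtlpGrpcSpanExporter exporter = OtlpGrpcSpanExporter.builder()
    .setEndpoint("https://your-otlp-backend.example.com:4317") // placeholder endpoint
    .addHeader("authorization", "Bearer <token>")              // placeholder credentials
    .build();

SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
    .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
    .build();

OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
    .setTracerProvider(tracerProvider)
    .build();
```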

Track LLM calls in production: Every instrumented call captures input/output, latency, token counts, and costs. When debugging production issues, you can filter traces by metadata, search through prompts and responses, and see exactly what the model received and returned.

Run evals in CI/CD: Write test cases with expected outputs, define custom scoring functions, and run them on every commit. When you change a prompt or switch models, the eval framework shows which test cases passed, which failed, and aggregate scores across your test suite.

Fetch prompts from Braintrust: Instead of hardcoding prompts in your application, store them in Braintrust and fetch them at runtime. This lets you iterate on prompts without redeploying code and makes A/B testing different versions straightforward.
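The snippet below is only an illustrative shape for that workflow, not the SDK's actual prompt API: the prompts() accessor, the prompt slug, and the systemMessage() getter are hypothetical names, so check the README for the real interface.

```java
// Hypothetical API shape: prompts(), get(...), and systemMessage() are
// illustrative names, not confirmed SDK methods.
var braintrust = Braintrust.get();
var prompt = braintrust.prompts().get("food-classifier", "production");

// Use the fetched prompt instead of a hardcoded system message.
var request = ChatCompletionCreateParams.builder()
    .model(ChatModel.GPT_4O_MINI)
    .addSystemMessage(prompt.systemMessage())
    .addUserMessage("What kind of food is asparagus?")
    .build();
```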

Here's how to instrument an OpenAI client:

```java
Braintrust braintrust = Braintrust.get();
OpenTelemetry openTelemetry = braintrust.openTelemetryCreate();
OpenAIClient oaiClient = BraintrustOpenAI.wrapOpenAI(openTelemetry, OpenAIOkHttpClient.fromEnv());

// Use the client as normal
var response = oaiClient.chat().completions().create(
    ChatCompletionCreateParams.builder()
        .model(ChatModel.GPT_4O_MINI)
        .addUserMessage("Explain quantum computing")
        .build());
```

Every OpenAI call now flows through OpenTelemetry instrumentation, capturing inputs, outputs, latency, token usage, and costs.
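Because these are standard OpenTelemetry spans, you can also attach your own attributes from code running inside an active span, such as user IDs or feature names, and filter on them later in Braintrust or any OTLP backend. A minimal sketch using the standard OpenTelemetry API; the attribute keys are illustrative, not a required convention:

```java
import io.opentelemetry.api.trace.Span;

// Tag the active span with request-level metadata for later filtering.
// Attribute keys here are illustrative; call this inside an active span.
Span span = Span.current();
span.setAttribute("app.user_id", "user-123");
span.setAttribute("app.feature", "quantum-explainer");
```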

To run an evaluation, define your task, test cases, and scoring functions:

```java
var braintrust = Braintrust.get();
var openTelemetry = braintrust.openTelemetryCreate();
var openAIClient = BraintrustOpenAI.wrapOpenAI(openTelemetry, OpenAIOkHttpClient.fromEnv());

// Define your task
Function<String, String> getFoodType = (String food) -> {
    var request = ChatCompletionCreateParams.builder()
        .model(ChatModel.GPT_4O_MINI)
        .addSystemMessage("Return a one word answer")
        .addUserMessage("What kind of food is " + food + "?")
        .maxTokens(50L)
        .temperature(0.0)
        .build();
    var response = openAIClient.chat().completions().create(request);
    return response.choices().get(0).message().content().orElse("").toLowerCase();
};

// Define your eval
var eval = braintrust.<String, String>evalBuilder()
    .name("food-classification-eval")
    .cases(
        EvalCase.of("asparagus", "vegetable"),
        EvalCase.of("banana", "fruit"),
        EvalCase.of("chicken", "protein"))
    .task(getFoodType)
    .scorers(
        Scorer.of("fruit_scorer",
            result -> "fruit".equals(result) ? 1.0 : 0.0),
        Scorer.of("vegetable_scorer",
            result -> "vegetable".equals(result) ? 1.0 : 0.0))
    .build();

// Run it
var result = eval.run();
System.out.println(result.createReportString());
```

This produces a detailed report showing per-case scores, aggregate metrics, and links to the Braintrust UI where you can drill into individual traces. Run this in CI/CD to catch regressions.
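One way to wire this into CI is to run the eval from a plain JUnit test so it executes on every commit. A minimal sketch, assuming a hypothetical buildFoodClassificationEval helper that constructs the eval from the example above and a hypothetical aggregate-score accessor; the actual result API may differ, so check the README.

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertTrue;

class FoodClassificationEvalTest {

    @Test
    void promptChangesDoNotRegressFoodClassification() {
        var braintrust = Braintrust.get();
        var openTelemetry = braintrust.openTelemetryCreate();

        // Build the eval exactly as in the example above (hypothetical helper).
        var eval = buildFoodClassificationEval(braintrust, openTelemetry);

        var result = eval.run();
        System.out.println(result.createReportString());

        // Hypothetical aggregate-score accessor; fail the build if quality drops.
        assertTrue(result.averageScore() >= 0.8,
            "Eval score regressed below the 0.8 threshold");
    }
}
```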

We're excited to support the AI developers building with Java.

For more examples, check out the README. The artifact is available on Maven Central. If you run into issues or have questions, please let us know on Discord.
