Braintrust Blog

Latest insights, tutorials, and updates from the Braintrust team. Learn about AI evaluation, LLM observability, and best practices for building reliable AI products.

https://www.braintrust.dev/blog RSS source

Braintrust CLI and MCP

3 Apr 2026

Learn when to use the Braintrust CLI and MCP depending on where you are in the AI development workflow.

Evals are the new PRD

27 Mar 2026

Why AI product managers should replace traditional PRDs with evals, and how the eval flywheel becomes the operating system for AI product development.

What is AI observability?

19 Mar 2026

AI observability is a new infrastructure category built on traces, evals, and feedback loops. Learn what it means, why it's technically hard, and how it changes AI product development.

Evals for PMs: A practical guide to AI product quality

17 Mar 2026

Everything a product manager needs to know about evals, from building datasets and scoring criteria to running experiments and integrating evals into your product development process.

Keep building with the Starter plan

16 Mar 2026

Starter is a new Braintrust plan with no platform fee, designed to scale with your needs.

Supporting privacy and compliance for EU teams

12 Mar 2026

Braintrust's decoupled architecture gives EU teams control over where their AI data lives, simplifying GDPR compliance and data residency requirements.

How to build your first offline eval

10 Mar 2026

A 10-step guide to going from a vibe to a working eval system, using a real Mermaid diagram generation project as an example.

Trace keynote recap: See it, improve it, optimize it

25 Feb 2026

Everything we announced at the Trace keynote, including Topics, the Braintrust CLI, and the Braintrust Gateway.

Automatically discover what matters in your production traces with Topics

25 Feb 2026

Topics uses AI-powered clustering to surface recurring patterns, from errors and user intents to sentiment, across thousands of traces.

Braintrust's series B: building the infrastructure for production AI

17 Feb 2026

Braintrust has raised $80M to become the observability layer for shipping quality AI.

The 5 pillars of AI model performance

12 Feb 2026

A framework for evaluating AI models across five dimensions, with Claude Opus 4.6 and GPT-5.3 Codex as case studies.

Testing if "bash is all you need"

22 Jan 2026

Testing whether filesystems and bash provide the optimal abstraction for AI agents through rigorous evaluation.

Security is a choice: how Braintrust lets you decide where your AI data lives

21 Jan 2026

Braintrust offers flexible deployment options so you can keep sensitive AI data in your own infrastructure while still benefiting from a modern SaaS experience.

Building observable AI agents with Temporal

20 Jan 2026

Bringing together durable execution and LLM observability to make AI agents easier to build, monitor, and operate in production.

Debugging Ralph Wiggum with Braintrust Logs

13 Jan 2026

How observability makes autonomous AI development actually work.

Claude Code meets Braintrust

23 Dec 2025

A two-way integration that brings observability into your development loop.

AI observability beyond Python and TypeScript

22 Dec 2025

Braintrust now supports Java, Go, Ruby, and C# with native SDKs.

Brainstore makes AI observability at scale possible

18 Dec 2025

Real-world benchmarks show Brainstore is up to 24x faster than competitors, making it possible to observe AI systems at production scale.

Evals are a team sport: How we built Loop

25 Nov 2025

How we debugged Loop's prompt optimization workflow by combining manual review, Loop analysis, and cross-functional collaboration.

Turn production data into better AI with Loop

24 Nov 2025

Loop is the AI assistant that helps teams query, analyze, and improve AI applications faster.

How Retool uses Loop to turn logs into AI roadmap decisions

24 Nov 2025

The three pillars of AI observability

18 Nov 2025

Why traces, evals, and annotation redefine observability for AI systems.

Braintrust Java SDK: AI observability and evals for the JVM

23 Oct 2025

How Portola empowers subject matter experts to improve AI quality

20 Oct 2025

Braintrust on the Vercel Marketplace

16 Oct 2025

How Dropbox automates evals for conversational AI

15 Oct 2025

Measuring what matters: An intro to AI evals

10 Oct 2025

Claude Sonnet 4.5 analysis

29 Sep 2025

AI that knows your data

13 Sep 2025

A/B testing can't keep up with AI

3 Sep 2025