Braintrust Blog

Latest insights, tutorials, and updates from the Braintrust team. Learn about AI evaluation, LLM observability, and best practices for building reliable AI products.

How to improve your golden datasets with human review

21 May 2026

Turn production traces into golden datasets by adding human review to your eval workflow, then use that ground truth to improve scorers over time.

The six generations of AI agents and how to eval them

21 May 2026

Agent architectures have evolved through six generations, and each one demands a different eval strategy. Walk through every generation with a single SRE incident response example.

How to evaluate multi-turn conversations

14 May 2026

How to evaluate multi-turn conversations

11 May 2026

Learn how to score multi-turn conversations by combining per-turn and per-conversation evals, then automating it all in production.

Why your traces and evals belong in the same place

11 May 2026

What you can catch, fix, and automate when traces and evals live in the same platform.

How to earn stakeholder trust with evals and observability

28 Apr 2026

How PMs can use Braintrust dashboards, custom trace views, and Loop to turn AI evals and production behavior into something stakeholders can read.

How to prepare for AI compliance and governance

13 Apr 2026

The EU AI Act and ISO/IEC 42001 are raising the bar for AI governance. AI observability helps teams meet these requirements with production-level evidence.

Agentic eval development with the Braintrust CLI

8 Apr 2026

Use coding agents and the Braintrust CLI to debug failing evals, iterate on prompts, and close the loop between observability and code.

How Brainstore works: architecture for AI observability at scale

6 Apr 2026

A deep dive into the architecture of Brainstore, Braintrust's custom database built for AI observability workloads.

Braintrust CLI and MCP

3 Apr 2026

Learn when to use the Braintrust CLI and MCP depending on where you are in the AI development workflow.

Evals are the new PRD

27 Mar 2026

Why AI product managers should replace traditional PRDs with evals, and how the eval flywheel becomes the operating system for AI product development.

What is AI observability?

19 Mar 2026

AI observability is a new infrastructure category built on traces, evals, and feedback loops. Learn what it means, why it's technically hard, and how it changes AI product development.

Evals for PMs: A practical guide to AI product quality

17 Mar 2026

Everything a product manager needs to know about evals, from building datasets and scoring criteria to running experiments and integrating evals into your product development process.

Keep building with the Starter plan

16 Mar 2026

Starter is a new Braintrust plan with no platform fee, designed to scale with your needs.

Supporting privacy and compliance for EU teams

12 Mar 2026

Braintrust's decoupled architecture gives EU teams control over where their AI data lives, simplifying GDPR compliance and data residency requirements.

How to build your first offline eval

10 Mar 2026

A 10-step guide to going from a vibe to a working eval system, using a real Mermaid diagram generation project as an example.

Automatically discover what matters in your production traces with Topics

25 Feb 2026

Topics uses AI-powered clustering to surface recurring patterns, from errors and user intents to sentiment, across thousands of traces.

Trace keynote recap: See it, improve it, optimize it

25 Feb 2026

Everything we announced at the Trace keynote, including Topics, the Braintrust CLI, and the Braintrust Gateway.

Braintrust's series B: building the infrastructure for production AI

17 Feb 2026

Braintrust has raised $80M to become the observability layer for shipping quality AI.

The 5 pillars of AI model performance

12 Feb 2026

A framework for evaluating AI models across five dimensions, with Claude Opus 4.6 and GPT-5.3 Codex as case studies.

Testing if "bash is all you need"

22 Jan 2026

Testing whether filesystems and bash provide the optimal abstraction for AI agents through rigorous evaluation.

Security is a choice: how Braintrust lets you decide where your AI data lives

21 Jan 2026

Braintrust offers flexible deployment options so you can keep sensitive AI data in your own infrastructure while still benefiting from a modern SaaS experience.

Building observable AI agents with Temporal

20 Jan 2026

Bringing together durable execution and LLM observability to make AI agents easier to build, monitor, and operate in production.

Debugging Ralph Wiggum with Braintrust Logs

13 Jan 2026

How observability makes autonomous AI development actually work.

Claude Code meets Braintrust

23 Dec 2025

A two-way integration that brings observability into your development loop.

AI observability beyond Python and TypeScript

22 Dec 2025

Braintrust now supports Java, Go, Ruby, and C# with native SDKs.

Brainstore makes AI observability at scale possible

18 Dec 2025

Real-world benchmarks show Brainstore is up to 24x faster than competitors, making it possible to observe AI systems at production scale.

Evals are a team sport: How we built Loop

25 Nov 2025

How we debugged Loop's prompt optimization workflow by combining manual review, Loop analysis, and cross-functional collaboration.

Turn production data into better AI with Loop

24 Nov 2025

Loop is the AI assistant that helps teams query, analyze, and improve AI applications faster.

The three pillars of AI observability

18 Nov 2025

Why traces, evals, and annotation redefine observability for AI systems.