Seroter's Daily Reading #768 (April 21, 2026)

Listen: https://blossom.nostr.xyz/89cb5e6ebbbca421e95e21838d489b91df7da6efdbb669364b83fe237b013e65.mpga

Seroter's Daily Reading, episode 768, April 21, 2026.

Google Cloud Next is nearly here, and Richard says he got to meet with Google Developer Experts, spend time with his team, and take some customer meetings during the pre-event activities. So let's get into today's links.

Starting with a piece on running Gemma 4 locally with Go, by Vladimir Vivien on Medium: Building Gemma 4 Local-Powered LLM Apps with Go and Yzma. Seroter calls it a solid example of building an LLM app with Go and running the open Gemma model on a local machine. This is a good one to keep in your back pocket if you're interested in running models locally and avoiding the API call treadmill. Gemma is Google's open-weight model family, and tying it to Go is a natural fit for the kind of tooling work that Go developers do every day.

Next up, a post from dev.ongoro.top: most devs ignore git worktree. here's why they're wrong. Seroter says he's sold on it. The argument here is pretty simple and practical: git worktree lets you attach multiple working directories to the same repository, each pointing at a different branch, and they all share the same .git folder underneath. No extra clones, no duplicated history. So if you've got a long-running build or test in one branch, you can switch over to work on a feature in another branch without ever stashing or losing your place. The author covers the basic commands, then walks through real scenarios where this actually saves time, like keeping a build environment live while you review a PR in another worktree. His closing point is the one that sticks: it's built into git, zero install, works everywhere, and most people ignore it because it sounds advanced. Once you force yourself to use it for one sprint, you never go back to single-checkout hell.
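
The worktree workflow the post describes comes down to a few real git subcommands: `git worktree add`, `git worktree list`, and `git worktree remove`. Here's a minimal Python sketch that wraps them. The helper names and the throwaway demo repo are illustrative assumptions, not anything from the post, and `demo()` needs git on your PATH.

```python
import subprocess
import tempfile
from pathlib import Path


def worktree_add(path: str, branch: str, new_branch: bool = False) -> list[str]:
    """Build argv for `git worktree add`.

    new_branch=True creates `branch` from the current HEAD
    (git worktree add -b <branch> <path>); otherwise an existing
    branch is checked out into the new directory.
    """
    if new_branch:
        return ["git", "worktree", "add", "-b", branch, path]
    return ["git", "worktree", "add", path, branch]


def worktree_remove(path: str) -> list[str]:
    """Build argv for `git worktree remove <path>`."""
    return ["git", "worktree", "remove", path]


WORKTREE_LIST = ["git", "worktree", "list"]


def demo() -> None:
    """Throwaway repo with a second worktree attached (hypothetical demo)."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        repo.mkdir()

        def run(argv: list[str]) -> str:
            return subprocess.run(argv, cwd=repo, check=True,
                                  capture_output=True, text=True).stdout

        run(["git", "init", "-q"])
        # A new worktree branch needs an existing commit to start from.
        run(["git", "-c", "user.name=demo", "-c", "user.email=demo@example.com",
             "-c", "commit.gpgsign=false",
             "commit", "-q", "--allow-empty", "-m", "init"])
        feature_dir = str(Path(tmp) / "feature-wt")
        run(worktree_add(feature_dir, "feature", new_branch=True))
        # Both directories are listed; they share repo/.git, with no second clone.
        print(run(WORKTREE_LIST))
        run(worktree_remove(feature_dir))


if __name__ == "__main__":
    demo()
```

The point of the demo is what's missing: no `git clone`, no `git stash`. The main checkout and `feature-wt` sit side by side, so a build can keep running in one while you work in the other.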

Then we have a long and genuinely excellent piece from Addy Osmani on his blog: Agent Harness Engineering. This is the one I want to spend some time on. The core idea, in a framing coined by Viv Trivedy, is that a coding agent is not just the model; it's the model plus everything you build around it: the prompts, the tools, the context policies, the hooks, the sandboxes, the subagents, the feedback loops. That's the harness. And Osmani's thesis, backed by some striking evidence, is that a decent model with a great harness beats a great model with a bad harness. He cites Terminal Bench 2.0 data where the same Claude Opus 4.6 model scored far lower inside Claude Code than it did in a custom harness; one team moved a coding agent from Top 30 to Top 5 by changing only the harness. That's a wild data point. He walks through what a harness actually includes: system prompts, AGENTS.md files, tools, MCP servers, filesystem and git state management, orchestration logic for subagents, hooks that enforce behavior, and observability. He describes what he calls the ratchet principle: every mistake an agent makes should become a permanent rule in your harness, not a one-off story, so every line in a good AGENTS.md should be traceable back to a specific thing that went wrong. He also offers a useful behavior-first framing: start from the behavior you want, then derive the harness piece that delivers it. If you can't name the behavior a component exists to deliver, it probably shouldn't be there. There's a lot more in here, including a breakdown of Claude Code's architecture as an annotated harness, and a closing observation that as models improve, the space of interesting harness combinations doesn't shrink; it moves. The ceiling shifts, the failure modes shift, and the scaffolding has to shift with them.
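
The ratchet principle is easy to picture as a tiny data structure. What follows is a minimal Python sketch under my own assumptions; the `RuleLedger` and `Rule` names and the HTML-comment provenance format are hypothetical, not from Osmani's post. Every rule must carry the incident that produced it, and there is deliberately no way to remove one.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One AGENTS.md line plus the failure that justified it."""
    text: str      # instruction the agent must follow
    incident: str  # what went wrong (free text; hypothetical format)


@dataclass
class RuleLedger:
    """Hypothetical ratchet: rules are only ever added, each tied to an incident."""
    rules: list[Rule] = field(default_factory=list)

    def record_failure(self, incident: str, rule_text: str) -> bool:
        """Turn a mistake into a permanent rule; return False if already covered."""
        if any(r.text == rule_text for r in self.rules):
            return False
        self.rules.append(Rule(rule_text, incident))
        return True

    def render_agents_md(self) -> str:
        """Render AGENTS.md so every line carries its provenance as an HTML comment."""
        lines = ["# AGENTS.md", ""]
        lines += [f"- {r.text} <!-- because: {r.incident} -->" for r in self.rules]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ledger = RuleLedger()
    ledger.record_failure(
        "agent hand-edited generated files under gen/",
        "Never edit files under gen/; rerun the generator instead.",
    )
    print(ledger.render_agents_md())
```

Leaving out a delete method is the design choice: the harness only tightens, and a rule with no incident behind it can't be written at all.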

Moving on. A piece from the Google Cloud blog: Drop to Demo: Gemini CLI Subagents. Seroter says if you're going to get into agent-first engineering, you need to get familiar with coordinating a team of agents. This one was behind a bot verification wall, so we only have Seroter's note, but the title tells you the direction: subagents, plural. Not one agent doing one task, but a coordinated team. That's a pattern that's showing up more and more.

Then a piece from InfoWorld: GitHub pauses new Copilot sign-ups as agentic AI strains infrastructure. Seroter notes that AI services are suddenly dealing with capacity issues, and this piece makes the point that our interactions with AI have gotten longer and more sophisticated, which consumes a lot more compute. The analysis here is that GitHub had an unavoidable role in the developer world, but rationalizing Copilot limits was inevitable as resources got constrained. One analyst quoted says the next step is likely more differentiated plans with clearer monetization between individual users. For enterprise buyers, the advice is to evaluate AI coding tools as metered infrastructure with usage ceilings, not unlimited productivity layers. That's a framing shift worth sitting with.

Next, an educational post from the Daily Dose of Data Science blog: How to fine-tune LLMs in 2026. If you're using GPT or Claude, you're using the same model as everyone else, with the same capabilities and the same cost. But if you take a small open-source model and fine-tune it on your specific task, it can outperform a model a hundred times its size at a fraction of the cost and latency. This post walks through supervised fine-tuning (SFT) versus reinforcement fine-tuning; the difference is that SFT teaches the model what to say, while RL fine-tuning teaches it to succeed through trial and error. The piece then covers GRPO, Group Relative Policy Optimization, the same algorithm that powered DeepSeek-R1's reasoning capabilities. GRPO generates multiple completions for the same prompt and grades them relative to each other, so it needs only relative rankings rather than absolute scores. They also cover ART, the Agent Reinforcement Trainer, a fully open-source framework that brings GRPO to real Python agents with tool calls and multi-turn conversations. And there's a section on RULER, which uses an LLM as a judge to compare agent trajectories and rank them, with no labeled data required. Asking "which of these four attempts best achieved the goal?" turns out to be far more reliable than asking for a numerical score.
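
The "grade them relative to each other" part of GRPO is a single normalization step, sketched below in Python. It uses the common mean and standard-deviation form: each completion's advantage is its reward minus the group mean, divided by the group standard deviation. This covers only that step. The clipped policy-gradient loss and KL penalty are left out, the choice of population standard deviation is an assumption (implementations differ), and the example rewards are made up.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Score each completion against the others sampled for the same prompt.

    advantage_i = (r_i - mean(r)) / (std(r) + eps)
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ here
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four completions for one prompt, graded by some reward function.
    raw = [1.0, 2.0, 3.0, 4.0]
    # Shifting or rescaling every reward leaves the advantages (almost)
    # unchanged, so the grader's absolute scale doesn't matter.
    rescaled = [10.0 * r + 5.0 for r in raw]
    print(group_relative_advantages(raw))
    print(group_relative_advantages(rescaled))
```

Above-average completions get pushed up and below-average ones get pushed down. If every completion scores the same, all advantages are zero and that group teaches the model nothing.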

Then we have a piece from strategizeyourcareer.com: The engineer AI can't replace. Seroter describes it as exploring developer taste: the experience needed to know what good output looks like. The author's core argument is that AI slop is not an AI problem; it's a taste problem. The models are doing exactly what they were asked to do. The question is whether the person on the other end of the prompt knows what right looks like before they hit enter. He defines developer taste as the judgment to know what the right thing is, and the discipline to pursue it, before you write a single line of code. That's taste versus skill: taste is what you bring to the problem, skill is what you do with the problem once you understand it. He walks through five taste mistakes he keeps seeing: treating AI output as final without reviewing it, copying from secondary sources instead of primary ones, skipping problem decomposition, shipping only the happy path, and getting code to work without making it right. His prescription: work outside-in by writing the test that describes what the feature should do from the outside before writing any code; keep commits small and single-purpose; read your own code in the review UI before assigning it; go to primary sources like the docs and specs; and define the data structure yourself and let AI fill it in. The closing line is the one that lands: AI can approximate the surface patterns. It cannot approximate the ache. That ache is the thing that tells you how much a shortcut is going to cost you in a month. That's taste.
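
The outside-in advice is concrete enough to sketch. Here's a small Python example, with a made-up invoice domain standing in for a real feature; nothing here comes from the article itself. The data structure and the tests, unhappy paths included, come first. The implementation comes last, whoever or whatever writes it.

```python
import unittest
from dataclasses import dataclass


# 1. Define the data structure yourself (hypothetical example domain).
@dataclass(frozen=True)
class Invoice:
    subtotal_cents: int
    tax_rate: float  # 0.08 means 8%


# 2. Describe the feature from the outside, unhappy paths included,
#    before any implementation exists.
class TotalCentsTests(unittest.TestCase):
    def test_adds_tax(self):
        self.assertEqual(total_cents(Invoice(1000, 0.08)), 1080)

    def test_zero_subtotal(self):
        self.assertEqual(total_cents(Invoice(0, 0.08)), 0)

    def test_rejects_negative_inputs(self):
        with self.assertRaises(ValueError):
            total_cents(Invoice(1000, -0.1))
        with self.assertRaises(ValueError):
            total_cents(Invoice(-1, 0.08))


# 3. Only now write the implementation (or let an assistant fill it in)
#    and judge it against the tests you already own.
def total_cents(invoice: Invoice) -> int:
    if invoice.subtotal_cents < 0 or invoice.tax_rate < 0:
        raise ValueError("subtotal and tax rate must be non-negative")
    # round() rounds half to even; a real billing system should pick its rule explicitly.
    return invoice.subtotal_cents + round(invoice.subtotal_cents * invoice.tax_rate)


if __name__ == "__main__":
    unittest.main()
```

The order is the point: by the time any code exists, you've already decided what "right" looks like, including the negative-input cases that a happy-path-only version would skip.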

Finally, a piece from the Google Cloud blog: Building Event-Driven Data Agents with BigQuery, Pub/Sub, and ADK. Seroter says he really liked how this whole solution came together. The flow here is from real-time stream processing to agent-based investigation. An alert comes in over Pub/Sub, a pipeline processes it, and then an agent equipped with specific tools and instructions autonomously investigates by querying BigQuery, analyzing unstructured data, or grounding findings with web search, ultimately categorizing the transaction as a false positive or flagging it for escalation. The piece also covers the human-in-the-loop advantage — effectively filtering out the noise so investigators spend their time only on the most complex cases. And it wraps with agent analytics, getting observability into what the agent is doing, how long it takes, and how much it costs.
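
To make the flow concrete, here's a plain-Python sketch of the control flow only. A message comes in, the agent makes one tool call, and out come a verdict plus a couple of analytics numbers. It deliberately doesn't use the Pub/Sub, BigQuery, or ADK client libraries. The tool function, the payload fields, and the 3x threshold are all hypothetical, and a fixed rule stands in for the agent's LLM reasoning so the sketch runs offline.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Alert:
    transaction_id: str
    account_id: str
    amount: float


@dataclass
class Investigation:
    verdict: str                  # "false_positive" or "escalate"
    evidence: list[str] = field(default_factory=list)
    tool_calls: int = 0           # agent-analytics signals:
    seconds: float = 0.0          # how much work, how long it took


def fake_account_history(account_id: str) -> list[float]:
    """Hypothetical stand-in for a BigQuery lookup of past transaction amounts."""
    return {"acct-1": [120.0, 95.0, 130.0]}.get(account_id, [])


def investigate(alert: Alert,
                history_tool: Callable[[str], list[float]] = fake_account_history) -> Investigation:
    """Triage one alert; a fixed rule stands in for the LLM's reasoning."""
    start = time.perf_counter()
    result = Investigation(verdict="escalate")  # default: send to a human
    history = history_tool(alert.account_id)
    result.tool_calls += 1
    if history:
        typical = sum(history) / len(history)
        result.evidence.append(f"typical amount {typical:.2f}")
        # Hypothetical threshold: up to 3x typical spend is treated as noise.
        if alert.amount <= 3 * typical:
            result.verdict = "false_positive"
    else:
        result.evidence.append("no account history found")
    result.seconds = time.perf_counter() - start
    return result


def handle_message(payload: dict) -> Investigation:
    """What a Pub/Sub subscriber callback might do with one decoded JSON message."""
    return investigate(Alert(**payload))


if __name__ == "__main__":
    print(handle_message({"transaction_id": "t-1", "account_id": "acct-1", "amount": 200.0}))
    print(handle_message({"transaction_id": "t-2", "account_id": "acct-1", "amount": 5000.0}))
```

Defaulting to "escalate" is the human-in-the-loop choice: the agent only closes alerts it can explain away, and everything else still reaches an investigator.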

Wrapping up episode 768. A couple of threads weave through today's links. One is that the infrastructure behind the AI tooling boom is straining under the load, and that forces real choices about how we pay for it. The other is that building good agents is as much about the scaffolding — the harness, the tools, the feedback loops — as it is about picking the right model. And underneath both of those, there's a quieter thread: that human judgment, taste, the scar tissue from things that broke in production, is still doing work that AI can't do yet, and maybe can't do at all.

That's episode 768. We'll see you next time.