Seroter's Daily Reading, #801 (June 9, 2026)

Listen: https://blossom.nostr.xyz/dabbdb7bdd5ca67834d1094d4f7354100fae6f62588dc4116cd90892e7c08db5.mpga

Seroter's Daily Reading, episode 801, June 9, 2026.

I'm really enjoying the quality of content lately around AI practices, risks, and strategies. It strikes a nice balance — not all sunshine and rainbows, not all doom and gloom. Just useful thinking. That tracks with today's episode, so let's get into it.

Starting with a piece called "In the Seams" from Devin Dickerson. This one hits hard for anyone who's been in a large organization or worked on a complex system. His core argument is simple but sharp: in software, the friction almost never lives in the components themselves. It lives in the seams between them. He draws on experience managing a massive distributed EMR system, where the data layer was always the seam — schemas that varied per component, transformations that broke interfaces, features that took nine months to a year to ship to deployment partners. The cook's plate is perfect, he writes, but the meal goes out cold because the handoff happened a beat too late.

His prescription is the platform approach: instead of treating integration as a team or a process, make it a property of the substrate — ideally the data layer. When data is a shared contract that components read and write through, the integration plane stops being a place where features stall. He applies the same frame to AI agents: every chat window with a coding agent is its own seam, renegotiating context and memory each session, unless platform capabilities like agentic memory and MCP absorb that friction. The question worth asking, he says, is not "where are the seams" but "what complexity can we shield our best people from." That's the question the org chart and the architecture diagram were always asking together.

From there, a very practical piece on building interrupt-resilient AI workloads on GKE. If you're running training jobs or batch inference on Spot VMs to save up to 90% on compute, you need to know this: when Google Cloud reclaims a Spot node, it sends an ACPI signal that Kubernetes translates into a SIGTERM. You have up to 15 seconds before SIGKILL. The piece walks through four concrete patterns. First, trap the SIGTERM and handle graceful shutdown — stop accepting new work, flush to disk, exit clean. Second, externalize your checkpoints to Cloud Storage, and on startup, check for a resume point before starting fresh. Third, design for idempotency — use upsert operations rather than blind inserts so that a retried job doesn't create duplicate data. Fourth, decouple your work queue using Pub/Sub rather than a static CSV that the script iterates through — if a node dies mid-job, Pub/Sub automatically re-queues the unacknowledged message and another worker picks it up seamlessly. None of this is exotic. It's just what you have to build if you're going to run on ephemeral compute responsibly.
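The first three patterns above can be sketched together in a few lines. This is a minimal illustration, not the article's code: a local JSON file stands in for a Cloud Storage checkpoint, a dict keyed by item id stands in for a database upsert, and every name here (`run`, `save_checkpoint`, `load_checkpoint`, `handle_sigterm`, the doubling "work") is hypothetical.

```python
import json
import os
import signal
import tempfile
import threading

# Set when SIGTERM arrives; the work loop polls it between items.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only record the request; the loop does the cleanup."""
    shutdown_requested.set()


def load_checkpoint(path):
    """Return the index of the next item to process, or 0 when starting fresh."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path, next_index):
    """Write the checkpoint atomically so a kill mid-write cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(items, checkpoint_path, results):
    """Process items from the last checkpoint and stop cleanly on shutdown.

    `results` is a dict keyed by item id, so re-processing an item
    overwrites its entry (an upsert) instead of adding a duplicate.
    Returns True if every item was processed, False if interrupted.
    """
    start = load_checkpoint(checkpoint_path)
    for i in range(start, len(items)):
        if shutdown_requested.is_set():
            return False  # the checkpoint already reflects completed work
        item_id, value = items[i]
        results[item_id] = value * 2  # hypothetical unit of work
        save_checkpoint(checkpoint_path, i + 1)
    return True


if __name__ == "__main__":
    signal.signal(signal.SIGTERM, handle_sigterm)
    work = [("a", 1), ("b", 2), ("c", 3)]
    store = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        finished = run(work, os.path.join(tmp_dir, "checkpoint.json"), store)
    print(finished, store)
```

The handler does nothing but set a flag, which keeps it safe to run at any point. The loop checks that flag between items, so each unit of work either completes and is checkpointed or is never started. If the last item is repeated after a restart, the keyed write makes the repeat harmless.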

Then a piece from The New Stack comparing the four major agentic coding tools that have defined the category this year: Claude Code vs. Cursor vs. Codex vs. Antigravity. Six months in, the convergence is striking. All four landed on the same blueprint: a terminal surface, explicit planning before execution, approval gates, MCP tool access, and some form of parallel or delegated agent work. Four very different organizations arrived at nearly the same design inside six months, which the author argues usually signals the design was less a choice than a discovery. The model has quietly become the commodity. SWE-bench scores for the leading models are in a narrow band, and Cursor will run any of them. When the engine stops separating products, the difference moves to everything around it — the harness, the workflow, the approval model, the distribution channel.

The pricing question is where they split. Codex rides on top of ChatGPT plans, which drove fast growth. Cursor Pro and the Claude Code entry tier each sit at around twenty dollars a month. Antigravity is still in preview, with a new hundred-dollar Ultra tier. The author notes that the real comparison should be cost per accepted change rather than sticker price, because agents bill more like compute jobs than seats. One detail worth flagging: Grok Build from xAI has now entered the race, running up to eight subagents in parallel, each in its own Git worktree. That is the most aggressive architecture bet in the category so far.
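To make the cost-per-accepted-change idea concrete, here is a small sketch. The function name and all the numbers in the usage lines are hypothetical, chosen only to show how a cheaper plan can still cost more per change that actually lands.

```python
def cost_per_accepted_change(total_spend, changes_proposed, acceptance_rate):
    """Spend divided by the number of proposed changes that were accepted.

    total_spend: money spent over the period (plan fee plus any usage fees).
    changes_proposed: how many changes the agent proposed in that period.
    acceptance_rate: fraction of proposals merged, between 0 and 1.
    """
    accepted = changes_proposed * acceptance_rate
    if accepted <= 0:
        raise ValueError("no accepted changes, so the cost is undefined")
    return total_spend / accepted


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the article.
    cheap_plan = cost_per_accepted_change(20.0, 40, 0.5)
    pricey_plan = cost_per_accepted_change(100.0, 400, 0.75)
    print(f"cheap plan:  ${cheap_plan:.2f} per accepted change")
    print(f"pricey plan: ${pricey_plan:.2f} per accepted change")
```

With these made-up inputs the twenty-dollar plan works out more expensive per accepted change than the hundred-dollar one, which is the kind of inversion a sticker-price comparison hides.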

From Eli Bendre, a thoughtful post on starting new projects with LLM agents. His key insight is keeping the human meaningfully in the loop, especially on projects you intend to maintain long-term. He draws a useful line between throwaway prototypes, where vibe-coding is fine, and projects you care about, where you should insist on reviewing every commit. His workflow: a CLI agent running locally, a VSCode window open alongside for diff review, and manual commits once you're satisfied. He keeps CLs small — not because agents can't produce thousands of lines, but because comprehension suffers the faster you go, and reviewing a small change thoroughly beats skimming a large one. On language choice, he makes a strong case for Go as the right language for agent-written projects: it changes infrequently, has few idiomatic variations, a rich standard library, and is optimized for readability rather than writability. Since reviewing agent code is 99% reading and 1% writing, those properties compound into a genuinely better experience. This is useful advice for anyone who's felt the pain of trying to understand what an agent just produced in a complex Python codebase.

Now, Paris Hilton as Android's first icon in residence. I genuinely love this. She writes about the gap between having ideas and being able to build them — the frustration of imagining almost anything but not being able to execute. Her time as an icon in residence, working with Gemini on Android, convinced her that technology doesn't have to be intimidating or limited to people with technical backgrounds. For the first time, she says, the distance between imagination and execution has become dramatically smaller. That's a quietly radical statement. You shouldn't have to be an engineer to go from creative idea to working implementation. This is the access argument, and it's one that matters more as AI lowers the floor for creation.

From the DX newsletter, a paper on 8 myths in software engineering and AI that Seroter flags as worth reading all together. The myths fall into three groups: how developers actually spend their time, how to measure AI's impact, and how adoption works in organizations. On time: developers spend only 14% of their day writing code. Accelerating that slice with AI is a smaller lever than the headlines suggest. On measurement: lines of code is a bad metric, and for one internal AI coding agent, only about half of generated PRs were ultimately accepted, with 15% abandoned and 15% stuck waiting on a reviewer. On adoption: 80% of developers use AI tools but only 29% trust their accuracy. There's also a documented competence penalty — women and older engineers receive harsher evaluations for AI-assisted work even when the output is identical. The through-line is that AI's impact is shaped more by the system around the developer than by the developer themselves.
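The point about time is easy to check with Amdahl-style arithmetic. The sketch below assumes the 14% figure from the paper, a hypothetical 2x speedup on coding, and that the other 86% of the day is unchanged; the function names are illustrative.

```python
def overall_speedup(fraction, local_speedup):
    """Whole-day speedup when only `fraction` of the day gets faster.

    fraction: share of time spent on the accelerated activity (0 to 1).
    local_speedup: how many times faster that activity becomes (> 0).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


def time_saved(fraction, local_speedup):
    """Share of the whole day freed up by the same acceleration."""
    return fraction * (1.0 - 1.0 / local_speedup)


if __name__ == "__main__":
    coding_share = 0.14  # from the paper: 14% of the day is writing code
    print(f"2x faster coding saves {time_saved(coding_share, 2.0):.0%} of the day")
    print(f"overall speedup at 2x: {overall_speedup(coding_share, 2.0):.3f}x")
    print(f"ceiling with instant coding: {overall_speedup(coding_share, float('inf')):.3f}x")
```

Under those assumptions, doubling coding speed frees about 7% of the day, and even instantaneous coding caps the overall gain at roughly 1.16x.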

Finally, Anthropic's launch of Claude Fable 5 and Claude Mythos 5, which brings Mythos-class capability to general availability for the first time. Fable 5 is the broadly accessible model; Mythos 5 is restricted to approved users in security and biology research. The benchmark numbers are striking. On SWE-bench Pro, Fable 5 and Mythos 5 reach 80.3%, versus 58.6% for GPT-5.5. Stripe tested Fable 5 on a 50-million-line Ruby codebase, where it completed in a single day a codebase-wide migration that would have taken a team more than two months by hand. Cursor called it the state of the art on CursorBench and said it opened up long-horizon problems that were previously out of reach. The pricing is notable: 60 dollars per million tokens, input and output combined, making it the most expensive of the major models. For CTOs, the key implication is less about raw generation and more about sustained execution: the model can understand an intent, plan steps, call tools, check its own work, and keep going without constant human steering. Anthropic is positioning this as the shift from autocomplete to actual teammate.

That is episode 801. The theme running through a lot of this: the real leverage in AI isn't the model, it's the system around it. The platform that absorbs seams, the workflow that keeps humans in the loop, the guardrails that let agents operate unattended, the intent that has to be written down because your agents can't reconstruct it for you. Same theme from different angles, all circling one question: what actually moves the needle?

"In the Seams" — Devin Dickerson
"Surviving the eviction: How to build interrupt-resilient AI workloads on GKE" — Dev.to / Google Cloud
"Claude Code vs. Cursor vs. Codex vs. Antigravity — six months in" — The New Stack / Janakiram MSV
"Modern Engineering Values" — Eli Bendre
"Paris Hilton is Android's first icon in residence" — Google
"8 myths on software engineering and AI" — DX / Engineering Enablement
"Thoughts on starting new projects with LLM agents" — Eli Bendre
"Bringing the latest Gemini models to Apple developers" — Google
"Companies Are Using AI for Efficiency. They Should Use It to Grow" — Harvard Business Review
"Antigravity Managed Agents Tutorial: Ship Production AI Agents" — Google Cloud / Medium
"The Intent Debt" — Addy Osmani
"Anthropic brings Mythos to the masses with Claude Fable 5" — VentureBeat