Seroter's Daily Reading #798 (June 4, 2026)

Listen: https://blossom.nostr.xyz/d2c6471390fc147ddebdce5663a46e3ff3ccf13657bec9b19c76b04a21ceaca6.mpga

Seroter's Daily Reading, episode 798, June 4, 2026. Richard is in Paris wrapping up a cloud conference. The demos worked, and he flies home tomorrow. Here's the list.

Starting with a piece from Lenny's Newsletter on Essential books for product builders. Lenny went through his reading history and pulled together the books that genuinely shaped how he thinks about building product. He's organized them by job-to-be-done, which makes it useful rather than just a generic list. Want to communicate better? He points to Nobody Wants to Read Your Shit by Steven Pressfield, On Writing Well, and Storyworthy. Want to execute better? The Great CEO Within, Scaling People, and The Goal. Want to be inspired? He recommends The Making of Prince of Persia, Build by Tony Fadell, and Shoe Dog. Want to become a better manager? High Output Management, The Making of a Manager, and Radical Candor. And for product folks specifically, he lands on The Mom Test, Escaping the Build Trap, and Continuous Discovery Habits. What strikes me is that Lenny mostly recommends books over ten years old because those have proven staying power. Great curation if you're looking to fill some gaps in your reading.

Second piece is from Blog for Engineering Managers on Budgeting your team's code review capacity. This is one of those articles that names something I've been sensing but haven't seen articulated clearly. AI generates code faster than your engineers can review it, and most teams haven't adjusted for the gap. One engineer told the author, "I spend most of my day trying to decide whether I trust things." That's the problem in a sentence. Human review speed hasn't scaled with AI code generation speed. The article walks through what happens when reviewer attention collapses. First, review quality drops because humans adapt to volume: they skim more and assume someone else checked things. Second, teams slowly understand their own systems less, because code review is one of the main ways technical understanding spreads through a team. Third, your strongest engineers become invisible infrastructure, spending huge amounts of time reviewing AI-generated pull requests but looking less productive than the engineer who just shipped fast. The answer isn't to stop reviewing. It's to treat reviewer attention as a budget you actively manage. The article suggests making the concentration of review load visible, putting explicit limits on review load, and pushing more verification responsibility back toward whoever generated the code. Good piece for engineering leaders feeling this pressure.
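To make the budget idea concrete, here's a minimal sketch in Python. None of this comes from the article itself: the Reviewer class, the weekly minute cap, and the per-review time estimates are all hypothetical names and numbers chosen for illustration. It shows the three suggestions in miniature: a visible concentration figure, an explicit limit per reviewer, and a "nobody has budget" result that signals the author should take on more verification.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Reviewer:
    name: str
    weekly_review_minutes: int  # hypothetical cap on review time per week
    assigned_minutes: int = 0

    def remaining(self) -> int:
        return self.weekly_review_minutes - self.assigned_minutes


def assign_review(reviewers: list[Reviewer], estimated_minutes: int) -> Optional[Reviewer]:
    """Give the review to whoever has the most budget left.

    Returns None when nobody has enough budget left. That is the cue to
    push verification back to the author rather than overload a reviewer.
    """
    candidates = [r for r in reviewers if r.remaining() >= estimated_minutes]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda r: r.remaining())
    chosen.assigned_minutes += estimated_minutes
    return chosen


def concentration(reviewers: list[Reviewer]) -> dict[str, float]:
    """Share of all assigned review time carried by each reviewer."""
    total = sum(r.assigned_minutes for r in reviewers)
    if total == 0:
        return {r.name: 0.0 for r in reviewers}
    return {r.name: r.assigned_minutes / total for r in reviewers}
```

If one name in the concentration output is carrying most of the share, that's the "invisible infrastructure" engineer the article is talking about.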

Third piece is from Code Opinion on Modular Monolith Boundaries Done Wrong. The author sees a pattern where people build a modular monolith, have a clean structure with different projects and modules, but changes still ripple through the system. The issue is they defined physical boundaries but not logical ones. Logical boundaries are about ownership. Who owns the data? Who owns the business rules? Who owns the concept? The same word doesn't mean the same concept across different parts of your system. A vehicle in a compliance module and a vehicle in a dispatch module are not the same thing. Compliance cares about registration and insurance. Dispatch cares about availability and capacity. If you try to share one vehicle concept across both, you end up with coupling. The same applies to employee. HR owns employment records. IT owns user accounts. Those are different concepts even though they point to the same person. The key insight is that logical boundaries are not physical boundaries. You can have multiple physical deployments serving one logical boundary, or one physical process serving multiple logical boundaries. Start with ownership.
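Here's a rough sketch of the vehicle example in Python. The class and field names are my own assumptions for illustration, not code from the post. The point it shows is that each logical boundary owns its own model of a "vehicle," and the only thing the two share is an identifier.

```python
from dataclasses import dataclass
from datetime import date


# Compliance boundary: owns registration and insurance facts.
# (Hypothetical model; field names are illustrative.)
@dataclass(frozen=True)
class ComplianceVehicle:
    vehicle_id: str
    registration_expires: date
    insurance_expires: date

    def is_compliant(self, today: date) -> bool:
        return self.registration_expires >= today and self.insurance_expires >= today


# Dispatch boundary: owns availability and capacity.
# It knows nothing about registration or insurance dates.
@dataclass(frozen=True)
class DispatchVehicle:
    vehicle_id: str
    available: bool
    capacity_kg: int

    def can_carry(self, load_kg: int) -> bool:
        return self.available and load_kg <= self.capacity_kg
```

Both classes could live in the same process or in separate deployments. A change to how insurance is tracked only touches the compliance model, which is the kind of ripple containment the author is after.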

Fourth piece is CJ Roth on The Future of Agents. He lays out several predictions about where human-agent interaction is heading. The era of building AI chat into apps is ending. Power users are investing in their own personal agents, like Hermes Agent, OpenClaw, and Claude Code, and they're managing their tools through those agents rather than chatting inside each app. He predicts agents will be cloud-based for reliability and convenience. The big idea is Bring Your Own Agent, where apps expose functionality through MCP or a CLI and offer good UI as a niche value proposition, while the user's personal agent holds memory, context, and execution environment. The user doesn't need to trust the app as much because their agent retains everything across every app. He also covers open standards like MCP, A2A, ACP, and AG-UI gaining traction, and the need for standards around memory and context, which are currently unstandardized. Enterprise will demand open source and open standards for data security reasons. Interesting framework for thinking about where this space is heading.

Fifth piece is from Google Cloud on Connecting AI agents with unstructured data using GCS MCP Servers. Google released an MCP server for Cloud Storage that lets agents read and write files to GCS. They've got two modes: a remote server for standard data access with built-in IAM security and audit logging, and a local server for custom tools where you can add business logic like PII redaction or enrichment when files are read. This is part of their broader MCP Toolbox for Databases that also covers BigQuery, AlloyDB, Spanner, and Cloud SQL. Think about what this enables. Screenshots, PDFs, documents, all the unstructured data living in cloud storage, now accessible to your agents through a standardized interface. Easy integration point for agentic workflows that need to work with varied file types.

Sixth piece is from Saturn CI on My Agent Skill for Test-Driven Development. The author observes that AI agents tend to be pretty bad at writing tests. The tests they write are often vague, hacky, and pointless. His solution is to give the agent a skill that says to follow Kent Beck's Canon TDD. He distills it down to specify-encode-fulfill: come up with specifications, encode them as automated tests, then write code to fulfill those specs. He's built this into a skill document he shares on GitHub. The key is prompts that describe user-facing behavior rather than implementation details, similar to how good TDD tests focus on behavior. He also runs his tests through a separate Test Design Review skill that spawns a different agent to catch violations of design principles. The observation that resonates is that the biggest AI productivity gains come from combining AI with timeless principles discovered decades ago. TDD isn't new, but it works well with AI because it structures how the agent approaches building something.
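Here's what specify-encode-fulfill could look like on a toy problem, sketched in Python. This is not the author's skill document or his example. The shipping rule, the function name, and the thresholds are hypothetical, picked only to show the three steps in order.

```python
# Step 1, specify: user-facing behavior in plain language.
#   - Orders of 50.00 or more ship free.
#   - Smaller orders pay a flat 5.00 shipping fee.
# Amounts are in cents to avoid floating-point rounding.
FREE_SHIPPING_THRESHOLD_CENTS = 5000
FLAT_FEE_CENTS = 500


# Step 2, encode: each specification becomes one automated test that
# describes behavior, not implementation details.
def test_order_at_threshold_ships_free():
    assert shipping_fee(5000) == 0


def test_order_below_threshold_pays_flat_fee():
    assert shipping_fee(4999) == 500


# Step 3, fulfill: write just enough code to satisfy the encoded specs.
def shipping_fee(order_total_cents: int) -> int:
    if order_total_cents >= FREE_SHIPPING_THRESHOLD_CENTS:
        return 0
    return FLAT_FEE_CENTS


if __name__ == "__main__":
    test_order_at_threshold_ships_free()
    test_order_below_threshold_pays_flat_fee()
    print("all specs pass")
```

Notice the test names read like the plain-language specs. That's the behavior-over-implementation focus the author is asking the agent to hold onto.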

Seventh piece is from Google on Kaggle is making AI benchmark creation effortless. Kaggle launched local development for their benchmarks feature, so you can create, validate, push, run, and download evaluation tasks from your local environment using VS Code, Cursor, or coding agents. They've also released a write-kaggle-benchmarks skill that teaches an agent how to build evaluation tasks using the Kaggle CLI. You can install the skill and then describe an evaluation in plain language and get a working benchmark task on Kaggle. Good for teams building AI products who want to create rigorous evaluations without being tied to a web-based notebook editor.

Eighth piece is from Alex Ewerlof on Reliability Engineering for Air-Gapped Systems. He worked with defense sector teams on measuring the right things and setting SLOs, but these systems were air-gapped, with no access to metrics, logs, or the runtime. The challenge: how do you measure reliability when you can't observe the system in real time? His solution was to offload that responsibility to the on-site operators by giving them tools to act as SREs. That means real-time status dashboards so operators can see CPU and memory usage, proactive alerting, pre-defined troubleshooting scripts, auto-repair for common failures, and even anomaly detection using a small language model running on-site to analyze logs. The status page itself needs to be separate from the main application, with its own deployment cadence and dependencies. If the status page goes down when the main service goes down, you've lost your most important tool for operators. He also mentions cryptic but specific error codes, short strings that pinpoint exactly what failed and where, so operators can look up fixes over the phone without having to share sensitive logs.
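The error code idea is easy to picture with a small sketch in Python. Everything here is hypothetical: the code format, the catalog entries, and the lookup function are my own illustration, not anything from the article. It just shows how a short code read over the phone can map to a component, a meaning, and a fix without any logs leaving the site.

```python
from typing import NamedTuple


class Runbook(NamedTuple):
    component: str
    meaning: str
    fix: str


# Hypothetical catalog. In an air-gapped setup this would ship with the
# on-site tooling, so the lookup needs no network access.
RUNBOOKS = {
    "DB-012": Runbook("database", "connection pool exhausted", "restart the pool service"),
    "FS-003": Runbook("storage", "log volume above 90 percent", "run the log rotation script"),
}


def lookup(code: str) -> str:
    """Turn a code read over the phone into a component, meaning, and fix."""
    key = code.strip().upper()
    entry = RUNBOOKS.get(key)
    if entry is None:
        return f"{key}: unknown code, escalate"
    return f"{key}: {entry.component}: {entry.meaning}. Fix: {entry.fix}"
```

The code itself reveals nothing sensitive, which is why it's safe to say out loud on a call.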

Ninth and final piece is from The Agomizer on How XP Made A Better AI Coder. The author spent her early career in the Ruby community where XP and Agile were core values. Pair programming, TDD, small commits, all normal practice. She finds that those same habits make her a more effective AI coding partner. Working with AI as a pair programmer rather than delegating tasks leads to better quality and ensures she actually understands the code. She breaks features into small chunks, each task a new conversation and a new branch, which works naturally with AI workflows. She uses a modified TDD approach where she starts each step with a prompt describing the end state rather than a unit test, but the principle is the same: specify behavior, then code to fulfill it. She also emphasizes self-documenting code, habits like naming collections with plural names and individuals with singular names. Readable code is easier for AI to understand and easier to review when you're looking at AI-generated diffs. Fun convergence: practices from twenty years ago that turn out to work well with AI.

That's episode 798.

Articles

Essential books for product builders
Budgeting your team's code review capacity
Modular Monolith Boundaries Done Wrong
The Future of Agents
Connecting AI agents with unstructured data using GCS MCP Servers
My Agent Skill for Test-Driven Development
Kaggle is making AI benchmark creation effortless
Reliability Engineering for Air-Gapped Systems
How XP Made A Better AI Coder

Strong themes this time around: the verification problem with AI-generated code, both in code review and testing. The ownership question in modular systems. And the emerging agent stack from personal agents to standards and skills. Richard's flying home, might have wifi, might not. We'll see him soon.