Seroter's Daily Reading #816 (July 1, 2026)

Listen: https://blossom.buildtall.systems/30a9e9c2b964d08940a3a000a0bb0dd4fa419d33e958138d9d8a2f0464f27408.mpga

Seroter's Daily Reading, episode 816, July 1, 2026.

We've somehow got both tomorrow and Friday off for our Independence Day holiday. I'll still probably do a reading list both days. I'm sure the anticipation will keep you up tonight.

Starting off today with the article "Code should be regenerated, not maintained": Codeplain makes the case for spec-driven development. It sits at the heart of a big question right now: what do you actually maintain when AI is generating code faster than anyone can review it? Codeplain, a startup out of Ljubljana, is betting that spec-driven development is the answer. Their core idea is that specs, not code, should be the source of truth. You edit the spec, a coding agent regenerates the implementation, and the code itself becomes disposable. Their CEO puts it bluntly: code should not be maintained, code should be regenerated. The spec is what you preserve. This connects directly to a broader philosophical framework that Chad Fowler has been developing around what he calls the Phoenix Architecture, the idea that software should be designed to burn itself down and be reborn cyclically. The provocative part is the cultural shift this requires. Code has always been the artifact worth preserving. If the spec becomes the thing worth keeping and code is just generated output that gets thrown away and regenerated, that's a pretty fundamental inversion of how most developers think about their craft. Codeplain also released an open-source agentic skills framework called plain-forge that lets coding agents draft and maintain specs iteratively rather than generating one massive specification upfront. That incremental approach matters because they've found that while developers resist writing specs, they are generally happy to read them. A well-structured spec turns out to be easier to review and reason about than the code it produces.

Next up, "Five tools to bolster your AI coding stack" from InfoWorld. The piece leans heavily into security concerns. When tools like Cursor are installing dependencies and running actions on a developer's behalf, they can pull in malicious or unvetted packages unintentionally. Techniques like intercepting tool calls, validating inputs and outputs, enforcing least-privilege access, and isolating credentials are becoming foundational to safe AI-driven development. One stat from Qodo's report stands out: eighty-nine percent of enterprise engineering teams have experienced an AI-generated code incident and had a production outage caused by AI-generated code. That's not a small number. The piece also flags the amnesia problem with current AI coding assistants. Each session starts without memory of an organization's unique context, standards, and business logic, which means you need stateful systems with persistent organizational memory to safely scale AI across a team.
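
To make the "intercept tool calls, enforce least privilege, isolate credentials" idea a bit more concrete, here's a minimal Python sketch of a policy check that sits in front of an agent's tool calls. Everything in it is my own illustration and not from the InfoWorld piece: the tool names, the allowlist contents, and the blocked environment keys are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical allowlist; a real team would load this from a vetted registry.
APPROVED_PACKAGES = {"requests", "numpy"}
# Hypothetical credential names that should never reach an agent-run shell.
BLOCKED_ENV_KEYS = {"AWS_SECRET_ACCESS_KEY", "GITHUB_TOKEN"}


@dataclass
class ToolCall:
    tool: str
    args: dict


def check_tool_call(call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed agent tool call."""
    if call.tool == "install_package":
        name = call.args.get("name", "")
        if name not in APPROVED_PACKAGES:
            return False, f"package '{name}' is not on the allowlist"
        return True, "ok"
    if call.tool == "run_shell":
        env = call.args.get("env", {})
        leaked = sorted(BLOCKED_ENV_KEYS & set(env))
        if leaked:
            return False, f"credentials must not be passed to the shell: {leaked}"
        return True, "ok"
    # Least privilege: anything not explicitly handled is denied.
    return False, f"tool '{call.tool}' is not permitted"
```

The design choice worth noticing is the default-deny at the bottom: a typo-squatted package or an unknown tool is rejected unless someone has explicitly approved it.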

The Google Cloud blog published "Get started with the Claude apps gateway for Google Cloud," which covers a gateway that closes the gap for enterprise deployments of Claude Code. An individual developer could already point Claude at a Google Cloud project and keep inference inside their perimeter, but enterprise rollouts introduce friction around per-developer credentials, managed settings, and usage attribution. The apps gateway is a self-hosted service that sits between your Claude Code clients and Google Cloud. It centralizes identity through your IdP, enforces RBAC rules server-side, adds telemetry that carries verified user info instead of spoofable client-set attributes, and lets you set spend caps per user or group. If you're already running Claude Code on Google Cloud for compliance reasons, this makes organization-wide deployment actually tractable.
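
For a feel of what "RBAC enforced server-side plus spend caps" can look like, here's a toy Python sketch. It is not the gateway's actual API or configuration format; the `Gateway` class, the policy shapes, and the model names are assumptions I made up for illustration. I'm also assuming the user and role arrive already verified by the IdP, which is the part a real gateway would handle.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    # role -> set of model names that role may call (hypothetical policy shape)
    role_models: dict
    # user -> spend cap in dollars (hypothetical; could equally be per group)
    spend_caps: dict
    spent: dict = field(default_factory=dict)

    def authorize(self, user: str, role: str, model: str, est_cost: float) -> bool:
        """Apply RBAC and the spend cap on the server, then record the spend."""
        if model not in self.role_models.get(role, set()):
            return False
        used = self.spent.get(user, 0.0)
        if used + est_cost > self.spend_caps.get(user, 0.0):
            return False
        # Attribution is keyed on the verified user, not a client-set value.
        self.spent[user] = used + est_cost
        return True


gw = Gateway(role_models={"dev": {"model-a"}}, spend_caps={"ana": 10.0})
first = gw.authorize("ana", "dev", "model-a", 6.0)    # within role and cap
second = gw.authorize("ana", "dev", "model-a", 6.0)   # would exceed the cap
```

Because the check and the bookkeeping both live on the server, a client can't talk its way past the cap by misreporting who it is or what it has spent.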

Lenny's Newsletter has a guest post, "How top PMs increase their leverage with AI," from Colin Matthews, and this one has a framework worth stealing for other disciplines too, not just product management. It lays out three ladders of leverage. Personal leverage is about using AI to complete your own tasks, ranging from drafting text to connecting your LLM to external products so it can pull and push information end-to-end. Product leverage is about getting closer to shipping, from web-based prototypes all the way to coding agents shipping actual pull requests to production. And systems leverage is about building repeatable steps to consistently outsource work to AI and get high-quality results. The insight is that these ladders are not about replacing people; they're about shifting where human judgment matters most. As you climb each ladder, you spend less time doing the work and more time reviewing it and knowing when to stop.

Moving to the Google Developers blog, they released a quality flywheel skill that your coding agent can install and then drive on your behalf to improve itself. The skill runs a five-stage evaluation cycle: Prepare Data, Run Inference to produce traces, Grade those traces with Google's AutoRaters, Analyze Failures, and then Optimize & Iterate. The key architectural choice is that the optimizer never grades its own work. Whatever proposes a fix, whether it's your coding agent or an automated optimizer, the GenAI evaluation service scores it independently. This prevents the system from gaming its own metrics. There's a compelling example in the post where the skill identified a subtle failure mode in a trip-planning agent: the agent's internal state was correct after mid-conversation changes, but its final message to the user echoed the stale value anyway. Nothing crashed, the plan looked fine on a quick skim, but the user got the wrong answer. That's exactly the kind of failure that slips through vibe-checking and requires disciplined evaluation to catch.
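
Here's a rough Python sketch of the shape of that loop, mainly to show the "optimizer never grades its own work" separation. This is not the skill's real interface. The `flywheel` function, its parameters, and the toy trip-planning data are hypothetical stand-ins, and the grader here is a plain equality check rather than an AutoRater. It also assumes a non-empty dataset.

```python
def flywheel(dataset, agent, grader, optimizer, rounds=3, target=1.0):
    """Run inference, grade, analyze failures, optimize, and repeat.

    The grader is passed in separately from the optimizer, so whatever
    proposes a fix never scores its own work.
    """
    history = []
    for _ in range(rounds):
        # Run inference to produce traces.
        traces = [(case, agent(case["input"])) for case in dataset]
        # Grade the traces with the independent grader.
        scores = [grader(case, output) for case, output in traces]
        # Analyze failures: keep every trace that scored below perfect.
        failures = [t for t, s in zip(traces, scores) if s < 1.0]
        mean = sum(scores) / len(scores)
        history.append(mean)
        if mean >= target or not failures:
            break
        # Optimize and iterate: the optimizer proposes a new agent.
        agent = optimizer(agent, failures)
    return agent, history


# Prepared data: the user changed the destination mid-conversation.
dataset = [{"input": {"old": "Paris", "new": "Rome"}, "expected": "Rome"}]
stale_agent = lambda x: x["old"]  # echoes the stale value
grader = lambda case, out: 1.0 if out == case["expected"] else 0.0
optimizer = lambda agent, failures: (lambda x: x["new"])  # toy "fix"

fixed_agent, history = flywheel(dataset, stale_agent, grader, optimizer)
```

In this toy run the first round scores zero because the agent echoes the stale destination, and the second round scores one after the fix, which is the before-and-after signal you'd want a disciplined evaluation loop to give you.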

Netflix's tech blog had a post titled "GenPage: Towards End-to-End Generative Homepage Construction at Netflix," though I should note the article was behind a bot verification wall so I didn't get the full details. The headline idea is generative UIs with fine-tuned models underneath, which is an interesting frontier as teams try to move beyond static page assembly toward fully AI-generated interface construction.

From DevClass, there's a piece titled ".NET's long-term support is not long-term enough, dev complains," in which a developer revives an old complaint that .NET's LTS window is too short for enterprise upgrade cycles. The current LTS cycle is three years, but by the time the next LTS appears, two of those years have already elapsed, leaving just one year to actually upgrade. The developer notes that fifty percent of deployed versions of their software are running end-of-life versions. Microsoft chose the three-year window to balance stable deployment time with innovation velocity, and extended paid support has not been offered. To put it in context, Java LTS gets five years plus extended support, and Python gets five years of security fixes for all releases, so three years does feel tight for large organizations with long deployment cycles.
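
The arithmetic behind that one-year squeeze is simple enough to write down. This is just a sketch using the figures quoted in the article; the function name is mine.

```python
def upgrade_window_years(support_years: float, years_until_next_lts: float) -> float:
    """Years in which the old LTS is still supported and the next LTS already exists."""
    return max(0.0, support_years - years_until_next_lts)


# Figures from the article: three years of support, next LTS after two years.
dotnet_window = upgrade_window_years(3, 2)
```

With three years of support and a successor arriving after two, the overlap comes out to one year, and that single year is all the time an enterprise has to plan, test, and roll out the upgrade.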

From Miguel Carranza at RevenueCat, there's a thoughtful post titled "AI shouldn't shrink headcount. It should shrink teams." The argument is that with AI, the minimum viable size of a high-output team is smaller than it used to be. Writing code is cheaper, reviewing unfamiliar parts of the codebase is easier, so you don't need three engineers from every specialty weighing in on every meaningful piece of work. RevenueCat went from a few larger teams to more than double the number of smaller teams, each with one to three engineers and a tech lead who keeps the work moving without people management responsibilities. The post is honest about the downsides: redundancy loss when someone goes on leave, more demand on ICs to make calls without waiting for permission, and the risk of fragmentation if coordination breaks down. But the core point is that when you do hire, the new people add parallelism instead of coordination overhead. That framing, more teams, not fewer people, is worth sitting with.

Google published "Gemini Spark updates: macOS launch, connected apps and more," which brings their personal automation agent to the desktop. It can now move beyond the chat window and automate tasks across your desktop files and apps, like sorting PDFs or creating spreadsheets from local files. Coming soon, you can assign multi-step tasks remotely, like asking it to find a report on your Mac, pull a number, and email it to you while you're away. This is the personal agent category arriving for everyone, though it remains to be seen exactly how people will incorporate this into their daily workflows.

Hugging Face and Cerebras showed what becomes possible when you pair Gemma 4 with Cerebras inference for real-time voice AI in their post "Hugging Face and Cerebras bring Gemma 4 to real-time voice AI." The demo builds a speech-to-speech pipeline with Parakeet for speech recognition, Gemma 4 on Cerebras for inference, and Qwen3TTS for text-to-speech. The open, modular architecture means every layer can be inspected and replaced. This same pipeline already powers Reachy Mini robots, with more than nine thousand of them in the wild, which is a nice signal that this isn't just a research demo.

Then from Ollama, there's "Faster Gemma 4 on MLX with multi-token prediction." Gemma 4 is now nearly ninety percent faster on Apple Silicon thanks to multi-token prediction, a technique where a small draft model proposes the next several tokens and the main model verifies them in a single pass. Code is especially well-suited for this because it's full of closing brackets, repeated identifiers, and boilerplate, so the draft model's proposals are accepted often. The speedup shows up most in coding agents, which call the model continuously as they read files, run tools, and work through tasks. Faster generation makes those agents noticeably more responsive, and Ollama tunes the draft length automatically as the model runs, so no configuration is required.
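
If the propose-then-verify idea is new to you, here's a toy Python sketch of a single step using greedy verification. It is not Ollama's or MLX's implementation. The two "models" are stand-in functions over a fixed token list, and all the names are hypothetical. A real runtime would get all of the checks from one forward pass of the main model, whereas this toy calls the target function once per position to keep things readable.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """Propose k tokens with the draft model, keep what the target agrees with."""
    # Draft phase: the small model cheaply guesses the next k tokens.
    proposal = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # Verify phase: accept the longest prefix that matches the target's choices.
    accepted = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: take the target's token and drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    return accepted


# Toy "models": the target always emits TARGET; the draft is wrong at position 2.
TARGET = ["f", "(", "x", ")", ";", "\n"]
target_next = lambda ctx: TARGET[len(ctx)]
draft_next = lambda ctx: "]" if len(ctx) == 2 else TARGET[len(ctx)]

step1 = speculative_step([], draft_next, target_next, k=4)
step2 = speculative_step(step1, draft_next, target_next, k=2)
```

In the first step the draft gets two tokens right and one wrong, so three tokens come out of one verification round, and the output is still exactly what the target would have produced on its own. That's why boilerplate-heavy text like code benefits so much: the more often the cheap guesses are right, the more tokens you get per expensive pass.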

That's episode 816. Several threads ran through today's set: the infrastructure getting serious around AI coding in the enterprise, from secure toolchains to centralized gateways and evaluation frameworks; the ongoing re-examination of what software artifacts are actually worth preserving long-term; and the steady march of better, faster, more accessible models. Happy Fourth of July to those celebrating, and I'll see you tomorrow.