Seroter's Daily Reading #793 (May 28, 2026)

Listen: https://blossom.nostr.xyz/b0090f45f227f0cc3062dc9137af819d16017de4eaaa8229385faa5ef54912c6.mpga

Seroter's Daily Reading, episode 793, May 28, 2026.

Seroter suspects this week's theme is going to be how AI is reshaping the experience of tech managers. And looking at this week's reading list, he's probably right. We've got pieces on the cognitive load of running multiple agents, the new shape of code review, hiring in the age of AI coding tools, and a whole panel debate on where productivity actually lands when the robots start writing your code. Let's get into it.

We'll start with Addy Osmani and a piece called The Orchestration Tax is You. This one came out of a Google I/O panel he did with Seroter, Aja Hammerly, and Ciera Jaspan. The core idea is deceptively simple: running twenty AI agents doesn't mean there are twenty of you. Starting an agent is cheap, almost free, just a keystroke. But closing the loop on it is not cheap at all. Someone has to check whether what came back is correct, reconcile it with whatever the other agents touched, and that someone is you. And there is exactly one of you.

Osmani frames this with a systems thinking analogy. You are the GIL, the Global Interpreter Lock, of your agentic AI workflows. The agents can all run at once, but when any of their work needs genuine understanding or conflict resolution, it has to acquire the lock. Amdahl's Law makes this precise. The speedup from parallelizing is capped by the fraction of work that stays serial. In agent development, that serial fraction is judgment. Spawning eight agents doesn't speed up your judgment time. It just makes the queue feeding into it much deeper. And this is the part nobody prices in. The failure mode is invisible from the inside. Twenty running agents give you the feeling of massive productivity. The dashboard is full, everything moves. But that feeling decouples from actually shipping good code to main. You can be maximally busy and barely produce anything, and from the inside it feels identical.
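To put rough numbers on the Amdahl's Law point, here is a minimal sketch. The function name and the thirty percent serial fraction are illustrative assumptions, not figures from Osmani's piece.

```python
def amdahl_speedup(serial_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only the non-serial share of work parallelizes."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumption for illustration: 30% of the work is human judgment
    # that cannot be spread across agents.
    for agents in (1, 2, 8, 20):
        print(agents, round(amdahl_speedup(0.30, agents), 2))
```

Under that assumed thirty percent, eight agents give roughly a 2.6x speedup, twenty agents give just under 3x, and no number of agents gets past about 3.3x, because the judgment share never shrinks.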

The fix, Osmani argues, is architectural, not disciplinary. You can't grind your way through a structural limit. Instead, scale your agent fleet to your review rate, not to the UI that happily lets you spawn twenty. Sort work into two piles. Isolated tasks that can run async and only need you at the final gate, and complex tasks where the judgment is the work itself, like a tricky bug or an architecture decision. The mistake is trying to parallelize the second pile. Batch your reviews so context switching doesn't bleed you dry. And only spend your lock on judgment. Make the agent write the passing test or generate the screenshot to prove the boring eighty percent itself. Let your scarce attention handle the twenty percent that actually needs a human.
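One way to picture "scale your agent fleet to your review rate" is a back-of-the-envelope throughput check. This is a sketch under simple assumptions: steady rates, one reviewer, and hypothetical function names and numbers that do not come from the article.

```python
import math


def max_sustainable_agents(reviews_per_hour: float,
                           results_per_agent_per_hour: float) -> int:
    """Largest fleet whose combined output does not outrun the single reviewer."""
    if reviews_per_hour < 0 or results_per_agent_per_hour <= 0:
        raise ValueError("rates must be positive")
    return math.floor(reviews_per_hour / results_per_agent_per_hour)


def backlog_after(hours: float, agents: int, reviews_per_hour: float,
                  results_per_agent_per_hour: float) -> float:
    """Unreviewed results piled up after `hours`; never negative."""
    arrival_rate = agents * results_per_agent_per_hour
    return max(0.0, (arrival_rate - reviews_per_hour) * hours)


if __name__ == "__main__":
    # Assumed rates: you can carefully review 4 results an hour,
    # and each agent hands back 2 results an hour.
    print(max_sustainable_agents(4, 2))  # fleet size you can keep up with
    print(backlog_after(8, 20, 4, 2))    # unreviewed pile after a day with 20 agents
```

With those assumed rates, two agents is the sustainable fleet, and twenty agents leave 288 unreviewed results after an eight-hour day. The dashboard looks busy either way, which is the point of the piece.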

Ciera Jaspan also flagged Margaret-Anne Storey's work on cognitive debt in that panel. The orchestration tax left unpaid is how you accumulate both technical debt and cognitive debt at once. You merge stuff you didn't read carefully, your mental model of the codebase goes stale, and none of that shows up on the dashboard today. It shows up when production breaks and you realize you have no idea how the system works anymore.

Next up, from HBR, Our Favorite Management Tips on Giving Feedback. Seroter's note on this one: all useful, and giving feedback is still not his strongest skill. It's a curated selection of tips from their Management Tip of the Day newsletter, so it's a greatest hits collection on one of the trickiest parts of being a manager. Not much more to say here than to note that it landed this week as a companion to all the AI-heavy pieces. Because if Addy's piece is about what AI does to the developer's cognitive load, HBR's piece is about the manager's cognitive load in a world where giving good feedback is getting harder, not easier.

From InfoWorld, What Do Software Developers Do Now? This is a reflective piece from someone who still calls themselves a software developer but says they can no longer call themselves a programmer. Writing code and directing AI to write code are described as completely different animals. When you review code you wrote yourself, you're debugging typos and errors. When you review code an agent wrote for you, you're debugging the actual functionality, and then having the agent fix the problems you find. It's a totally different experience. You gain productivity, often by orders of magnitude, but you lose the coding experience. Some developers lament that loss. Some don't even realize what they're missing. The piece references others who have written about both sides of this trade. Either way, something is being left behind for sure.

Then we have a piece from Stephane Moreau on the blog for engineering managers, called Stop Interviewing Engineers Like It's 2022. The starting observation here is striking. In the second half of 2025, candidates using unauthorized AI in coding interviews jumped from fifteen percent to thirty-five percent. In technical roles specifically it hit forty-eight percent. And sixty-one percent of the people who cheated passed to the next stage. That's six in ten cheaters moving forward. The author frames this as the number that should make you stop scrolling. And then makes a counterintuitive argument: if your instinct is to demand better proctoring, that's the wrong instinct. You'd be racing against invisible screen overlays that a candidate's grandmother could install in five minutes. The companies that have actually fixed this didn't fix detection. They redesigned their interview completely.

The piece walks through where serious companies landed. Google is piloting an AI assistant inside the coding round itself, a three-panel layout with a file explorer, code editor, and AI chat. The AI can't edit files. The round focuses on reading, debugging, and optimizing existing code, not writing from scratch. The signal from Google is that the future is review and verify, not produce. Canva killed its CS fundamentals round and replaced it with an AI-assisted coding round where candidates are expected to use Copilot, Cursor, or Claude, with new questions that are complex and ambiguous, the opposite of one-shot prompts. Shopify lets candidates bring their own AI tools and the interviews require integrating AI-generated snippets into unfamiliar codebases. Reading and grafting code matters more than writing it. Meta runs an AI-enabled coding round with a rubric that includes problem solving, code quality, verification, and communication. Communication is back on the scorecard.

The keep, kill, and add framework is the practical heart of the piece. At the first technical round, kill the forty-five-minute algorithm puzzle and add a forty-five-minute read-and-fix round on a real, messy file from your codebase. The take-home stays gone, all of it. At the onsite, keep one pair-programming session and add a live AI-assisted round where you score candidates on direction, not output. At the system design round, keep the whiteboard but kill the "design Twitter from scratch" prompt, and add a forty-five-minute review of an existing architecture with a prompt to identify what's wrong. The five signals that survive are problem decomposition, control over the AI, verification habits, architectural judgment, and communication. Three of those, verification, architectural judgment, and communication, are exactly the signals that AI tools dilute when you score on output instead.

On the junior pipeline question, the LeadDev AI Impact Report found that eighteen percent of engineering leaders expect to hire fewer juniors in the next twelve months and fifty-four percent think AI will reduce junior hiring long-term. The author argues that if you redesign your loop only around judgment and verification, you've designed juniors out of it, and suggests scoring learning velocity explicitly with a round where you teach the candidate something new on the call and watch how they use it. That is the round AI can't help with.

From Google Cloud, a piece called A Guide to AI Cold Starts on Cloud Run. The framing is that minimizing cold start latency isn't just about the model, it's about the infrastructure patterns and architectural decisions that keep inference fast, scalable, and secure. An AI cold start is described as a four-phase race. First, infrastructure provisioning, where Cloud Run allocates the physical GPU and injects pre-installed NVIDIA drivers. Second, block-level container image streaming, which lets a fifteen gigabyte CUDA image start as fast as a tiny Node app. Third, engine initialization, where your inference engine like vLLM or Ollama warms up. This is a massive CPU-heavy task and where most people get throttled without realizing it. And fourth, model loading and VRAM transfer, moving the model weights from storage into GPU memory. This is where GPU memory is the constraint, not CPU. If your model's weights don't fit entirely within GPU memory, performance degrades significantly as the system swaps to slower system RAM.
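For the "do the weights fit in GPU memory" question in that fourth phase, a rough estimate is parameter count times bits per parameter. The sketch below assumes a 24 gigabyte GPU and a twenty percent headroom for things like the KV cache and activations. Both numbers and both function names are illustrative assumptions, not values from the Google Cloud piece.

```python
def weight_size_gb(num_parameters: float, bits_per_parameter: int) -> float:
    """Approximate size of the raw model weights in gigabytes (10^9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9


def fits_in_vram(num_parameters: float, bits_per_parameter: int,
                 vram_gb: float, headroom_fraction: float = 0.2) -> bool:
    """True if the weights fit while leaving an assumed share of VRAM free."""
    usable_gb = vram_gb * (1.0 - headroom_fraction)
    return weight_size_gb(num_parameters, bits_per_parameter) <= usable_gb


if __name__ == "__main__":
    # Hypothetical 8-billion-parameter model on an assumed 24 GB GPU.
    for bits in (16, 8, 4):
        size = weight_size_gb(8e9, bits)
        print(bits, "bit:", size, "GB, fits:", fits_in_vram(8e9, bits, 24))
```

The same arithmetic shows why lower precision helps twice: fewer gigabytes to move from storage, and more room left in GPU memory once they arrive.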

The piece walks through best practices for each phase. For Phase Four, concurrent download from Cloud Storage is the fastest approach for large model weights, while baking weights into the container image is efficient for smaller models under ten gigabytes. Model format and size matter directly here. Four-bit quantization is described as the ultimate cold start hack. Smaller weights mean fewer gigabytes to pull from storage. Picking a fast format like GGUF or Safetensors for zero-copy loading also shortens startup time. For Phases Three and Four, the startup CPU boost feature temporarily doubles your CPU power during initialization, which is essential for the CPU-heavy engine warmup. Direct VPC egress with Private Google Access keeps model weight traffic on Google's internal high-speed backbone during the transfer. Concurrency tuning is covered too, with the formula for calculating your ideal Cloud Run concurrent request setting based on the number of model instances, parallel queries per model, and ideal batch size. The goal is to keep the GPU fully saturated and avoid triggering a new scale-out event and the cold start that comes with it.
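The episode doesn't spell out the concurrency formula, so treat this as a sketch built on an assumption: the form below is how Cloud Run's GPU best-practices guidance expresses it, as best recalled, and it's worth checking against the current documentation before relying on it. The function name and the sample numbers are hypothetical.

```python
def cloud_run_concurrency(model_instances: int,
                          parallel_queries_per_model: int,
                          ideal_batch_size: int) -> int:
    """Suggested max concurrent requests per Cloud Run instance.

    Assumed formula (verify against current Cloud Run docs):
    (model instances * parallel queries per model)
      + (model instances * ideal batch size)
    """
    if min(model_instances, parallel_queries_per_model, ideal_batch_size) < 0:
        raise ValueError("inputs must be non-negative")
    in_flight = model_instances * parallel_queries_per_model
    queued_for_next_batch = model_instances * ideal_batch_size
    return in_flight + queued_for_next_batch


if __name__ == "__main__":
    # Hypothetical setup: 1 model instance, 4 parallel queries, batch size 4.
    print(cloud_run_concurrency(1, 4, 4))
```

The idea behind adding the batch term is that a full batch can already be waiting when the current one finishes, which keeps the GPU busy without pushing Cloud Run to scale out.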

From DX, the AI Productivity Debate. This is a panel from DX Annual with senior engineering and research leaders including Rafe Colburn from Etsy, Jesse Adametz from Twilio, Eirini Kalliamvakou from GitHub, Collin Green from Google, and Brian Houck, a coauthor of the SPACE framework. The panel was presented with a series of statements and asked to react.

On whether AI means fewer engineers, the consensus was broadly no. The demand for software is going up, and the price of building software is going down. Rafe put it simply: a job is a bundle of tasks, and AI is a clean substitute for some of them, but people might not be doing those things anymore. Brian noted that what we define as a software engineer may change, because the bundle of tasks is shifting.

On whether AI is accelerating technical debt faster than it helps refactor it, the panel disagreed. Jesse made the case that AI is an amplifier. High-performing engineers weren't putting garbage into the system before and they're still not. It's garbage in, garbage out. Rafe said that at Etsy, proportionately, they're not creating more tech debt than five or ten years ago. Brian pushed back, saying organizations are optimizing for PR velocity, not cleanliness, and that cognitive debt, meaning understanding systems less as AI writes more of the code, is a form of technical debt that is growing. Collin made the point that technical debt is ultimately a business decision, not strictly an engineering one.

On whether code review is now the bottleneck, Collin noted that developers only spend about fourteen percent of their time writing code, so writing code was never really the bottleneck. Jesse said AI is an amplifier: whatever your organization's bottleneck used to be, it still is. For Twilio, time for deep work has been a long-standing issue and still is. Brian noted that at Microsoft, some features take two or more years from planning to customer delivery. Going from two days to three days on review isn't the long pole. Planning and prioritization are. Rafe pointed out something interesting. When writing a PR took two days, a review delay was annoying but tolerable. When writing takes ten minutes, the human process of finding a reviewer and waiting feels unbearable. The interruptions of human processes, unless they provide clear value, are grating.

On whether leaders need to mandate AI adoption, the panel largely said no. Jesse said Twilio did light enablement, install defaults and basic guidance, and adoption took off on its own. Rafe said mandates lead to shallow adoption and tokenmaxxing. AI adoption is inevitable, so why mandate something inevitable? Brian noted that the majority of engineering managers think AI usage is a reasonable individual performance metric, but engineers disagree. That's a myth to dispel. Activity metrics are useful for understanding patterns but should not be used as direct performance measures.

The throughline across these discussions, the DX piece notes, is that AI is reshaping the task mix of software work but not eliminating the need for engineers. The biggest risks and opportunities sit in how leaders design roles, measure impact, and consider the full product development lifecycle, not in whether AI can generate more code.

From Google's SRE team, a piece on AI in SRE: Where and how Google is deploying agentic AI to improve operations. The scope here goes well beyond root cause analysis. Google SRE is working on the entire SDLC. In reliability design, agents continuously monitor and improve playbooks and production documentation based on how they were used during incidents, and can generate new playbooks from incident data. In anomaly detection and alerting, Google is augmenting static threshold approaches with agents that detect anomalies in regular behavior rather than relying on predefined rules, using a model called TimesFM to predict customer-oriented SLOs. In incident management, there is an agentic orchestration layer that monitors communication surfaces during incidents, consolidates and summarizes data, supports handoffs between SREs with context documents, automatically drafts postmortems, and manages internal and external communications. In investigation and mitigation, agents use observability data, system topology, and dependency data to establish context and intent before forming hypotheses and proposing mitigations.

Google SRE also built something called AI Insights, a system that continuously reviews known incidents and extracts meaningful information from them to make that knowledge available to agents during future investigations. Gemini embedding models and vector databases power this. Each incident is also marked with risk categories that agents can consult before applying mitigations and that SREs can use to identify critical areas to address.
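To make the embeddings-plus-vector-database idea concrete, here is a toy sketch of similarity lookup over past incidents. It is not Google's AI Insights implementation. The incident IDs and three-dimensional vectors are made up, and a real system would get its vectors from an embedding model and store them in a vector database rather than a dictionary.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_incidents(query: Sequence[float],
                           incidents: Dict[str, List[float]],
                           top_k: int = 2) -> List[Tuple[str, float]]:
    """Rank stored incident embeddings by similarity to the query embedding."""
    scored = [(incident_id, cosine_similarity(query, vector))
              for incident_id, vector in incidents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical, hand-written embeddings for three past incidents.
    past_incidents = {
        "INC-1": [1.0, 0.0, 0.0],
        "INC-2": [0.0, 1.0, 0.0],
        "INC-3": [0.9, 0.1, 0.0],
    }
    print(most_similar_incidents([1.0, 0.0, 0.0], past_incidents))
```

An agent investigating a new incident would embed its current context, run a lookup like this, and pull the closest past incidents, along with their risk categories, into its working context before proposing a mitigation.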

The design principles the team established are worth noting. Any process that can be automated with classic non-AI systems doesn't need to be replaced. AI agents must have strong identity with assigned roles and permissions. They must be able to explain and reason about why they performed an action and what options were considered and rejected. Transparency over black-box automation. And business continuity plans must include contingencies for potential AI failures.

From Stack Overflow's blog, Coding Agents Are Giving Everyone Decision Fatigue. The opening frames the shift well. In the past three years, code generators have gone from fancy autocomplete to tools that can whip up a whole application while you wait. What's in doubt is whether this change has been productive, cost efficient, or good for developers. Easy-to-create code has put greater strain on the later parts of the software development lifecycle, code review, DevOps, SRE, security, and infrastructure. It has also put greater strain on the developers themselves.

The piece cites research from Smartsheet showing that automation intensity for their enterprise users has grown fifty-five percent year-over-year, and overall activity has increased forty-six percent. The workday hasn't grown. It has just gotten denser with work as automations produce more without alleviating the need for humans to decide on what the definition of good is. That is the line that connects this piece back to almost everything else we covered this week. Agents can produce a lot. They cannot produce good. That judgment, that definition of good, is still entirely human, and it is now the load-bearing constraint in the system.

So, a week where the throughline is pretty clear. AI tools can generate, and they generate fast, but every piece we looked at circles back to the same bottleneck: the human. The human who reviews, judges, steers, and decides what good even means. Addy Osmani called it the orchestration tax. The DX panel called it the long pole. The Stack Overflow piece called it decision fatigue. Whatever name you use, it's the same insight. The constraint has moved. It's no longer how fast you can write code. It's how fast you can make good decisions about code you didn't write. That is the skill now. That is the job.

That's it for episode 793. Check the show notes for links to all the articles. I'll see you next time.

Sources

The Orchestration Tax is You — Addy Osmani
Our Favorite Management Tips on Giving Feedback — Harvard Business Review
AI Won't Fix Your Broken Pipeline – It Will Break It Faster — Trisha Gee
Introducing Google AI Threat Defense — Google Cloud
What do software developers do now? — InfoWorld
Stop Interviewing Engineers Like It's 2022 — Stephane Moreau
A Guide to AI Cold Starts on Cloud Run — Google Cloud
AI Productivity Debate — DX Annual
AI in SRE: Where and how Google is deploying agentic AI to improve operations — Google Cloud
Coding Agents Are Giving Everyone Decision Fatigue — Stack Overflow