Seroter's Daily Reading — #762 (April 13, 2026) — Seroter's Daily Reading

Listen: https://blossom.nostr.xyz/d1a6115e6485a4be138a4090470fc653f902ed3bb7c3744a996515a9006e5d6a.mpga

Seroter's Daily Reading, Episode 762. April 13, 2026.

I've been building this weekend. First, I added an ADK agent to a web solution I like to demonstrate to customers, which gave me a chance to deploy to Agent Engine again. Then I built up a solution with Pub/Sub and blogged about it today. Check it out if you're curious about calling LLMs from your messaging engine.

Let's dig into this week's reads.

First up: Where do all the tokens go in agentic software engineering? This is from Research-Driven Engineering Leadership, and the answer might surprise you. Researchers at Concordia University analyzed token consumption across 30 software development tasks using ChatDev with GPT-5 Reasoning. They mapped each phase to standard development stages and tracked input, output, and reasoning tokens. The big finding: Code Review dominates at 59.4% of all token consumption. Not coding, not design, not testing. Review. The iterative back-and-forth between programmer and reviewer agents eats up the vast majority of the budget. Design was just 2.4% and coding was 8.6%. The research also found that input tokens make up nearly 54% of total usage, meaning agents spend more time re-consuming context than generating new output. That's a significant communication tax. The application here for engineering leaders: budget for review, not generation. The real cost isn't writing code, it's the refinement loop. Look for architectures that reduce context passing, and consider human checkpoints before expensive review phases to catch obvious problems early.

Next: What are AI gateways in 2026, and do you actually need one now? This is from ngrok's blog, and their advice has shifted significantly. Six months ago, an AI gateway was useful but optional. Now they're calling it essential for anyone running agentic AI. The reasoning is solid. A single user request can trigger dozens of LLM calls, tool invocations, and multi-step reasoning chains. Without a gateway, you have no visibility into what your agents are doing, what they're costing, or whether they're behaving correctly. The piece draws an analogy: if ngrok is the gateway to your web traffic, an AI gateway is the gateway to your agent traffic. They break down the modern AI infrastructure stack with four layers: foundation models at the bottom, infrastructure and gateway in layer two, model ops and observability in layer three, and application and orchestration on top. The threshold for needing a gateway has dropped. If you're running any agentic AI in production, you need visibility and control. The only teams that can skip it are those making straightforward single-model API calls with no agent behavior.

Speaking of gateways: here's an example of applying protections to an open model on Kubernetes with Guardrails at the gateway: Securing AI inference on GKE with Model Armor. This is from Google Cloud, and it covers securing AI inference on GKE with Model Armor. The challenge they identify is the black box safety problem. Most LLMs have internal safety training, but relying solely on it presents three major risks: opacity because refusal logic is baked into model weights, inflexibility because you can't tailor criteria to your needs, and monitoring difficulty because a model's refusal returns a 200 OK with text saying it can't help, which looks like a successful transaction to security systems. Model Armor addresses this by acting as an intelligent gatekeeper that inspects traffic before it reaches the model and after it responds. It blocks prompt injection and jailbreak attempts before they waste TPU and GPU cycles, filters responses for hate speech and dangerous content, and integrates with DLP to scan outputs for PII before leakage.

Here's a practical piece: 8 Tips for Writing Agent Skills. This is from Philipp Schmid, and it's solid. A skill is a folder with a SKILL.md file and optional helper files. Skills fall into two categories: capability skills that help an agent do something the base model can't do consistently, and preference skills that encode your specific workflow. The most important tip is nailing the description. If it's vague, the agent won't know when to activate it. If it's too broad, it fires on every request. The body of the skill only loads after it triggers, so you need to be specific about both the what and the when. Another key insight: write instructions, not essays. Research shows longer, more comprehensive context actually hurts performance. Use directives like always use this approach rather than this approach is recommended. Lead with examples. And if exact steps matter, write a script instead of trying to encode them in a skill. Don't turn skills into step-by-step workflows that take away the agent's ability to adapt, recover from errors, or find better approaches.

Google's Scion gives developers a smarter way to run AI agents in parallel with Google's Scion Gives Developers a Smarter Way to Run AI Agents in Parallel. This is from DevOps.com, and it covers a system solving unique problems for agent orchestration. The piece explains how Scion handles parallel execution of agents, though the full technical details were somewhat limited in the research. The concept is about giving developers better control over how agents run concurrently.

Here's a big number: AI infrastructure budgets set to triple as demand soars: Deloitte. This is from CIO Dive, citing a Deloitte report. They surveyed 515 U.S. enterprise companies in December. Almost half of respondents said they have more than 30 AI pilots in the works. By 2028, Deloitte projects nearly 70% of companies will be running that many AI proofs of concept. The investment into these projects is significant, with some enterprises projecting they'll spend almost four times more on AI infrastructure by 2028. It's a shift from traditional IT spending, which was planned for one-off modernization efforts. IT departments are shifting to more sustained year-over-year spending. One executive from Deloitte noted that token volumes are doubling and tripling, and companies are creating workloads that public cloud alone can't serve cost effectively at this point. The line is blurring between business and tech, and given the size of these budgets, technology decisions have to be made together with the business.

Here's something interesting: Cursor, Claude Code, and Codex are merging into one AI coding stack nobody planned. This is from The New Stack, and it's a great analysis. In the first week of April 2026, Cursor shipped a rebuilt interface for orchestrating parallel agents, OpenAI published an official plugin that runs inside Claude Code, and early adopters started running all three together. Not as competitors. As layers in a stack. The pattern mirrors infrastructure tools. Nobody runs a single observability tool. You run Prometheus for metrics, Grafana for dashboards, and PagerDuty for alerts. Each does one thing well, and value comes from composition. The article identifies three layers forming: orchestration with Cursor 3 as the control plane for managing fleets of coding agents, execution with Claude Code and Codex as the agents that actually write and debug code, and review as the newest layer where cross-provider review addresses what single-model workflows cannot. The key insight is that when you ask the same model that wrote your code to review it, you're asking someone to grade their own homework. A second model from a different provider applies genuinely independent scrutiny. OpenAI building a plugin for Anthropic's product is the most revealing signal. Rather than waiting for developers to switch, they embedded Codex where they already work. Anthropic gets a richer plugin ecosystem. OpenAI gets distribution inside a competitor's installed base.

Are AI certifications worth the investment with Are AI certifications worth the investment?? This is from InfoWorld, and it covers three main options. IBM's AI Developer Professional Certificate on Coursera is about $49 per month and covers machine learning, prompt engineering, neural networks, and deploying LLMs. They document cases of professionals moving from $52,000 salaries to $78,000 AI engineering positions after completion. The PMI AI+ Certification launched in 2025 and targets project managers leading AI deployments. A distinctive feature is that preparation earns 21 PDUs toward PMP renewal requirements. And NVIDIA's Deep Learning Institute certifications address advanced technical roles with costs ranging from $2,500 to $4,700 per course.

Go for AI agents: a field report with Go for AI agents: a field report. This is a great piece from someone who's been building agent infrastructure in Go. They point out that the agentic AI ecosystem runs on Python. LangChain, CrewAI, AutoGen, Semantic Kernel, LlamaIndex, all Python. But they went the other way and built five agent-related projects in Go: a governance proxy, a memory server, an MCP bridge for Ollama, an autonomous research agent, and a management dashboard backend. Their pattern shows that agent-mesh has just one external dependency for 11,000 lines of code covering MCP client and server, policy engine, rate limiter, approval workflow, trace store, and more. They make a compelling case: agents have layers, and the reasoning layer and orchestration layer are Python's domain. But the infrastructure layer that moves data, enforces policy, and manages processes is systems programming, and Go is a systems language. They also note some real gaps in the Go ecosystem: no official MCP SDK, no structured output libraries like Instructor or Outlines, and no agent framework. But they argue the first solid Go MCP SDK will get massive adoption. They recommend a simple split: if it touches model weights or needs rapid ML prototyping, use Python. If it's infrastructure, Go. Worth noting: the author suggests checking out Genkit and ADK for Go devs building AI apps and agents.

Finally: as an engineering manager, I couldn't ignore AI if my teams are to survive with As an Engineering Manager, I couldn't ignore AI if my teams are to survive. This is from Shift Mag, written by someone working through the familiar phases of change with AI adoption. They describe the five stages: denial, anger, bargaining, depression, and acceptance. In denial, they thought they were a top performer who didn't need to perform better. In anger, every topic became an AI topic and it was irritating. The message from this piece is that it's okay to feel negative, but staying in that state too long can undermine results. The author's key realization was that even if you keep delivering at your current pace, without adopting AI tools, you won't be able to keep up in a few months. Their conclusion: as a leader, you need to push yourself through the change and proactively lead your team's transformation. Stay transparent, share the doubts you've faced, and show the human side.

That's episode 762. We've covered token consumption patterns in multi-agent systems, the evolution of AI gateways, security guardrails on Kubernetes, practical advice for writing agent skills, parallel agent orchestration, surging infrastructure budgets, the emerging AI coding stack, AI certifications, building agents in Go, and leading teams through AI adoption. The themes that connect across these reads: the cost of agentic AI isn't where you think it is, infrastructure and control planes are becoming critical as agents proliferate, and the tools are composable rather than consolidated. That's what's moving in the space right now.

Sources