Seroter's Daily Reading — #761 (April 10, 2026)
Listen: https://blossom.nostr.xyz/073902abeeb7f37853d3b46ea434c3e6bd9620bc42f3eafaa43c90467816e096.mpga
Source: Seroter's Original Post
Seroter's Daily Reading number 761, April 10th, 2026. Happy Friday. I hope you had a productive week and get the chance to enjoy yourself over the weekend.
Let's kick off with a piece that pushes back on a stat that has been making the rounds for the past year. You have probably heard the claim that ninety five percent of AI pilots fail to convert to production. That MIT study has been cited everywhere. AI Adoption by the Numbers, a piece from a16z, took a hard look at the data, and the authors find that number hard to believe. Based on their internal data and conversations with corporate executives, they conclude that twenty nine percent of the Fortune 500 and about nineteen percent of the Global 2000 are already live, paying customers of leading AI startups. That is not a pilot. That is production deployment. And it happened within just over three years of ChatGPT's launch. They dug into what is actually working. Coding is the dominant use case by a huge margin, an order of magnitude ahead of everything else. Support and search round out the big three. On the industry side, tech leads, which is expected. But legal and healthcare are surprisingly far ahead of where they historically have been as enterprise software adopters. The legal sector in particular has moved fast. Harvey hit around two hundred million in annualized recurring revenue within three years of founding. The piece makes a key observation about why these particular use cases and industries are succeeding. Code, support tickets, and search queries share common traits. They are text based, they involve repetitive and rote work, there is a natural human in the loop for judgment, and critically, the outputs are verifiable. You can run code and know immediately if it works. A support ticket either resolves or it escalates. That verifiability is what makes ROI easy to measure and AI easy to trust. Model capabilities are improving fast, and a16z expects the adoption pattern to expand into other industries as the models get better at messier, less verifiable kinds of work.
Next up, the Model Context Protocol. The AAIF MCP Dev Summit 2026 took place in early April at the New York Marriott Marquis and drew about twelve hundred attendees. It has become the flagship event for the MCP ecosystem, organized under the Linux Foundation's Agentic AI Foundation. The big takeaway from the conference is that the gateway pattern has emerged as the dominant architectural consensus. Amazon, Uber, Docker, Kong, and Solo.io all converged on the same conclusion. If you are deploying MCP at scale, you need a centralized gateway paired with a registry as the control plane. Uber described their setup in detail. They built an MCP Gateway and Registry that automatically exposes thousands of internal Thrift, Protobuf, and HTTP endpoints to agents. All agentic traffic flows through their GenAI Gateway, which is a Go based proxy that scrubs PII and internal identifiers before requests reach external models. Tens of thousands of agent executions run through the platform each week. On the technical side, context bloat, which is the problem of MCP tool definitions consuming too much of the model's context window, is being solved on the client side rather than treated as a protocol deficiency. Claude Code now uses progressive tool discovery and automatically defers tools when their descriptions would consume more than ten percent of the context window. Anthropic has published benchmarks showing roughly eighty five percent reductions in token usage. The protocol continues to evolve. There is work on a tasks primitive for long running agentic communication, and a working group is actively developing triggers, essentially webhooks that would let servers proactively notify clients of new data.
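To make the ten percent rule concrete, here is a minimal sketch of how a client might decide whether to defer tool descriptions. This is not Claude Code's actual implementation. The names ToolDef, estimate_tokens, and plan_tool_loading are hypothetical, the four characters per token estimate is a rough assumption, and the all or nothing deferral is a simplification chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolDef:
    """A hypothetical tool definition as a client might hold it."""
    name: str
    description: str


def estimate_tokens(text: str) -> int:
    """Crude size estimate, assuming roughly four characters per token."""
    return max(1, len(text) // 4)


def plan_tool_loading(tools, context_window, budget_fraction=0.10):
    """Return (loaded, deferred) lists of tools.

    If all descriptions together fit within the budget (a fraction of the
    context window), every tool is loaded up front. Otherwise every tool is
    deferred, and the client would expose only names until one is requested.
    """
    budget = int(context_window * budget_fraction)
    total = sum(estimate_tokens(t.description) for t in tools)
    if total <= budget:
        return list(tools), []
    return [], list(tools)


def deferred_index(deferred):
    """What stays in context for deferred tools: names only."""
    return [t.name for t in deferred]
```

With two tools of about one hundred tokens each, a ten thousand token window loads both up front, while a one thousand token window defers both and keeps only their names in context. A real client would likely be more selective about which tools to defer.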
From Google Cloud, a piece on Mastering Gemini CLI Subagents. This one is for the developers who want to move beyond the simple chat paradigm. The core idea is context management. In a standard LLM session, every file read, every command run, and every clarification gets appended to a linear history. Ten minutes later, the meeting summary you asked for is cluttered with code snippets and logs from a debugging task you ran earlier. That is context rot. It leads to high latency as the model rereads unrelated data, attention drift as the model's focus gets spread thin across unrelated topics, and token waste as you pay for the same debugging logs over and over. Gemini CLI introduces a hub and spoke architecture. Your main session is the hub or manager and your subagents are the spokes or specialists. When the manager calls a subagent, that subagent starts with a clean context window. It only receives the specific task and relevant instructions. Once it finishes, it returns a concise summary to the manager and the intermediate tool calls are purged from the history. Subagents are defined in a Markdown file with YAML frontmatter. You specify the name, description, and the specific tools the subagent is allowed to use. The tooling constraint is important because it lets you enforce boundaries. A subagent that should only read files and analyze a project structure can be locked down to just those tools so it cannot accidentally modify anything. The piece walks through building a README architect subagent as a concrete example. This is a useful pattern for anyone building complex AI workflows who wants to keep the manager context clean and lean.
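The hub and spoke idea can be sketched in a few lines. This is an illustration of the pattern only, not Gemini CLI's code or configuration format. SubagentSpec, Manager, the read_file and write_file tool names, and the scripted list of steps standing in for a model's decisions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubagentSpec:
    """Mirrors the three things the piece says you specify."""
    name: str
    description: str
    allowed_tools: frozenset


class Manager:
    """The hub: keeps a lean history made of subagent summaries only."""

    def __init__(self, tools):
        self.tools = tools      # mapping of tool name to callable
        self.history = []       # the manager's own context

    def delegate(self, spec, task, steps):
        # The spoke starts with a clean scratch context holding only the task.
        scratch = [f"task: {task}"]
        for tool_name, arg in steps:
            # Enforce the tool boundary before anything runs.
            if tool_name not in spec.allowed_tools:
                raise PermissionError(f"{spec.name} may not call {tool_name}")
            result = self.tools[tool_name](arg)
            scratch.append(f"{tool_name}({arg!r}) -> {result}")
        # Only a concise summary reaches the hub; scratch is discarded.
        summary = f"{spec.name} finished '{task}' using {len(steps)} tool call(s)"
        self.history.append(summary)
        return summary
```

However many files the subagent reads, the manager's history grows by one line per delegation, and a read only subagent that attempts a write is stopped before the tool runs.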
We are drowning out important signals of discomfort. That is the provocative opening of The Hidden Cost of Comfort, a piece on interoception, our ability to read internal signals. The author uses a fascinating example. In the nineteen fifties, over ninety percent of toddlers were potty trained by eighteen months. Today that number is about four percent. The average age of potty training has drifted to nearly thirty seven months. What changed? Disposable diapers. In cloth diapers, the child feels wetness. That sensation creates a feedback loop between the bladder and the brain. The child learns to connect that internal sensation with the need to respond. Disposable diapers eliminate that feedback. The diaper muffles the signal and the child never learns to read it. The same mechanism plays out across modern life in larger ways. We no longer experience boredom. Our phones are always there to fill the gap. But boredom is a signal. It is a prompt to go search for something better to do. It is at the heart of creativity. Playgrounds have been sanitized of risk. But risk calibration is learned through play. When children never feel afraid on a high platform, feel their stomach tighten, and decide whether to climb anyway, they never build that capacity. We outsource navigation to GPS. But research at University College London found that London taxi drivers, who memorized twenty five thousand streets, had measurably larger hippocampi than the general population. We are not building the mental map because the device is doing it for us. The author's point is that every one of these trade-offs follows the same pattern. Remove the signal, lose the capacity it was building. The consequences show up as anxiety, poor performance under pressure, and an inability to read one's own body. The prescription is to deliberately bring some signals back. Run without music sometimes. Stand in line without reaching for the phone. Get a little lost. Have a digital sabbath. 
These are not nostalgic complaints. They are interoception training.
A piece from Battery Ventures makes the argument that Agent Skills Are the New SDK and that developer infrastructure companies should be building them now. For two decades, the distribution playbook for developer tools was some version of the same thing. Get the SDK installed, land the product in one team, and expand from there. The bottleneck was developer bandwidth: convincing a developer to add a dependency, configure credentials, and instrument the first API call. AI coding agents are removing that bottleneck. Four percent of all public GitHub commits are now authored by AI and that number is climbing fast. When a developer's default workflow is "describe what I want and review the code," the AI agent becomes the intermediary. And that intermediary is programmable. Agent skills are small, installable context packages that teach an AI coding agent how your tool works, what patterns to follow, and what mistakes to avoid. One command, and every interaction the agent has with a codebase carries deep, opinionated knowledge of your SDK. The value is that many infrastructure products scale with coverage, and coverage has historically been gated by developer memory and discipline. A well designed observability skill changes the default. The agent knows that whenever a new HTTP handler is written, it should instrument it. The developer does not have to remember. The agent does. The proof is already emerging. Neon, the serverless Postgres company, published AI rules, Claude Code plugins, Cursor integrations, and a full agent skills library on GitHub. The result: over eighty percent of the databases provisioned on Neon were created by AI agents, not humans. That stat was so striking it became a centerpiece of Databricks's rationale for acquiring Neon for one billion dollars. If skills become the dominant adoption surface, winning shifts to whoever gets embedded in the agent context most deeply and most correctly.
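As a rough illustration of the observability example, here is the kind of code an agent carrying such a skill might write by default for every new handler. Nothing here comes from a real vendor SDK. The instrumented decorator, the METRICS store, and the health handler are hypothetical stand-ins built on the standard library.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory metrics store, standing in for a vendor backend.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def instrumented(route):
    """Wrap a handler so every call records count, errors, and duration."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return handler(*args, **kwargs)
            except Exception:
                METRICS[route]["errors"] += 1
                raise  # instrumentation must not swallow failures
            finally:
                METRICS[route]["calls"] += 1
                METRICS[route]["total_seconds"] += time.perf_counter() - start
        return wrapper
    return decorator


@instrumented("/health")
def health():
    """A new handler, instrumented at the moment it is written."""
    return {"status": "ok"}
```

The coverage argument is that the decorator line is added every time without anyone needing to remember it.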
Finally, a case study on how Estee Lauder Companies uses Cloud Run worker pools for its pull based agentic workloads. Estee Lauder built a polymorphic chat service called Rostrum for LLM powered applications. It originally ran as a standalone Cloud Run service, which worked fine for internal tools with predictable traffic. But they were launching Jo Malone London's AI Scent Advisor, their first consumer facing generative AI application, and they needed to handle traffic from thousands of simultaneous users during the holiday shopping season. They migrated to a producer consumer model using Cloud Run worker pools. The web tier, a FastAPI application deployed as a Cloud Run service, acts as the producer and instantly publishes user messages to Cloud Pub/Sub. The worker pools act as always on consumers and pull messages from the queue to handle LLM inference. By decoupling the user facing web tier from LLM operations, they achieved one hundred percent message durability, strong UI latency SLAs with server side rendering decoupled from message processing load, and minimal operations overhead. This modular architecture now serves as their blueprint for rapidly launching specialized AI advisors across their entire portfolio of brands.
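The producer consumer split described above can be sketched with the standard library. This is an in-process stand-in, not Estee Lauder's code. A queue.Queue plays the role of Cloud Pub/Sub, threads play the role of worker pool instances, and the infer callable is a hypothetical placeholder for the LLM call. The names publish, worker, and run_pool are invented for illustration.

```python
import queue
import threading


def publish(q, message):
    """Producer (web tier): enqueue and return at once, never wait on inference."""
    q.put(message)
    return "accepted"


def worker(q, results, infer):
    """Consumer (worker pool instance): pull messages until told to stop."""
    while True:
        message = q.get()
        try:
            if message is None:      # sentinel: shut this worker down
                return
            results.append(infer(message))
        finally:
            q.task_done()            # rough analogue of acknowledging a message


def run_pool(messages, infer, num_workers=3):
    """Start workers, publish all messages, wait for the queue to drain."""
    q = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=worker, args=(q, results, infer))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for m in messages:
        publish(q, m)
    q.join()                         # every published message was handled
    for _ in threads:
        q.put(None)                  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

The design point is that publish returns as soon as the message is queued, so slow inference never blocks the user facing tier. This sketch has no retry or error handling. A real deployment would rely on the message service's acknowledgement and redelivery behavior for durability.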
That is episode 761. I will see you on Monday.