Seroter's Daily Reading #776 (May 4, 2026)

Listen: https://blossom.nostr.xyz/8f8ee4eb52ceda29d8cc66e749a321bbf34356b43afe2674854a6a0e8230bc73.mpga

Seroter's Daily Reading, Episode 776. May 4, 2026.

Let's ease into this one. Richard Seroter kicks off with some dad duty — kid has a cold, so no Mountain View trip this week. Good reminder that the rest of us also have lives outside this newsletter. On to the links.

First up, a piece from The New Stack on Cursor's strategy. The headline is blunt: "Cursor's $60 billion bet is on the harness, not the model." This is the year of the harness, the orchestration and judgment layer where the real differentiation lives. We've been talking about this for a while, but seeing Cursor double down on it at that valuation is a strong signal. The model is commoditizing faster than anyone expected, and the tooling around it is where the money flows. Worth reading if you want to understand where the AI development tooling market is actually heading.

Then we have a piece that's near and dear to my heart: thirteen CTOs walked into a dinner, and what they figured out is that there is no universal playbook for AI adoption. That's from Shift Magazine, with a piece titled "13 CTOs walk into a bar and realize: There is no best AI adoption strategy." The conversation happened in London, hosted by Infobip, and the takeaway is refreshingly honest. Some people love working with AI, finally getting to build things themselves after years of just managing. Others are facing what the article calls AI shaming: the skepticism, the eye rolls when someone reaches for an AI tool. The dinner attendees talked about the gap between how people feel and what the DORA metrics actually show. Over at Infobip, eighty percent of the company uses AI tools daily, but when you look at actual delivery outcomes, the correlation is murky. The real insight here is that the conversation has shifted from "how much AI are you using" to "how well are you using it." Quantity was the question in 2024. Quality is the question now.

From Google Cloud, a piece on running multiple coding agents safely using git worktrees. The piece is "Run multiple coding agents safely with git worktrees." This is a technical one but the timing is right. When one person is coordinating several agents working on the same codebase, you need isolation. Worktrees give you that. Instead of juggling branches or praying that agents don't overwrite each other, each one gets its own working tree: a separate checkout directory on its own branch that still shares the same underlying repository. Clean, simple, and it maps well to how agentic workflows actually need to operate. Worth bookmarking if you're running multi-agent setups.
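As a rough sketch of that isolation, the snippet below builds one `git worktree add` command per agent, each with its own branch and directory. The agent names, the `agent/` branch prefix, and the `../worktrees` directory are illustrative assumptions, not details from the article. Only the `git worktree add -b <branch> <path> <commit-ish>` form comes from Git itself.

```python
from pathlib import PurePosixPath
from typing import List


def worktree_commands(agents: List[str], base: str = "main",
                      root: str = "../worktrees") -> List[List[str]]:
    """Build one `git worktree add` command per agent.

    Each agent gets a new branch (agent/<name>) checked out in its own
    directory, so no two agents write to the same working tree.
    The branch prefix and root directory are hypothetical conventions.
    """
    commands = []
    for name in agents:
        path = str(PurePosixPath(root) / name)
        commands.append(
            ["git", "worktree", "add", "-b", f"agent/{name}", path, base]
        )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them. Inside a repository,
    # each list could be passed to subprocess.run(cmd, check=True).
    for cmd in worktree_commands(["refactor", "tests"]):
        print(" ".join(cmd))
```

When an agent finishes, `git worktree remove <path>` cleans up its directory, and the branch stays available for review and merge.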

Then DevOps.com with a provocative headline: documentation is dead, long live documentation. The piece is "Documentation is Dead. Long Live Documentation." The full article didn't come through, but the tagline tells us enough. This isn't about eliminating documentation. It's about the idea that documentation should be a side effect of the work, not a separate activity you schedule at the end of a sprint. The shift from "we must document" to "our code and tests document themselves" is a healthy one, and it's what mature engineering cultures gravitate toward anyway.

Moving to Google Cloud's database blog, Firestore is getting some serious upgrades at Next '26. The piece is "Firestore at Next '26: Unlock agentic development, search and MongoDB compatibility." The pitch here is that Firestore is underrated and only getting better. What changed? Tighter integration with AI Studio, full-text search so agents can actually find things in your data, and better MongoDB compatibility for migrations. The article also cites FlutterFlow, which went from zero to three million users on Firestore with zero outages and hundreds of billions of reads. That's not a small anecdote, that's a credibility play. Firestore's serverless architecture, sub-second provisioning, and the document model all align well with how agentic applications iterate. Worth revisiting if you've dismissed it as "just for mobile apps."

Speaking of mobile, there's a piece on why startups are choosing Flutter over native in 2026. The piece is "Why Startups Are Choosing Flutter Over Native in 2026: A CTO's Perspective." The headline is exactly what you think, but the CTO perspective is worth the read. The economics have flipped. Five years ago you needed a strong reason to pick Flutter. Today you need a strong reason to pick native. The talent pool has matured, the Dart ecosystem is healthy, and the single codebase advantage extends to web and desktop now. That said, the article is honest: bleeding-edge platform features, hardware-intensive apps, and maximum performance still belong in native. It's the right kind of pragmatic framing. For most startups, Flutter is the right default choice.

From TechCrunch, a list of twenty-one European startups worth watching, beyond the obvious names like Lovable and Mistral. The piece is "Beyond Lovable and Mistral: 21 European startups to watch." This is a long one with a lot of variety. You've got defense tech like Alta Ares doing counter-drone systems, fintech from Apron and Pennylane, renewable energy management from Flower, and some genuinely wild stuff like Proxima Fusion working on nuclear fusion and Space Forge manufacturing semiconductors in orbit. There's Botify helping brands navigate AI search optimization, Cala building knowledge graphs for AI agents, and Gradium building voice models as an ElevenLabs challenger. The methodology is smart: investors were asked to recommend two startups each, one from their portfolio and one outside it. That surfaces names you wouldn't get from a pure traction play. If you want to know where European deep tech talent is concentrating, this is the list.

Back to engineering practice now: a piece from Code Craft Diary on trunk-based development, specifically why your pull requests are still too big. The piece is "Trunk-Based Development: Your Pull Requests Are Still Too Big." The opening anecdote is a six-month pull request. Yes, six months. The argument is that large PRs don't just slow things down, they break the delivery system. Reviewers skim, feedback gets vague, and merge risk goes up because the branch is so far from main that conflicts are guaranteed. The real issue is behavioral, not technical. Developers want to ship complete features, they fear breaking things, and they batch up work while waiting for it to be perfect. The fix is practical: enforce a size limit of around three to four hundred lines, use feature flags to merge incomplete work safely, slice vertically instead of horizontally, and track actual PR lifetime. Small pull requests aren't a technical practice. They're a discipline. Everything else follows from them.
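The size limit is the easiest of those fixes to automate. Here is a minimal sketch that totals changed lines from `git diff --numstat` output and flags a change that goes over budget. The 400-line default mirrors the range mentioned above. The function names and the sample diff are made up for illustration and do not come from the article.

```python
from typing import Tuple


def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line has the form <added><TAB><deleted><TAB><path>. Binary
    files report "-" for both counts and are skipped here.
    """
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3 or parts[0] == "-" or parts[1] == "-":
            continue
        total += int(parts[0]) + int(parts[1])
    return total


def check_pr_size(numstat: str, limit: int = 400) -> Tuple[bool, int]:
    """Return (within_limit, total_changed_lines) for a hypothetical budget."""
    total = changed_lines(numstat)
    return total <= limit, total


if __name__ == "__main__":
    # Invented sample: two source files and one binary file.
    sample = "120\t30\tsrc/app.py\n-\t-\tdocs/logo.png\n300\t10\tsrc/db.py\n"
    ok, total = check_pr_size(sample)
    print(f"{total} changed lines, within limit: {ok}")
```

In CI, the input would typically come from something like `git diff --numstat main...HEAD`, with the job failing when the first element of the result is false.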

Then there's a piece from Internals on what you're actually writing when you write a SKILL.md. The piece is "What you're actually writing when you write a SKILL.md." This is a conceptual one that I think a lot of people in this space need to hear. A skill is not a prompt. It's a loader specification. The three levels of execution (metadata, body, and references) load at different times and at different costs. The author shares a story of cutting a skill from twelve hundred lines down to one hundred and eighty by splitting it into a spine plus references, and the context cost dropped from twenty percent to seven percent. Same instructions, same output, different architecture. The deeper point is that the agent has reasonable priors but your environment isn't average. That's what the Gotchas section is for. And there's a warning about model upgrades: a skill tuned on one model might regress on a better one because the more capable model interprets your instructions instead of following them literally. You need evals. You need paired runs. "It worked when I tested it" is not evidence.
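To make the loading idea concrete, here is a toy model of those three levels. It is only a sketch: the `Skill` class, its field names, and the use of character counts as a stand-in for context cost are assumptions made for illustration, not the actual loader behavior of any agent, and the sizes are invented rather than taken from the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Skill:
    metadata: str                      # always in context (name, description)
    body: str                          # loaded only when the skill triggers
    references: Dict[str, str] = field(default_factory=dict)  # opened on demand

    def context_cost(self, triggered: bool, opened: List[str]) -> int:
        """Characters placed in context for a given usage pattern."""
        cost = len(self.metadata)
        if triggered:
            cost += len(self.body)
            cost += sum(len(self.references[name]) for name in opened)
        return cost


# A monolithic skill keeps everything in the body.
monolith = Skill(metadata="m" * 100, body="b" * 12_000)

# The same amount of text split into a short spine plus two reference files.
split = Skill(
    metadata="m" * 100,
    body="b" * 1_800,
    references={"api.md": "r" * 6_000, "gotchas.md": "r" * 4_200},
)

print(monolith.context_cost(triggered=True, opened=[]))           # 12100
print(split.context_cost(triggered=True, opened=[]))              # 1900
print(split.context_cost(triggered=True, opened=["gotchas.md"]))  # 6100
```

The total text is identical in both versions. What changes is how much of it has to be in context for a run that never needs the reference files.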

Finally, from the Google Developers blog, some research from UCSD on supercharging LLM inference on TPUs using diffusion-style speculative decoding. The piece is "Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding." The headline number is a threefold speedup on TPU v5p, with peak gains approaching sixfold on complex math tasks. The key insight is that traditional speculative decoding drafts tokens sequentially, taking O(K) draft steps for K tokens, which is itself a bottleneck. Diffusion-style drafting instead paints an entire block in a single forward pass, which is O(1) in the number of drafted tokens. That's a fundamental architectural shift. They also found something interesting they call K-Flat verification: on high-end hardware like TPU v5p, verifying 1024 tokens costs almost the same as verifying 16. The bottleneck isn't verification cost, it's draft quality. Math and coding tasks showed the best gains because those domains have predictable token sequences. The work is open source in the vLLM TPU inference repo. If you're running inference on TPUs, this is worth a close look.
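The drafting difference can be shown with a toy example. This is a sketch under loud assumptions: the "models" are tiny deterministic functions invented for illustration, verification uses the standard greedy accept-the-longest-matching-prefix rule rather than anything specific to the UCSD work, and the target is called once per position for readability where a real system would verify the whole block in one batched pass.

```python
from typing import Dict, List

Seq = List[int]


def target_next(seq: Seq) -> int:
    """Toy target model: always counts up modulo 10."""
    return (seq[-1] + 1) % 10


def draft_next(seq: Seq) -> int:
    """Toy draft model: agrees with the target except right after a 4."""
    return 9 if seq[-1] == 4 else (seq[-1] + 1) % 10


def sequential_draft(prefix: Seq, k: int, passes: Dict[str, int]) -> Seq:
    """Autoregressive drafting: one draft forward pass per token, O(K)."""
    out: Seq = []
    for _ in range(k):
        passes["sequential"] += 1
        out.append(draft_next(prefix + out))
    return out


def block_draft(prefix: Seq, k: int, passes: Dict[str, int]) -> Seq:
    """Block drafting: counted as one forward pass for all K tokens, O(1).

    The loop below only simulates what a single parallel pass would emit.
    """
    passes["block"] += 1
    out: Seq = []
    for _ in range(k):
        out.append(draft_next(prefix + out))
    return out


def verify_greedy(prefix: Seq, draft: Seq) -> Seq:
    """Keep the longest draft prefix the target agrees with, then add one
    token from the target (the correction, or a bonus if all matched)."""
    accepted: Seq = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            return accepted + [expected]
        accepted.append(tok)
    return accepted + [target_next(prefix + accepted)]


passes = {"sequential": 0, "block": 0}
seq_draft = sequential_draft([1], 4, passes)
blk_draft = block_draft([1], 4, passes)
print(seq_draft, blk_draft, passes)   # both drafts are [2, 3, 4, 9]
print(verify_greedy([1], seq_draft))  # [2, 3, 4, 5]
```

Both drafters propose the same four tokens, but the sequential one pays four draft passes against one for the block drafter. The verifier's output depends only on how good the draft is, which lines up with the point that draft quality, not verification cost, is the limiting factor.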

That's episode 776. Themes this week: the AI tooling market is separating into models and harnesses, AI adoption is entering its quality phase, and the infrastructure underneath everything — databases, hardware, development workflows — is still where the real engineering happens. Catch you next time.