Fluid compute exists for a new class of workloads. I/O bound backends like AI inference, agents, MCP servers, and anything that needs to scale instantly, but often remains idle between operations. These workloads do not follow traditional, quick request-response patterns. They’re long-running, unpredictable, and use cloud resources in new ways.

Fluid quickly became the default compute model on Vercel, helping teams cut costs by up to 85% through optimizations like in-function concurrency.

Today, we’re taking the efficiency and cost savings further with a new pricing model: you pay CPU rates only when your code is actively using CPU.

Read more

link to the original content