Google Abandoned the Bigger Model Race for Speed

Every AI company has spent the last two years chasing parameter counts and context windows. Google used this year's I/O to tell developers that speed matters more.

At Google I/O 2026, the company made Gemini 3.5 Flash generally available today. Not next month. Not in a limited beta. Today. And the pitch was not about how smart the model is, but about how fast it moves through agentic workflows, where a model makes hundreds of sequential decisions in a single task.

The numbers are worth looking at.

Gemini 3.5 Flash: Built for Agents, Not Benchmarks

Gemini 3.5 Flash ships with a 1-million-token context window, a 65,000-token maximum output, and four thinking levels that range from minimal to high. The thinking levels persist across conversation turns, which means the model keeps its reasoning path instead of starting fresh each time.

Google claims the model is 4x faster than comparable frontier models on coding tasks, and up to 12x faster when running inside its Antigravity 2.0 agent stack. The 12x figure comes from the Antigravity stack, not from the raw model, which is an important distinction. Antigravity is Google's execution substrate, the plumbing that connects model calls to Linux sandboxes, tool use, and artifact generation.

The pricing is aggressive at first glance: $1.50 per million input tokens and $9.00 per million output tokens. Cached inputs get a 90 percent discount. But Artificial Analysis flagged that 3.5 Flash costs 5.5 times more than the older Gemini 3 Flash and 75 percent more than Gemini 3.1 Pro. You are paying for speed, not efficiency.
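To see what those rates mean for an agent run, here is a minimal Python sketch of the per-call arithmetic. The prices and the 90 percent cache discount are the ones quoted above. The function name, the `cached_fraction` parameter, and the example token counts are my own illustrative assumptions, not part of any Google SDK, and real billing rules may differ.

```python
# Prices quoted in the article, in USD per million tokens.
INPUT_USD_PER_M = 1.50
OUTPUT_USD_PER_M = 9.00
CACHE_DISCOUNT = 0.90  # cached input tokens are billed at 10% of the input rate


def request_cost(input_tokens, output_tokens, cached_fraction=0.0):
    """Estimate the USD cost of one call.

    cached_fraction is a hypothetical knob: the share of input tokens
    assumed to be served from cache (0.0 to 1.0).
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    # Cached tokens are charged at the discounted rate, fresh ones in full.
    billable_input = fresh + cached * (1.0 - CACHE_DISCOUNT)
    input_cost = billable_input * INPUT_USD_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_USD_PER_M / 1_000_000
    return input_cost + output_cost


# Hypothetical agent run: 200 calls, each with 50,000 input tokens
# and 2,000 output tokens.
calls = 200
uncached_run = calls * request_cost(50_000, 2_000)
cached_run = calls * request_cost(50_000, 2_000, cached_fraction=0.8)
print(f"no caching:  ${uncached_run:.2f}")
print(f"80% cached:  ${cached_run:.2f}")
```

Under these made-up token counts the run costs about $18.60 without caching and about $7.80 with 80 percent of the input cached, which suggests the cache discount matters more to an agent's bill than the headline input rate.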

On benchmarks, 3.5 Flash ranks ninth on Text Code Arena and scores 84 percent on MMMU-Pro. That puts it behind the current flagship models from OpenAI and Anthropic. Google's argument is that this does not matter for agentic workloads, where raw accuracy per call is less important than the number of calls you can make per second.

I am not convinced that tradeoff holds for every use case. If your agent is making legal or medical decisions, you probably want the model that gets it right on the first call, not the one that can retry twelve times. But for code generation, data extraction, and iterative refinement tasks, the speed advantage is real.
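One rough way to put numbers on that tradeoff is sketched below. It assumes every attempt succeeds independently with the same probability and that you have a reliable way to check whether an attempt succeeded. Both are strong assumptions. The accuracy and latency figures are hypothetical and do not come from Google or any benchmark.

```python
def success_within(p_success, attempts):
    """Chance that at least one of `attempts` independent tries succeeds."""
    if not 0.0 <= p_success <= 1.0:
        raise ValueError("p_success must be between 0 and 1")
    return 1.0 - (1.0 - p_success) ** attempts


def expected_seconds_to_success(p_success, seconds_per_call):
    """Mean time until the first success when failed calls are retried.

    The number of tries follows a geometric distribution with mean
    1 / p_success, so the mean time is seconds_per_call / p_success.
    """
    if not 0.0 < p_success <= 1.0:
        raise ValueError("p_success must be in (0, 1]")
    return seconds_per_call / p_success


# Hypothetical models: the numbers are illustrative only.
models = {
    "slower, more accurate": {"p_success": 0.90, "seconds_per_call": 4.0},
    "faster, less accurate": {"p_success": 0.70, "seconds_per_call": 1.0},
}

for name, m in models.items():
    mean_time = expected_seconds_to_success(m["p_success"], m["seconds_per_call"])
    print(f"{name}: {mean_time:.2f} s on average to a verified success")
```

With these made-up inputs the faster model reaches a verified success in about 1.4 seconds on average, against about 4.4 seconds for the slower one. The calculation falls apart when there is no cheap verifier, which is exactly the legal and medical case: a wrong answer that cannot be detected is not fixed by retrying.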

Gemini Omni: Video Generation and Physical Priors

Google also announced Gemini Omni, a multimodal model that processes text, images, audio, and video inputs to produce video edits and generated content. It is available now in the Gemini app and will reach YouTube Shorts and Create later this week.

The interesting part is not the video generation itself. Runway, Pika, and Luma have all been doing this for months. The interesting part is that Google is building physical priors into the model, things like gravity, kinetic energy, and basic object permanence. Earlier generative video models treated physics as an afterthought, which is why their output sometimes shows water flowing upward and objects clipping through surfaces. Omni is designed with world model assumptions baked in.

Whether this actually works in practice is an open question. Google showed demos. Demos are not benchmarks. But if Omni can generate physically plausible video without expensive post-processing, that is a genuine differentiator.

Antigravity 2.0 and the Always-On Agent

The most consequential announcement was not a model at all. It was Antigravity 2.0, Google's agent platform, which now includes a feature called Gemini Spark. Spark runs background agents on dedicated cloud virtual machines that persist even when your device is offline. The idea is that your AI assistant can work on long-running multi-step tasks without you keeping a browser tab open.

Google demonstrated this by building a functioning operating system in 12 hours using 93 parallel sub-agents, 15,000 API requests, and less than $1,000 in credits. That demo is impressive but also raises questions about how much of that work was the model doing versus the scaffold and tooling around it. The distinction matters because it determines whether this is a general capability or a carefully engineered demo.
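The quoted figures allow a quick back-of-envelope check, sketched here in Python. The inputs are the numbers from the demo as reported above. The function name is my own, the results are plain averages, and nothing here says how the work was actually distributed across agents or over time. Because the spend was "less than $1,000", the per-request cost is an upper bound.

```python
# Figures quoted for the demo.
HOURS = 12
SUB_AGENTS = 93
API_REQUESTS = 15_000
CREDIT_CEILING_USD = 1_000  # reported as "less than", so an upper bound


def demo_averages(hours, sub_agents, api_requests, credit_ceiling_usd):
    """Return simple averages implied by the demo's headline numbers."""
    return {
        "requests_per_agent": api_requests / sub_agents,
        "requests_per_minute": api_requests / (hours * 60),
        "max_usd_per_request": credit_ceiling_usd / api_requests,
    }


averages = demo_averages(HOURS, SUB_AGENTS, API_REQUESTS, CREDIT_CEILING_USD)
for label, value in averages.items():
    print(f"{label}: {value:.3f}")
```

That works out to roughly 161 requests per sub-agent, about 21 requests per minute across the whole fleet, and under 7 cents per request. Those averages are modest, which fits the suspicion that the orchestration layer, not raw model throughput, did much of the heavy lifting.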

Google also rebranded Search as AI Search, introduced Ask YouTube for querying video content, and launched Docs Live, where you describe what you want and Gemini builds the document. A new AI Ultra plan costs $100 per month, and the top tier dropped from $250 to $200.

Community Reaction

The response from developers was mixed. On Reddit's r/GeminiAI and r/LocalLLaMA, several users praised the 3.5 Flash general availability, noting that staged rollouts have been a recurring frustration with Google model releases. The immediate availability was seen as a genuine shift.

On Hacker News, the discussion centered on cost. Multiple commenters pointed out that 3.5 Flash is significantly more expensive than previous Flash variants, and questioned whether the speed gains justify the price increase for workloads that are not latency-sensitive. The product sprawl concern also came up: Google now has Gemini CLI, Antigravity CLI, Gemini Spark, and Gemini Omni, and the naming conventions overlap enough that even experienced developers find it confusing.

A common theme was that Google is moving fast on integration but slower on clarity. The technology is impressive. The story around it is not.

What Surprised Me

The thing I keep coming back to is the pricing strategy. Google is charging more for a model that scores lower on benchmarks than competitors, and betting that developers will pay for speed over raw accuracy. That is either a bold read on where agentic AI is heading, or a sign that Google needs to monetize its compute infrastructure aggressively. Probably both.

The physical priors angle in Omni is genuinely interesting if it ships as described. But the Antigravity demo with 93 sub-agents building an OS in 12 hours feels like a proof of concept that will take months to generalize. Most developers do not have the tooling and scaffolding to run parallel sub-agents effectively. Google is selling the destination without showing the road.

Google processed 3.2 quadrillion tokens per month last quarter, up 7x year over year. The Gemini app has 900 million monthly users. The company has the infrastructure and the user base to make agentic AI mainstream. Whether the current generation of models and tools is actually ready for that scale remains to be seen.
