Cheaper AI Models: Why 80% of Workloads Won't Need Frontier

The AI industry was built on a simple assumption: bigger models are more powerful, and the most powerful models win. But mounting costs are pressuring users to give smaller, cheaper models a serious second look — and the implications could be seismic for an industry that has competed almost exclusively on quality.

Coinbase co-founder Brian Armstrong laid out the prediction most clearly: "Demand for intelligence is near infinite, but 80% of workloads will be running on 99% cheaper models within 12-18 months. 20% of workloads will still run on latest gen models where IQ maxing is important." If this comes true, it would fundamentally reshape the economics of AI — and much of the savings would come at the expense of OpenAI and Anthropic just as they're heading toward their IPOs.

I've been tracking AI funding rounds for years, and I can tell you this: the valuation narratives that got OpenAI to $300 billion and Anthropic to $65 billion were built on the idea that frontier capability was the moat. If most workloads migrate to cheaper models, those valuations need a serious re-examination. The moat shrinks dramatically when the product becomes commoditized.

The Harvey Test: 3x Cost Reduction Without Quality Loss

Initial tests suggest that when systems are arranged correctly, cheaper models can substitute for frontier models on most tasks without any sacrifice in quality. The most compelling example comes from legal AI tool Harvey, which partnered with inference platform Fireworks AI to reduce inference costs by 3x without reducing quality.

The test paired Claude Opus with GLM 5.1 served by Fireworks, routing to Opus only for the most intensive tasks. The result was significantly less server time and a lower overall cost.

"Quality comes first, and in legal it always will," said Harvey co-founder Gabe Pereyra. "However, the definition of quality is evolving from simply using the most powerful model for everything, to using the best model that gets the right answer most efficiently."

Here's what makes this test interesting from a VC perspective: Harvey didn't just swap models and hope for the best. They built an orchestration layer that routes tasks by complexity — Opus for the hard stuff, GLM 5.1 for everything else. That's a systems problem, not just a model problem. And it tells you something important about where the real value is accumulating: in the routing logic, not in the base model.
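
To make the routing idea concrete, here is a minimal sketch of complexity-based routing in Python. It is not Harvey's actual implementation. The tier names, the `complexity` score (assumed to come from some upstream classifier), and the 0.8 threshold are all hypothetical, chosen only to illustrate the shape of the logic.

```python
from dataclasses import dataclass

# Hypothetical tier labels. The article names Claude Opus and GLM 5.1,
# but these identifiers and the threshold below are illustrative assumptions.
FRONTIER_MODEL = "frontier-large"
CHEAP_MODEL = "small-cheap"


@dataclass
class Task:
    prompt: str
    complexity: float  # assumed score in [0, 1], produced by an upstream classifier


def route(task: Task, threshold: float = 0.8) -> str:
    """Pick a model tier: frontier only when complexity reaches the threshold."""
    return FRONTIER_MODEL if task.complexity >= threshold else CHEAP_MODEL


def frontier_share(tasks: list[Task], threshold: float = 0.8) -> float:
    """Fraction of tasks that the router sends to the frontier tier."""
    if not tasks:
        return 0.0
    routed = [route(t, threshold) for t in tasks]
    return routed.count(FRONTIER_MODEL) / len(tasks)


# Example workload with made-up complexity scores.
workload = [
    Task("Summarize this NDA", 0.10),
    Task("Draft a cross-border merger opinion", 0.90),
    Task("Extract parties and dates", 0.50),
    Task("Reconcile conflicting precedent", 0.85),
    Task("Fix citation formatting", 0.20),
]
share = frontier_share(workload)
```

In this toy workload, two of the five tasks go to the frontier tier. The hard part in practice is producing a trustworthy `complexity` score, which is exactly where the defensible engineering would live.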

Think about what this means for unit economics. A 3x reduction means inference costs fall by about 67%, so if you're running a legal AI product, your gross margins improve dramatically. That's the kind of margin expansion that makes a Series B look a lot more attractive to investors. Harvey's co-founders are ex-Google and ex-Meta, people who understand that distribution and routing matter as much as the underlying model.
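
The arithmetic behind that margin claim can be sketched in a few lines. The revenue and cost figures below are hypothetical, not Harvey's numbers, and the sketch assumes inference is the only cost of goods sold.

```python
def cost_drop_pct(reduction_factor: float) -> float:
    """Percent cost decrease implied by an N-times cost reduction."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return (1 - 1 / reduction_factor) * 100


def gross_margin_pct(revenue: float, inference_cost: float) -> float:
    """Gross margin as a percent of revenue, with inference as the only cost."""
    return (revenue - inference_cost) / revenue * 100


# Hypothetical figures: $100 of revenue, $45 of inference cost before routing.
before = gross_margin_pct(100, 45)
after = gross_margin_pct(100, 45 / 3)  # same revenue, inference cost cut 3x

print(f"cost drop: {cost_drop_pct(3):.1f}%")
print(f"gross margin: {before:.0f}% -> {after:.0f}%")
```

Under these assumed numbers, a 3x reduction is a 66.7% cost drop and gross margin moves from 55% to 85%.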

The Real Divide: Large vs. Small, Not Proprietary vs. Open

This trend is often framed in terms of major labs versus Chinese models or open-weight ones, but that misses the bigger point. The real divide isn't between proprietary and open models — it's between large models and small ones.

You can save money by switching from GPT-5.5 to DeepSeek's V4 Flash, but switching to GPT-5.4-mini works just as well. The active price war is between in-house inference from the big labs and independently served open-weight models, but for the larger question of small versus large, it doesn't really matter which kind of small model wins out.

This distinction matters because a lot of venture capital is being deployed based on the wrong thesis. Investors are betting on open-weight models as a category play, when really what they should be betting on is the routing and orchestration layer that sits between users and models. The model itself becomes a commodity — it's the system design that captures value.

I've seen too many startups raise on the premise that they'll build the "better open model." But the Harvey test suggests the winning play is different: build the intelligence that decides which model to call, when to call it, and how to combine outputs. That's where the defensible IP lives. The model is just a utility at that point.

Running Counter to the Scaling-First Approach

All of this might seem obvious (of course you shouldn't use more compute than necessary), but it runs counter to the scaling-first approach that has dominated the industry until now. Inspired by Rich Sutton's essay "The Bitter Lesson," labs have leaned hard into training the most compute-intensive models possible, pushing the frontier of what AI models can do.

With prices heavily subsidized by investors, clients had no reason to choose anything but the most advanced option. But with token prices rising and subsidies slowing down, users are facing cost pressure for the first time.

Let's be honest about what happened here: venture capital created a distortion. When your customers aren't paying for inference out of their own pockets — when someone else is footing the bill in exchange for equity — you use as much compute as you want. It's rational behavior, but it's also a recipe for building an industry that can't survive without constant fundraising.

The bitter lesson in machine learning is that general methods that exploit computation tend to outperform domain-specific methods. But there's a corollary that nobody talks about: when computation becomes expensive, the optimal strategy flips. Now the smart move is to use as little compute as possible while still getting the job done.

This is going to create serious tension at the frontier labs. OpenAI and Anthropic have spent billions building models that are, by design, overpowered for most tasks. Their entire R&D trajectory is optimized for capability, not efficiency. If 80% of workloads migrate to cheaper models, those labs need to figure out how to remain relevant without their flagship products being the default choice. See our analysis of how Anthropic and OpenAI are positioning for their public debuts — the IPO timing problem may be more acute than their current narratives suggest.

The IPO Timing Problem

The timing of all this is particularly awkward for OpenAI and Anthropic. Both companies are heading toward IPOs, and their valuation narratives depend on continued demand for frontier models. If the market starts pricing in a future where most workloads run on 99% cheaper alternatives, those IPOs look very different.

Armstrong's prediction isn't just about model choice — it's about who captures value in the AI stack. If the routing layer becomes the valuable piece, then the companies building inference platforms and orchestration tools capture more of the pie than the model labs. That's a fundamental shift in how we think about AI investing.

Consider the math: if OpenAI's flagship model costs $10 per million tokens and a smaller alternative costs $0.10 per million tokens, that's a 100x price difference, which is the same thing as the "99% cheaper" in Armstrong's prediction. Even if the smaller model is slightly less capable, the economics drive adoption hard. And once that migration starts, it's self-reinforcing: cheaper models attract more users, which attracts more investment in smaller-model research, which improves their quality further.
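
Here is a rough sketch of what those prices imply for a buyer's total bill if Armstrong's 80/20 split holds. It assumes the split is measured by token volume, which the prediction does not specify, and it uses the illustrative $10 and $0.10 prices from above.

```python
def blended_price(frontier_price: float, cheap_price: float, cheap_share: float) -> float:
    """Average price per million tokens when cheap_share of volume uses the cheap model."""
    if not 0 <= cheap_share <= 1:
        raise ValueError("cheap_share must be between 0 and 1")
    return cheap_share * cheap_price + (1 - cheap_share) * frontier_price


FRONTIER_PRICE = 10.00  # $ per million tokens (illustrative, from the text)
CHEAP_PRICE = 0.10      # $ per million tokens (illustrative, from the text)
CHEAP_SHARE = 0.80      # assumption: 80% of token volume moves to the cheap model

blended = blended_price(FRONTIER_PRICE, CHEAP_PRICE, CHEAP_SHARE)
savings_pct = (1 - blended / FRONTIER_PRICE) * 100
frontier_spend_pct = (1 - CHEAP_SHARE) * FRONTIER_PRICE / blended * 100

print(f"blended price: ${blended:.2f} per million tokens")
print(f"total savings: {savings_pct:.1f}%")
print(f"share of remaining spend on frontier: {frontier_spend_pct:.1f}%")
```

Under these assumptions the blended price is about $2.08, so the total bill falls by roughly 79% rather than 99%, and the 20% of volume still on the frontier model accounts for about 96% of what's left.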

For venture investors, this changes the entire investment thesis. The question is no longer "which lab will win?" It's "who controls the routing layer?" And that's a much smaller, more fragmented market than the frontier model race.

The Uncertain Path Forward

We don't know whether the new cost pressure will actually drive enterprise users to smaller models. They could just as easily economize by making fewer calls, using less context, or simply giving up on the least promising deployments.

But if it turns out that most deployments can be run just as well on a smaller model, it could put a serious damper on the growing demand for inference — and raise new questions about how to justify the cost of training a frontier model. The industry is about to learn what happens if the assumption that bigger is always better starts to break.

What I'm watching closely is whether enterprise buyers treat this as a temporary cost-cutting measure or a permanent shift in how they think about AI. The Harvey test suggests the latter: Harvey isn't just using cheaper models because it has to, but because it has redesigned its system around the idea that model choice should be dynamic.

The companies that figure this out first will have a serious competitive advantage. Not because their models are better, but because their systems are smarter about which model to use when. That's the play I'd be making as a venture investor right now: back the orchestration layer, not the frontier model.
