The question defining the next AI investment cycle is not who has the biggest model. It is who can do the most with the least: delivering high-quality services at lower cost.

The AI industry built its first wave of trillion-dollar valuations on a simple premise: bigger models win. That premise is now cracking under the weight of its own costs. Across the sector, leading AI companies are repositioning around a new competitive axis: inference efficiency. The shift is visible in Microsoft's Build 2026 announcements, Google's Gemini Flash pricing strategy, and the enterprise token budget crisis forcing organisations to rethink AI spending. Perplexity CEO Aravind Srinivas put a name to it on June 3, 2026, telling CNBC the winner will deliver the most "value per watt per user." But the race he described was already underway long before he coined the phrase.

A Sector-Wide Pivot Away From Scale

The efficiency turn is not coming from one company. It is arriving simultaneously from multiple directions. On June 2, Microsoft Corporation (NASDAQ:MSFT) announced seven proprietary AI models at its Build 2026 developer conference. The flagship, MAI-Thinking-1, was positioned around low token cost rather than raw capability.

Microsoft told CNBC that after optimising its models for McKinsey deployments, it achieved ten times the cost efficiency of OpenAI's GPT-5.5. Its MAI-Image-2-Efficient model cut costs by 41% while increasing processing speed by 22%, according to company announcements. Microsoft's strategic motivation is also worth noting: by running its own models on Azure infrastructure, it avoids paying third parties such as OpenAI, which directly improves its margin profile.

Alphabet Inc. (NASDAQ:GOOGL) made the same move from a different angle. Google unveiled Gemini 3.5 Flash at Google I/O 2026, pitching it as a lower-cost, faster inference option for enterprise workloads. Google's CEO noted from the I/O stage that companies are "already blowing through their annual token budgets." Two companies that together control the majority of enterprise AI spending are now signaling the same directional shift, and that convergence says more than any single CEO's interview.

Why It Matters to Investors

The valuation stakes make this shift consequential. Anthropic filed confidentially for a U.S. IPO this week, with its private-market valuation reported at approximately $965 billion. OpenAI's valuation has crossed $850 billion, according to CNBC. Both figures were built on a scale-first framework that rewarded capital intensity and model size. If efficiency becomes the dominant benchmark, the premium attached to raw compute spending could come under pressure.

According to Ramp enterprise spending data cited by research firm Artefact, the average cost per million AI tokens fell from roughly $10 to $2.50 in a single year, a 75% decline. Yet enterprise budgets are not shrinking. Agentic AI workflows consume dramatically more tokens than simple query-and-response models, so higher overall volume offsets the per-unit savings. Companies that solve both sides of that equation at once, cutting both unit cost and total consumption, will hold a structural advantage as AI spending scales further.
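To make that arithmetic concrete, the short Python sketch below uses the two per-token prices cited above. The monthly token volume and the 6x agentic multiplier are hypothetical assumptions chosen for illustration, not figures from Ramp or Artefact.

```python
# Hypothetical illustration: the per-token prices are the Ramp/Artefact averages
# cited in the article; the token volumes and 6x multiplier are invented.

def monthly_spend(price_per_million: float, tokens: float) -> float:
    """Return spend in dollars for `tokens` tokens at `price_per_million` dollars per 1M tokens."""
    return price_per_million * tokens / 1_000_000

OLD_PRICE = 10.00  # approx. $ per 1M tokens a year earlier
NEW_PRICE = 2.50   # approx. $ per 1M tokens now

# Volume multiple at which the lower price stops reducing total spend.
break_even_multiple = OLD_PRICE / NEW_PRICE

# Assumed baseline: 500M tokens/month of simple query-and-response use.
baseline_tokens = 500_000_000
# Assumed agentic workflow: 6x the tokens for the same set of tasks.
agentic_tokens = baseline_tokens * 6

old_spend = monthly_spend(OLD_PRICE, baseline_tokens)
new_spend = monthly_spend(NEW_PRICE, agentic_tokens)

print(f"Price drop: {1 - NEW_PRICE / OLD_PRICE:.0%}")
print(f"Break-even volume multiple: {break_even_multiple:.1f}x")
print(f"Old spend: ${old_spend:,.0f}  New spend: ${new_spend:,.0f}")
```

With a 75% price cut, total spend stays flat only if volume rises no more than fourfold. Under the assumed 6x multiplier, the monthly bill in this example rises from $5,000 to $7,500 even though each token is cheaper.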

Where Perplexity Fits and Where It Does Not

Perplexity's contribution to this debate is real but narrow. On June 3 the company launched Personal Computer, a hybrid local-cloud orchestrator that automatically routes AI tasks between a user's device and cloud models in real time. The product is a genuine attempt to reduce per-task compute costs by keeping simpler workloads off expensive cloud infrastructure. Perplexity was last reported at a $20 billion private valuation; Anthropic and OpenAI dwarf it at roughly forty to fifty times that figure. The efficiency argument Srinivas is making is therefore also a survival argument. Perplexity cannot win a capital spending war, and its platform-agnostic orchestration model is its most viable path to staying competitive. Investors should weigh the thesis on its own merits while recognizing that it comes from an interested party rather than from independent market analysis.

Not all investors believe efficiency will become the primary driver of AI valuations. Scale continues to offer meaningful advantages through developer ecosystems, enterprise adoption, and data network effects. OpenAI's and Anthropic's valuations suggest capital markets remain willing to reward market leadership and growth potential even as operating costs rise. If that view prevails, efficiency becomes an important competitive factor rather than the dominant determinant of value.

The Infrastructure Layer That Benefits Most

The efficiency shift creates a clearer investable angle in the infrastructure layer than in the model layer. Nvidia Corporation (NASDAQ:NVDA) remains central to AI compute, but a sustained push toward efficient local inference broadens the opportunity set. Microsoft says its MAI-Transcribe-1 model offers 50% lower GPU costs than comparable third-party alternatives. Google's proprietary Tensor Processing Units give Alphabet a structural cost advantage that OpenAI and others cannot easily replicate, according to analysis published by Investing.com.

Chipmakers and networking providers that reduce inference latency and power consumption stand to benefit as enterprises shift from experimental AI spending to optimised production deployments. The longer this efficiency competition runs, the more valuable purpose-built inference hardware becomes relative to general-purpose compute.

Bottom Line

The AI efficiency race is an increasingly important strategic focus for the sector's largest players. For retail investors, the relevant question is not which company coined the best efficiency metric. It is which public companies hold a structural cost advantage as AI deployment scales. Microsoft's proprietary model stack, Google's TPU infrastructure, and Nvidia's next-generation inference platforms are the clearest near-term expressions of that advantage. Perplexity remains private and unavailable to most retail investors, but the framework its CEO articulated on June 3 is a useful filter for evaluating public AI stocks as Anthropic's potential trillion-dollar IPO approaches and the efficiency debate moves from conference stages to analyst models.

Benzinga Disclaimer: This article is from an unpaid external contributor. It does not represent Benzinga’s reporting and has not been edited for content or accuracy.