
Get off the AI cloud?


Stanford researchers dropped a research paper in mid-November that challenges the cloud-centric delivery model of AI inference: good-enough price performance on local hardware versus the multi-year construction costs of data centres. Here's my bottom line:

The "Intelligence Per Watt" framework shows that appropriate AI models running on consumer devices are becoming viable for the majority of everyday queries.

The paper indicates that consumer-grade tech such as the M4 Max is becoming viable for the vast majority of everyday queries ... and is improving at an impressive rate. Meanwhile, their TL;DR highlights the challenge facing hyperscalers: the laws of physics and construction.


"AI demand is growing exponentially, creating unprecedented pressure on data center infrastructure. While data centers dominate AI workloads due to superior compute density and efficiency, they face scaling constraints: years-long construction timelines, massive capital requirements, and energy grid limitations".


Although Virtified is a little tight with its capital expenditure in startup mode, the Apple M4 up the back of Virtified Labs provides a real-world example supporting this assertion. We've used Ollama on it quite successfully since July (albeit alongside more than 950,000 automated, multi-model, API-initiated prompts run by our junior research assistants. Yes: 950,000+).
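
For the curious, the automation side is nothing exotic. Below is a minimal sketch of the kind of loop one could run against Ollama's local REST API; the model names and the single prompt are placeholders for illustration, not our actual pipeline.

```python
import json
import requests

# Ollama's default local REST endpoint; assumes `ollama serve` is running.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Placeholder models and prompt -- swap in whatever you have pulled locally.
MODELS = ["llama3.1:8b", "qwen2.5:7b"]
PROMPTS = ["Summarise the Intelligence Per Watt idea in one paragraph."]

def run_prompt(model: str, prompt: str) -> str:
    """Send one prompt to a local model and return the full response text."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    for model in MODELS:
        for prompt in PROMPTS:
            answer = run_prompt(model, prompt)
            print(json.dumps({"model": model, "prompt": prompt, "answer": answer}))
```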


The key insight is straightforward but game-changing and raises a big question in my mind:

Does the centralised hyperscale model hit the wall when decentralised compute can meet the majority of AI demands?

Rather than asking whether local models can match frontier AI capabilities head-to-head (spoiler: they can't for everything), the Hazy Research team asks a different question: can consumer-grade hardware deliver sufficient accuracy within its power constraints?


Their "Intelligence Per Watt" (IPW) metric—accuracy divided by power consumption—tracks exactly this tradeoff, and they claim it's improved by 5.3× over just two years (2023-2025). That's driven by 3.1× gains from smarter model architectures (like Mixture-of-Experts) and 1.7× gains from better hardware (accelerators).


The practical implications are enormous. Tested against 1 million real-world chat and reasoning queries, local models delivered the following:


  • Successfully handled 88.7% of queries/chats.

  • Jumped from a 23.2% success rate in 2023 to 71.3% in 2025.

  • Handled over 90% of creative and humanities tasks.

  • Handled more than 65% of architecture and engineering tasks.


The paper suggests that intelligent routing of queries might reduce the energy, compute, and cost of inference by 60-80% while maintaining accuracy (quality). It's not binary, of course: anything the local model can't handle automatically gets routed to a frontier model.
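
A minimal sketch of what that local-first routing could look like is below. It assumes a local model served by Ollama plus a hypothetical frontier endpoint (FRONTIER_URL), and the looks_good_enough check is a crude stand-in for whatever router the paper actually evaluates.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
LOCAL_MODEL = "llama3.1:8b"                         # placeholder local model
FRONTIER_URL = "https://api.example.com/v1/answer"  # hypothetical frontier endpoint

def ask_local(prompt: str) -> str:
    """Run the query on the local model first."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": LOCAL_MODEL, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def looks_good_enough(answer: str) -> bool:
    """Crude stand-in confidence check; a real router would be far smarter."""
    return len(answer.strip()) > 0 and "I don't know" not in answer

def route(prompt: str) -> str:
    """Local-first routing: only escalate to the frontier model on a weak local answer."""
    answer = ask_local(prompt)
    if looks_good_enough(answer):
        return answer
    # Fallback: hand the hard query to the (hypothetical) frontier API.
    resp = requests.post(FRONTIER_URL, json={"prompt": prompt}, timeout=120)
    resp.raise_for_status()
    return resp.json()["answer"]

print(route("Draft a two-line thank-you note for a colleague."))
```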


Sure, cloud-delivered hyperscale still has efficiency advantages, plus armies of engineers delivering superior innovation, but the real story here is about overall system-level wins. You don't need theoretical perfection when the majority of queries run on your local workstation and come back quick enough, good enough and cheap enough.


With some boffins (not Virtified) suggesting centralised global AI infrastructure could demand 50-100 GW of power by 2030, the distribution and decentralisation of everyday inference to laptops and phones isn't just clever engineering; it's becoming an essential infrastructure strategy.


PS Bonus round: add the enterprise-grade hardware capabilities to the mix and imagine what might happen.


Footnote: Virtified is not good at analogies, similes or metaphors, but we will draw a comparison. In 2015 we wrote a Maverick research paper originally titled "Get off the grid before it goes off". The eventual title was The Grid Is Dead, Long Live the Grid! We looked at reductions in the levelised cost of energy (LCoE) for a range of energy sources including coal, nuclear, solar and wind. At the time, the missing piece was batteries. Look at what has changed since then: the levelised cost of storage (LCoS) for lithium-ion batteries has fallen by more than 80%.
