AI's Growing Cost Problem: How Infrastructure Economics Will Shape the Market's Next Phase

Artificial intelligence is scaling rapidly, but its underlying economics are far less stable than they appear. Today, the majority of global AI workloads run on infrastructure controlled by just a handful of cloud providers. According to Holori's 2026 market analysis, AWS holds approximately 33% of the global cloud market, Microsoft Azure about 22%, and Google Cloud around 11%. Together, these three companies account for roughly two-thirds of global cloud infrastructure, which supports a substantial share of AI deployment worldwide.
When the OpenAI API experiences downtime, thousands of products are affected. When a major cloud provider suffers an outage, services across industries and regions are disrupted.
This article is not about who will build the most advanced model. It is about a different question: whether the current infrastructure model for AI is economically sustainable at scale, and how changes in the mechanics of compute allocation could reshape value distribution across the market.
The Cost of Intelligence Behind the Scenes
Training frontier models already requires tens or even hundreds of millions of dollars. Anthropic has stated that training Claude 3.5 Sonnet cost 'a few tens of millions', and its CEO, Dario Amodei, has previously projected that next-generation models could approach $1 billion in training costs.
Estimates reported by industry media suggest that training GPT-4 may have exceeded $100 million. However, the less visible but structurally significant cost is inference.
Under publicly available OpenAI pricing, inference is billed per million tokens. For applications with high usage volumes, this can translate into thousands of dollars per day in recurring costs, even before scaling further.
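To make that concrete, here is a back-of-the-envelope sketch of daily inference cost under per-million-token billing. The prices and traffic volumes below are illustrative assumptions, not quoted OpenAI rates:

```python
# Back-of-the-envelope inference cost model.
# All prices and traffic volumes are illustrative assumptions,
# not actual OpenAI list prices.

def daily_inference_cost(
    requests_per_day: int,
    input_tokens_per_request: int,
    output_tokens_per_request: int,
    price_per_million_input: float,   # USD per 1M input tokens (assumed)
    price_per_million_output: float,  # USD per 1M output tokens (assumed)
) -> float:
    input_cost = requests_per_day * input_tokens_per_request / 1e6 * price_per_million_input
    output_cost = requests_per_day * output_tokens_per_request / 1e6 * price_per_million_output
    return input_cost + output_cost

# Hypothetical mid-sized product: 1M requests/day, ~1,500 tokens in, ~500 out.
cost = daily_inference_cost(1_000_000, 1_500, 500, 2.50, 10.00)
print(f"~${cost:,.0f} per day, ~${cost * 365:,.0f} per year")
# ~$8,750 per day, ~$3,193,750 per year
```

Even at these modest assumed volumes, recurring inference spend lands in the thousands of dollars per day, before any growth in traffic.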
AI is often described as software. Increasingly, its economics resemble capital-intensive infrastructure with ongoing operational expenditure.
Capital Intensity and Market Concentration
Rather than stabilising, infrastructure spending is expanding. Microsoft has indicated that it is on track to invest approximately $80 billion in fiscal year 2025, tied largely to AI infrastructure, reflecting sustained investment in data centers and advanced GPU clusters.
Meta has projected $60–65 billion in capital expenditures for 2025, with a significant share directed toward AI capacity expansion.
Meanwhile, Nvidia's data center revenue has surged beyond an $80 billion annual run rate, underscoring continued demand for high-performance GPUs powering AI systems.
According to SEC filings and market reports, major labs such as OpenAI and Anthropic leverage multi-billion-dollar equity-for-compute deals to secure GPUs at near-cost rates as low as $1.30–$1.90 per hour. Smaller firms, lacking these strategic partnerships with Nvidia, Microsoft, or Amazon, pay retail prices exceeding $14.00 per hour, a markup of more than 600% that structurally ties their operating margins to the pricing policies of a few dominant providers. This pricing gap, reinforced by Nvidia's reported $40 billion in combined investments into these labs, illustrates how access to AI infrastructure is increasingly dictated by capital-heavy procurement agreements rather than open-market competition.
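The scale of that gap is easy to quantify. Below is a minimal sketch comparing annual spend for the same GPU fleet at the hourly rates quoted above; the fleet size and full-utilisation assumption are hypothetical:

```python
# Annualised cost gap between negotiated and retail GPU pricing.
# Hourly rates are taken from the figures quoted above; the fleet
# size and utilisation level are hypothetical assumptions.

HOURS_PER_YEAR = 24 * 365

def annual_gpu_cost(num_gpus: int, rate_per_hour: float, utilisation: float = 1.0) -> float:
    return num_gpus * HOURS_PER_YEAR * rate_per_hour * utilisation

fleet = 256  # hypothetical mid-sized inference cluster
insider = annual_gpu_cost(fleet, 1.90)  # upper end of negotiated rates
retail = annual_gpu_cost(fleet, 14.00)  # quoted retail rate

markup = (14.00 - 1.90) / 1.90
print(f"Insider: ${insider/1e6:.1f}M/yr  Retail: ${retail/1e6:.1f}M/yr")
print(f"Markup: {markup:.0%}, extra spend: ${(retail - insider)/1e6:.1f}M/yr")
# Insider: $4.3M/yr  Retail: $31.4M/yr
# Markup: 637%, extra spend: $27.1M/yr
```

For a firm without a strategic partnership, the same fleet costs roughly seven times more per year, a difference that flows straight through to unit economics.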
At early stages of adoption, such concentration can appear efficient. At scale, it introduces exposure to pricing shifts, supply constraints, and infrastructure dependency.
The Energy Dimension
AI infrastructure also has an energy footprint. According to the International Energy Agency (IEA), data centers currently account for approximately 1–1.5% of global electricity consumption, and AI-driven demand could significantly increase that share in the coming years.
Compute economics is therefore not only a financial issue but an infrastructure and energy challenge as well.
Rethinking Infrastructure Mechanics
Alongside the expansion of centralised data centers, alternative approaches are emerging that seek to rethink how compute resources are coordinated.
One such approach is implemented in the Gonka protocol, a decentralised network for AI inference designed to minimise network synchronisation and consensus overhead so that computational power is directed toward productive AI workloads.
Governance weights are determined by verified computational contributions, effectively applying a 'one compute unit, one vote' principle. Rather than devoting most of the network's compute to resource-intensive coordination and consensus, the protocol uses short performance-measurement intervals, referred to as Sprints, in which participants demonstrate real GPU capacity through transformer-oriented tasks.
Outside of these intervals, the network's resources are dedicated to AI inference.
This design is a deliberate economic choice: by keeping coordination overhead small, the protocol aligns the majority of computational power with productive AI inference workloads, ultimately reducing the structural cost of AI deployment.
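As a rough illustration of this allocation logic, the sketch below shows compute-proportional governance weights and the epoch-time split between measurement and inference. Every name and number here is a hypothetical assumption for exposition, not Gonka's actual implementation:

```python
# Illustrative sketch of compute-weighted governance and time allocation.
# Hypothetical only: the structure and parameters are assumptions for
# exposition, not Gonka's actual protocol code.

from dataclasses import dataclass

@dataclass
class Participant:
    name: str
    verified_flops: float  # compute demonstrated during the last Sprint

def governance_weights(participants: list[Participant]) -> dict[str, float]:
    """'One compute unit, one vote': weight = share of verified compute."""
    total = sum(p.verified_flops for p in participants)
    return {p.name: p.verified_flops / total for p in participants}

# Hypothetical epoch: a short Sprint for measurement, the rest for inference.
EPOCH_HOURS = 24.0
SPRINT_HOURS = 0.5  # assumed measurement interval
inference_share = (EPOCH_HOURS - SPRINT_HOURS) / EPOCH_HOURS

nodes = [Participant("node-a", 4e15), Participant("node-b", 1e15)]
print(governance_weights(nodes))  # {'node-a': 0.8, 'node-b': 0.2}
print(f"{inference_share:.1%} of epoch time serves inference")  # 97.9%
```

The key property is that measurement consumes only a small, bounded slice of wall-clock time, leaving nearly all capacity for paid inference work.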
Architectural Competition and Strategic Flexibility
Decentralised models are not positioned as replacements for centralised cloud providers. Instead, they introduce an alternative coordination mechanism.
Even partial diversification of GPU sources can reduce vendor lock-in risk and influence market dynamics. If independent operators can aggregate resources through open protocols and receive rewards for verifiable compute contribution, a new layer of infrastructure competition begins to form.
As decentralised inference scales, the pursuit of the lowest cost-per-token is driving a shift toward hardware specialisation. Unlike hyperscalers, whose procurement models prioritise fleet flexibility over aggressive efficiency, decentralised networks incentivise the adoption of specialised, transformer-only silicon. Because decentralised participants are driven purely by yield, and are unburdened by the need to support diverse cloud workloads or long-term lock-in contracts, they become natural first adopters, with an opportunity to achieve efficiency-per-watt that general-purpose data centers cannot match.
As AI becomes embedded in finance, logistics, healthcare, and public systems, the structure of the compute layer becomes a long-term economic architecture decision.
The Structural Question of AI's Next Phase
AI capabilities will continue to advance. But the organisation of the compute market — whether concentrated or increasingly distributed — will determine how economic value is ultimately allocated.
If inference remains structurally expensive and centralised, a disproportionate share of value may accrue at the infrastructure layer. If compute coordination becomes more competitive and capital-efficient, AI deployment could scale with greater predictability.
In that sense, the next phase of AI may be shaped less by the size of models and more by the architecture of the networks that power them.
Anastasia Matveeva is a Senior Product Manager and researcher at Product Science and a co-creator of the Gonka protocol. Her work focuses on machine learning infrastructure, large language model inference, and distributed computing systems.
She holds a PhD in Mathematics from UPC Barcelona, where she worked as a researcher and lecturer. Since joining Product Science in 2021, she has led the development of AI engineering tools adopted by more than 100 engineers and used across multiple Fortune 500 companies.





















