
Last year, everyone celebrated the end of the GPU shortage. Nvidia's stock soared. Think pieces proclaimed that the chip crisis was finally behind us. Data centers filled their racks. The narrative felt complete, satisfying—a technological problem solved through capitalist efficiency.

None of that was entirely wrong. But it also wasn't entirely true.

The shortage didn't disappear. It transformed. What we're experiencing now is something more nuanced and frankly more dangerous: a fragmented, invisible crisis where the rich get richer and everyone else gets locked out. The headlines stopped because the pain became concentrated, hidden behind closed corporate doors and exclusive cloud provider agreements.

How We Got Here: The AI Gold Rush Meets Physical Reality

When ChatGPT exploded onto the scene in November 2022, nobody was prepared for what would happen next. Suddenly, every company with a pulse wanted to build an AI product. Startups raised billions. Established tech giants panicked and rushed into the market. Everyone needed GPUs—specifically Nvidia's H100s and A100s, the gold standard for training and running large language models.

The problem? Physical chips take time to manufacture. Nvidia couldn't keep up with demand. Neither could AMD. Neither could anyone else. The waiting lists stretched for months. Some companies reported 18-month backlogs. By mid-2023, the shortage was the dominant tech story of the year.

But here's what happened next: the biggest players—the ones with the deepest pockets—decided to stop waiting. Microsoft pumped billions into its own data center infrastructure and struck exclusive deals with Nvidia. Google doubled down on its custom TPU chips. Meta built out massive compute capacity. Amazon AWS, Azure, Google Cloud—they all secured massive allocations, sometimes through partnerships that made them Nvidia's unofficial preferred customers.

The shortage "ended" because the scarcity was absorbed by the companies that could afford to absorb it.

The New Reality: Compute Access as a Moat

What we're seeing now is something most analysts missed: GPU availability has become a competitive advantage so significant that it's reshaping the entire industry structure. If you can afford to build your own data centers and negotiate with Nvidia directly, you win. If you can't, you're paying premium prices on public cloud platforms or waiting in line like everyone else.

Consider the numbers. A single H100 GPU costs roughly $40,000. A well-trained large language model requires thousands of them. That's a capital expenditure of tens of millions of dollars, minimum. Training GPT-3 was estimated to cost several million dollars in compute alone; GPT-4 reportedly cost more than $100 million. Google's Gemini? Probably in the same ballpark. These aren't just expensive ventures; they're capital-intensive moats that only the most well-funded organizations can build.
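The capex math above is easy to sketch. The figures below are the rough numbers from this article plus one guessed overhead multiplier; none of it is vendor pricing:

```python
# Back-of-envelope capex for a training cluster, using the rough
# figures cited above. Every number here is an illustrative
# assumption, not a quote.

H100_UNIT_PRICE = 40_000          # approximate per-GPU cost cited above
OVERHEAD_MULTIPLIER = 1.5         # guessed: networking, power, cooling, hosting

for n_gpus in (1_000, 5_000, 10_000):
    gpu_cost = n_gpus * H100_UNIT_PRICE
    all_in = gpu_cost * OVERHEAD_MULTIPLIER
    print(f"{n_gpus:>6} GPUs: ~${gpu_cost / 1e6:,.0f}M in GPUs, "
          f"~${all_in / 1e6:,.0f}M all-in")
```

Even at the low end of that range, the GPUs alone run $40 million before a single watt of power is drawn, which is the whole point: this is balance-sheet territory, not seed-round territory.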

For mid-size companies and startups, the situation is actually worse now than during the original shortage. Back then, scarcity was universal. Now? They can buy GPUs, technically. But they're buying from Azure or AWS at premium prices, or they're getting access to older-generation chips at new-generation prices. A startup burning cash can't outbid OpenAI for Nvidia's newest allocations. It just can't.
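The rent-versus-buy squeeze can be made concrete with a breakeven sketch. The hourly rate and utilization below are assumptions for illustration, not quoted cloud prices:

```python
# Rough breakeven: buying an H100 outright vs. renting one in the
# cloud. Every figure here is an assumption, not a price quote.

PURCHASE_PRICE = 40_000        # per-GPU cost cited above
CLOUD_RATE_PER_HOUR = 4.00     # assumed on-demand rate for one H100
UTILIZATION = 0.70             # fraction of hours the GPU is actually busy

# Rented hours needed before rental spend matches the purchase price.
hours_to_breakeven = PURCHASE_PRICE / CLOUD_RATE_PER_HOUR

# At partial utilization, accumulating those busy hours takes longer
# in wall-clock time, pushing the ownership payoff further out.
years = hours_to_breakeven / (UTILIZATION * 24 * 365)

print(f"breakeven after ~{hours_to_breakeven:,.0f} rented hours "
      f"(~{years:.1f} years at {UTILIZATION:.0%} utilization)")
```

Under these assumptions the crossover sits over a year out, which is exactly the horizon a cash-burning startup can't commit to. Ownership favors whoever can prepay years of compute; everyone else rents at a markup.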

This explains why so many "AI startups" have shifted their strategy entirely. Instead of trying to build foundation models from scratch, they're building on top of other people's models. They're using OpenAI's API. They're fine-tuning Meta's Llama. They're prompting GPT-4 through a web interface. It's not a creative choice—it's a GPU shortage tax.

Why You Haven't Heard About This Crisis

The old shortage was democratic in its pain. Every company felt it equally. That made it newsworthy. When Reuters reports that a startup can't get GPUs, it's a sympathetic story. When Reuters reports that a startup is paying premium cloud prices instead of buying hardware, that's not a shortage—that's just business.

The media narrative shifted because the affected parties changed. During 2023, we heard endless stories from mid-market companies struggling to compete. Now those companies have largely accepted their position in the ecosystem. The ones that couldn't adapt to being API-dependent businesses have already failed or pivoted. The ones that survived are now the ones building on top of larger models, and they're generally not complaining publicly.

Meanwhile, the companies that are still actively building foundation models—OpenAI, Google, Meta, Microsoft, Anthropic—they're not going to complain about GPU availability. They secured it. They own the advantage. Talking about it would only draw regulatory attention, invite antitrust scrutiny, and encourage competitors to try harder to solve the problem themselves.

The shortage disappeared from headlines not because it was solved, but because the crisis found equilibrium. An uncomfortable equilibrium, but an equilibrium nonetheless.

The Real Cost: What We're Not Building

Here's what keeps me up at night about this situation: we're probably not building the next OpenAI. We're probably not building the next Google from scratch. The cost of entry for fundamental AI research has risen so high that the door has essentially closed.

This matters because innovation often comes from unexpected places. Google emerged when search seemed solved. Facebook when social networks seemed settled. Instagram when photo sharing had been done. Every single one of these breakthroughs required a new team with a new vision getting access to the necessary infrastructure to compete.

That's not really possible anymore with AI. The infrastructure costs are too high. The GPU allocations are locked down. The venture capital funding, while abundant, flows toward teams that can credibly claim they'll have access to compute. And who can make that claim? The ones connected to the big tech companies, or the ones with enough capital to build their own infrastructure, or the ones who already have massive balance sheets.

If you're curious about how AI systems actually make decisions beneath the surface, and what happens when the infrastructure constraints change, it's worth reading about how AI is finally learning to understand the silence between your words. The technical sophistication is incredible. But it's being developed by an ever-smaller circle of organizations.

What Happens Next

The GPU shortage isn't going away. Nvidia is increasing production, sure. But demand for AI compute is also increasing faster than production capacity. Every new model requires more compute than the last. Every company wants to fine-tune models for their own use cases. The gap isn't closing.

What we'll see instead is continued stratification. The rich will get richer. The well-connected will stay well-connected. Alternative approaches—smaller models, edge computing, different architectures—will get more attention from researchers who can't access unlimited compute. Some of that will lead to interesting innovations. Some will just be workarounds that wouldn't have been necessary if compute were abundant.

The invisible shortage will eventually become visible again. Not when the constraint is resolved, but when it becomes so clearly unfair that regulation finally enters the conversation. And by then, the winners will already be locked in.