Last month, I watched an AI researcher spend three hours training a neural network to recognize cats. The kicker? The exact same model had been trained and published by a different team just two weeks earlier. When I asked why she didn't just use the existing one, she laughed. "Because we need to know if it works the way we think it works."
That conversation stuck with me because it reveals something counterintuitive about modern AI development: redundancy isn't a bug—it's often a necessary feature of progress. Yet most people don't realize how much repetition happens behind the scenes, or why it matters.
The Surprising Economics of Retraining Everything
Consider what happened with large language models between 2022 and 2024. OpenAI released GPT-4. Google launched Gemini. Meta dropped Llama 2. Microsoft partnered with OpenAI on Copilot. Meanwhile, startups like Mistral and Anthropic, along with dozens of smaller organizations, trained their own foundation models from scratch.
From a resource perspective, this looks absurdly wasteful. These companies collectively spent billions of dollars on computing power to solve essentially the same problem: create a general-purpose language model that understands and generates human text. They're not just duplicating effort—they're competing to do it in slightly different ways.
But here's what that redundancy actually buys us. When multiple independent teams reach similar conclusions about how to build these systems, we gain confidence that those conclusions reflect something fundamental rather than a lucky accident. It's the scientific method at scale. Replication is how we separate signal from noise.
Take tokenization, the process of breaking text into chunks that a model can process. Every major language model uses a different tokenization strategy: some use vocabularies of roughly 50,000 tokens, others upwards of 128,000. On the surface, this seems like wasteful variation. But that experimentation is exactly what revealed that tokenization choices profoundly affect model efficiency, multilingual capability, and even reasoning ability. We only discovered that because teams kept retrying the problem with different approaches.
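To see how different those strategies really are, here is a minimal sketch using the Hugging Face transformers library (my own illustration, not code from any of the teams mentioned above). It loads two freely available tokenizers with different vocabularies and shows how each one splits the same sentence.

```python
# A minimal sketch comparing two publicly available tokenizers with
# different vocabularies. Assumes the Hugging Face `transformers` library
# is installed and can download the tokenizer files; the model names are
# just illustrative examples.
from transformers import AutoTokenizer

text = "Reinventing the wheel, carefully and with purpose."

for name in ["gpt2", "bert-base-uncased"]:  # illustrative choices
    tokenizer = AutoTokenizer.from_pretrained(name)
    tokens = tokenizer.tokenize(text)
    print(f"{name}: vocab size {tokenizer.vocab_size}, "
          f"split into {len(tokens)} tokens -> {tokens}")
```

The same sentence comes out as a different number of chunks under each scheme, and that difference compounds: it changes how much text fits in a context window and how much compute each word effectively costs.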
When Reinvention Catches Real Problems
The redundancy becomes even more valuable when different teams discover different failure modes. This is where things get genuinely interesting—and where it directly intersects with concerns about AI hallucinations that have real-world consequences.
When OpenAI trained GPT-4, they discovered certain vulnerabilities in how their system generates confident-sounding but completely fabricated information. When Anthropic trained Claude, they hit the problem from a different angle and found distinct failure patterns. Neither team could have anticipated the other's findings because the architecture, training data, and fine-tuning strategies were all different enough to produce unique blind spots.
A smaller model trained by an academic team in Singapore might fail in ways that the massive commercial models don't, revealing something about how scale affects reliability. A specialized model built for medical reasoning might expose flaws in how language models handle uncertainty—flaws that a general-purpose model would hide because it's not being stress-tested in quite the same way.
This is why funding diverse AI research matters. Homogeneity in approaches creates homogeneity in failure modes. We only discover our collective blind spots when people keep building different systems.
The Cost of Convergence
Here's what worries many researchers, though: the industry is consolidating. Training a state-of-the-art language model now costs hundreds of millions of dollars, which puts it within reach of maybe a dozen organizations globally. This economic barrier means fewer independent attempts, fewer different approaches, and fewer opportunities to discover whether our current methods are fundamentally sound or just happen to work well right now.
We might be entering an era where reinvention becomes the privilege of the extremely well-funded. Smaller teams and startups can fine-tune existing models or train specialized systems at smaller scales, but they increasingly cannot afford to train entirely new foundation models from scratch. That's a loss.
The breakthrough that reveals a fundamental limitation in current large language models? It might never happen, not because the limitation doesn't exist, but because no one has the resources and freedom to pursue a sufficiently different architectural approach.
The Beauty of Productive Redundancy
Yet there's a counterargument worth taking seriously. Maybe the current convergence on transformer architectures and large-scale pretraining isn't a limitation—maybe it's a sign that these approaches are genuinely close to optimal for the problem domain as it stands. That doesn't mean we should stop exploring alternatives, but it does mean we shouldn't romanticize reinvention for its own sake.
The real insight is more nuanced: redundancy is valuable, but only when it's strategic. Training the exact same model with identical methods and data is wasteful. Training different models with different architectures, data sources, or training procedures is an investment in understanding.
When that cat researcher spent three hours training her model, she wasn't wasting resources. She was adding a data point to humanity's collective understanding of how neural networks learn visual recognition. That matters because the hundredth independent confirmation that something works is what lets us build bigger, more complex systems on top of it with real confidence.
As AI systems become more powerful and more integrated into critical decisions, that confidence matters more than efficiency metrics. And sometimes, the most efficient way forward is to keep reinventing the wheel—just carefully, and with purpose.
