LLM Commoditization vs Differentiation
One of the big debates in AI is whether LLMs will become a commodity, or whether they will be attractive businesses in which a handful of big companies dominate, as we see in big tech today. The commoditization camp points to charts like the one from Nvidia below, arguing that open-source models are only six months behind and that the capability gap continues to narrow:
A few key points are worth noting here. Whenever we’ve tried open-source models, the output we get is of lower quality than that of the leading models. What is likely going on in the chart above is that open-source models are heavily trained to perform strongly on benchmarks, whereas the leading models are also tuned to perform well in the real world. This is the classic statistical phenomenon of in-sample versus out-of-sample performance. Just because a statistical or quant model performs very strongly on in-sample data doesn’t mean it will perform strongly once it is tested on real data it hasn’t seen before. In quant investing, for example, a backtested, data-mined track record shouldn’t be mistaken for a reliable indicator of real-world results.
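To make the in-sample versus out-of-sample point concrete, here is a minimal, purely illustrative Python sketch (our own toy example, not tied to any model or benchmark in the chart): an over-flexible polynomial fit "data-mines" the noise in its training sample and typically scores far better in-sample than on fresh data, while a simple fit generalizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy observations of a simple underlying relationship: the true signal is just y = x
x_train = rng.uniform(-1, 1, 30)
y_train = x_train + rng.normal(0, 0.5, 30)
x_test = rng.uniform(-1, 1, 1000)
y_test = x_test + rng.normal(0, 0.5, 1000)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on data (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

# An over-flexible model (degree-10 polynomial) chases the training noise,
# while a simple model (straight line) captures only the real signal.
overfit = np.polyfit(x_train, y_train, deg=10)
simple = np.polyfit(x_train, y_train, deg=1)

print(f"overfit  in-sample MSE: {mse(overfit, x_train, y_train):.3f}  "
      f"out-of-sample MSE: {mse(overfit, x_test, y_test):.3f}")
print(f"simple   in-sample MSE: {mse(simple, x_train, y_train):.3f}  "
      f"out-of-sample MSE: {mse(simple, x_test, y_test):.3f}")
```

The overfit model reliably wins in-sample and typically loses on the held-out data, which is exactly the benchmark-versus-real-world gap described above.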
Despite the chart showing a narrowing gap, our view is that the real-world performance gap between the leading models and the rest of the field will continue to widen. Previously, training was dominated by pre-training, the phase in which the model is trained on the available body of human knowledge. Over the last twelve months, however, both reinforcement learning and chain-of-thought reasoning have become crucial to achieving state-of-the-art problem-solving capabilities. With reinforcement learning, models can now be trained on effectively unlimited data in fields such as mathematics and coding, because the model’s proposed solutions to synthetic problems can be objectively verified.
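As a rough illustration of why verifiability matters, the sketch below shows the general shape of such a training loop. The `model` object and its `propose`/`update` methods are hypothetical placeholders standing in for an actual policy and RL algorithm; the point is simply that synthetic problems come with an automatic, objective grader, so training data is effectively unlimited.

```python
import random

def make_problem():
    """Generate a synthetic arithmetic problem with a known, checkable answer."""
    a, b = random.randint(1, 999), random.randint(1, 999)
    return f"What is {a} + {b}?", a + b

def verify(proposed_answer, correct_answer):
    """Objective verifier: reward 1.0 if the model's answer checks out, else 0.0."""
    return 1.0 if proposed_answer == correct_answer else 0.0

def training_loop(model, steps=10_000):
    # `model` is a hypothetical policy with .propose() and .update() methods,
    # standing in for whatever RL algorithm a lab actually uses.
    for _ in range(steps):
        prompt, truth = make_problem()        # unlimited synthetic data
        answer = model.propose(prompt)        # model attempts a solution
        reward = verify(answer, truth)        # objective, automatic grading
        model.update(prompt, answer, reward)  # reinforce verified solutions
```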
This is similar to DeepMind’s work on AlphaGo nearly a decade ago: while the version that beat the world champion at the board game still bootstrapped from human games, its successor AlphaGo Zero was trained purely on synthetic data generated by having the model continuously play Go against itself. So we believe that scaling laws in current AI training are alive and well, and that the increasing capital intensity will make it hard for new players to enter the field. You either need large revenues or access to large amounts of funding to compete in this game.
Going forward, another differentiator is that the leading firms will have the capability to build high-value data sets in-house. This includes hiring investment bankers to teach models how to do financial modeling, lawyers to set up research workflows for legal cases, scientists for scientific research, and so on. Because the model arrives at its final answer via chain-of-thought reasoning, competitors aiming to distil a leading model over its API can only memorize that final result. These distilled models will struggle to solve new real-world problems on their own.
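A hypothetical sketch of what API-based distillation looks like: the student model only ever sees the teacher’s final answers, because the chain-of-thought stays hidden behind the API. The `teacher_api` and `student` objects and their methods are made-up placeholders for illustration, not any vendor’s actual interface.

```python
def collect_distillation_data(teacher_api, prompts):
    """Query a leading model's API; the reasoning trace is hidden server-side,
    so each training example is just (prompt, final answer)."""
    dataset = []
    for prompt in prompts:
        final_answer = teacher_api.complete(prompt)   # returns only the answer text
        dataset.append((prompt, final_answer))        # no intermediate reasoning
    return dataset

def distil(student, dataset, epochs=3):
    """Supervised fine-tuning on (prompt, answer) pairs: the student learns to
    imitate answers it has seen, not the problem-solving process behind them."""
    for _ in range(epochs):
        for prompt, answer in dataset:
            student.fit_step(prompt, answer)          # hypothetical training step
    return student
```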
A positive for the big tech firms is that they also have access to unique data sets. Good examples here are Meta with Facebook and Instagram, xAI with X, and Microsoft with LinkedIn. Other providers of high-quality data sets, such as Reddit, will likely continue to raise the price of training access steeply. Amazon is already blocking AI agents from accessing its shopping site, but presumably this too could be unlocked for a hefty fee.
The key point from the above is that the capital required to compete in this race will only continue to grow. Historically, these types of markets have not produced highly competitive landscapes with a plethora of players, but rather a handful of competitors at best. While there is a widespread narrative that LLMs are commodities, we don’t believe this is actually the case today, and we expect it to be even less true in the future. OpenAI and Anthropic scaled to $10-20 billion in ARR within a few years, the fastest ever, providing objective evidence that these businesses offer clearly differentiated value.
The only counterargument we can see to the above is that big Chinese tech firms can legally purchase high-quality training sets via Chinese data exchanges. For example, while US medical data is fragmented across thousands of private hospitals on different systems, Chinese AI firms have access to large-scale anonymized medical records, giving them an obvious competitive advantage in fields such as medicine and biology.
Another advantage is that China has an abundance of engineers who can more cheaply help models build workflows. A decade ago, China was known for its data-labeling farms, where low-skilled workers tagged traffic photos to train autonomous-driving models. Today, companies like DeepSeek and Baichuan use expert-in-the-loop systems in which junior engineers and field specialists write chain-of-thought reasoning paths for models to follow. A big drawback for China is that the country remains heavily short on compute. Over the long term, however, China will likely have the advantage in power generation.
Nvidia vs AMD – Apple vs Android?
At CES, Jensen Huang explained how Rubin will deliver 5x the performance of Blackwell, despite the transistor count increasing by only 60%:
“This is the Rubin GPU. It’s 5x Blackwell in floating-point performance. But the important thing is to go to the bottom line: it’s only 1.6 times the number of transistors of Blackwell. That tells you something about the levels of semiconductor physics today. If we don’t do extreme co-design at the level of every single chip across the entire system, 1.6x puts a ceiling on how far performance can go each year. One of the things we did is called the NV FP4 tensor core. The transformer engine inside our chip is not just a 4-bit floating-point number, it is an entire processing unit that understands how to dynamically adjust its precision to deal with different levels of the transformer.
So, you can achieve higher throughput wherever it’s possible by losing precision, and go back to the highest possible precision wherever you need to. You can’t do this in software because obviously it’s just running too fast, and so you have to be able to do it adaptively inside the processor. That’s what an NV FP4 is. When somebody says FP4 or FP8, it almost means nothing to us. Because it’s the tensor core structure in all the algorithms that makes it work. I would not be surprised that the industry would like us to make this format and this structure an industry standard in the future. This is completely revolutionary.”
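The quote describes hardware that picks its numerical precision on the fly. As a purely conceptual software sketch of that idea (not NVIDIA’s NVFP4 format or its tensor-core implementation, and with made-up thresholds), the snippet below quantizes a block of values to 4 bits where the resulting error is tolerable and falls back to 8 bits where it is not:

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x onto a signed grid with `bits` bits."""
    levels = 2 ** (bits - 1) - 1              # 7 positive levels for 4-bit, 127 for 8-bit
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

def pick_precision(block, tolerance=0.2):
    """Per-block precision selection: keep the cheap 4-bit representation where
    the error is tolerable, otherwise fall back to 8-bit."""
    low = quantize(block, bits=4)
    mean_rel_err = np.mean(np.abs(low - block) / (np.abs(block) + 1e-12))
    if mean_rel_err <= tolerance:
        return low, 4                          # cheap path: more throughput
    return quantize(block, bits=8), 8          # precise path where 4-bit loses too much

rng = np.random.default_rng(0)
narrow = rng.uniform(0.5, 1.0, 64)             # narrow value range: 4-bit is fine
outlier = narrow.copy()
outlier[0] = 20.0                              # one large outlier crushes the 4-bit grid
print(pick_precision(narrow)[1], pick_precision(outlier)[1])   # prints: 4 8
```

Doing this block by block in software would be far too slow, which is Huang’s point about why the adjustment has to live inside the processor itself.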
Next, we will dive into Nvidia’s approach and compare it to AMD’s upcoming solution. Subsequently, we will turn to more topics in semis and software.