The Token Explosion in the Cloud
The demand for tokens in the cloud is exploding. This is Satya Nadella on Microsoft’s recent earnings call:
“We processed over 100 trillion tokens this quarter, up 5x year-over-year, including a record 50 trillion tokens last month alone.”
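A quick back-of-envelope puts those figures in perspective. The only disclosed numbers are the ~100 trillion quarterly tokens, the ~50 trillion in March, and the 5x year-over-year growth; the smooth-growth path for the earlier quarters below is purely our assumption:

```python
# Back-of-envelope on Microsoft's disclosed token volumes.
# Disclosed: ~100T tokens in the quarter, ~50T in March, ~5x YoY growth.
quarter_tokens = 100e12
march_tokens = 50e12

print(f"March as share of the quarter: {march_tokens / quarter_tokens:.0%}")  # 50%

# Assumption: quarterly volume grew smoothly ~5x over four quarters,
# i.e. from ~20T a year ago to ~100T now (a hypothetical path).
growth = 5 ** (1 / 3)
quarterly = [20e12 * growth ** i for i in range(4)]
ttm = sum(quarterly)
print(f"Implied trailing-twelve-month tokens: ~{ttm / 1e12:.0f}T")          # ~213T
print(f"March as share of the trailing year:  ~{march_tokens / ttm:.0%}")   # ~24%
```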
So roughly half of the quarter’s tokens were processed in March alone, and under the smooth-growth assumption above, March represents close to a quarter of the trailing year’s total. And we’re seeing an even bigger explosion at Google. This is from Barclays:
“We break down Alphabet’s recent disclosure around inference tokens. Key conclusions include: 1) AI Overviews are fueling a huge ramp in inference at GOOGL, 2) GOOGL is processing 5-6x more tokens than MSFT (Azure) as Search is ~6x the size of ChatGPT, which is ~2-4x the size of Gemini, 3) GOOGL is using nearly 10% of its total AI compute capex for inference tokens, the bulk (90%+) is still likely used for training new models and powering AI products like recommender systems, etc., and 4) all these inference tokens only cost GOOGL around $750m in 1Q25 using Gemini 2.5's rate card and a few assumptions, hence the rate of deleverage from infusing AI into Search appears manageable (which may come as a surprise to some investors). AI Overview costs are likely around 1% of Search revenue compared to core costs at around 18% of revenue (i.e., core costs to run organic and ads, excluding TAC costs).”
In total, token consumption at Google is up 50x over the last twelve months. One of the most straightforward ways to play this explosion in AI applications, investment-wise, is the big clouds: Microsoft Azure, Amazon Web Services and Google Cloud Platform. We’ve written a number of times before about how enterprises architect workloads in these clouds, using cloud-specific code (SDKs), API calls, and command lines to make use of the breadth of services on these platforms, ranging from AI to databases, data analytics systems, cloud functions, load balancers, secure networking, and more. These are businesses with high switching costs and high barriers to entry, which is why Microsoft and Amazon control a large part of this market.
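To make “cloud-specific code” concrete, here’s a minimal sketch of an AWS-flavored workflow using boto3, the AWS SDK for Python; the bucket and function names are hypothetical. The Azure and GCP equivalents use entirely different SDKs, service primitives, and identity models, which is exactly where the switching costs come from:

```python
# A minimal AWS-specific workflow: read a JSON file from S3 and fan the
# records out to a Lambda function. All resource names are hypothetical.
import json
import boto3

s3 = boto3.client("s3")
lam = boto3.client("lambda")

# Pull a file of events from object storage.
obj = s3.get_object(Bucket="my-company-data", Key="events/2025-03.json")
events = json.loads(obj["Body"].read())

# Invoke a serverless function asynchronously for each record.
for event in events:
    lam.invoke(
        FunctionName="process-event",   # hypothetical Lambda function
        InvocationType="Event",         # async, fire-and-forget
        Payload=json.dumps(event).encode(),
    )
```

Porting even this trivial pipeline to Azure means rewriting it against azure-storage-blob and Azure Functions, with different authentication, retry, and IAM semantics; multiply that across hundreds of services and you get the stickiness described above.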
In our view, neo-clouds have grown tremendously over the last few years because of large shortages in GPU supply. These clouds are fully committed to the Nvidia ecosystem, and Nvidia in turn made sure they had plenty of GPU supply. The big clouds’ focus on developing their own ASICs, by contrast, came back to bite them over the last few years, as they didn’t get the GPU allocations they were looking for. It’s well known that enterprises have been struggling to get the GPU access they need in their main clouds, which is why Microsoft decided to interconnect its data centers with CoreWeave’s. This is a director at Microsoft commenting on the deal:
“CoreWeave is probably the best of the new AI hyperscalers, and they make it really easy to spin up and spin down resources. They are 100% dedicated to AI and one of the big reasons we have the partnership with CoreWeave, is because they had just so many NVIDIA chips, as well as the strong infrastructure. They know how to move data around incredibly well and then to do training and inferencing on those chips. CoreWeave is one of those companies that has focused on their NVIDIA relationship, and that’s really benefited them in terms of their procurement strategy. So, what we've done with CoreWeave is to interconnect our network directly to theirs through optical fiber. And then the customer can bring their data and models into Azure, but it is actually being spun out into the CoreWeave network and their GPU clusters.”
CoreWeave is clearly a legitimate company: its data centers run on the state-of-the-art Nvidia ecosystem, resulting in tremendous performance. Where we are cautious is its long-term competitive positioning. We see CoreWeave essentially as a play on GPUs remaining in short supply. Should the GPU market become more balanced 2-3 years from now, the main clouds will have the GPU capacity they want, and there will be no need to outsource workloads to external cloud providers. Data gravity effects in the cloud are strong: transferring large amounts of data around is both costly (egress fees) and time-consuming, which is why you’d typically want to centralize your data.
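To put rough numbers on that data gravity point, here’s a back-of-envelope sketch. The ~$0.08/GB egress rate and the 10 Gbit/s of sustained throughput are our assumptions; actual rates vary by provider, region, and volume tier:

```python
# Back-of-envelope for moving 1 PB out of a big cloud.
# Assumptions: blended egress ~$0.08/GB, 10 Gbit/s sustained throughput.
data_gb = 1_000_000        # 1 PB expressed in GB
egress_per_gb = 0.08       # USD per GB (assumed blended rate)
link_gbps = 10             # sustained transfer rate in Gbit/s

cost = data_gb * egress_per_gb
seconds = data_gb * 8 / link_gbps   # GB -> gigabits, then divide by rate
days = seconds / 86_400

print(f"Egress cost:    ${cost:,.0f}")      # ~$80,000
print(f"Transfer time:  ~{days:.1f} days")  # ~9.3 days
```

A petabyte-scale migration costing tens of thousands of dollars and taking over a week is exactly why data, once landed, tends to stay put.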
That the neo-clouds are struggling to attract enterprise workloads directly can also be seen in CoreWeave’s 2024 disclosures: Microsoft brought in 62% of revenues, illustrating how enterprises simply use Azure’s APIs without even being aware that, behind the scenes, Microsoft is offloading these workloads onto CoreWeave. Thus, we take the view that the neo-clouds don’t have a particularly strong competitive position in the long term. The range of services on their platforms is very limited and doesn’t come close to what you need to deploy an app, let alone run an enterprise’s IT infrastructure. Their main use today is to handle AI workloads that the main clouds don’t have sufficient GPU capacity for.
We don’t see having better access to Nvidia’s GPUs as a structural competitive advantage; it is more likely that at some stage supply will catch up with demand. At the same time, competing AI stacks are becoming viable. Google has successfully developed the TPU-XLA-JAX stack: the company’s custom accelerator, the XLA compiler as the answer to CUDA, and JAX as the answer to the high-level PyTorch framework. AMD is also becoming an option for inference, with PyTorch able to compile to AMD GPUs via the ROCm platform, as the sketch below shows. Nvidia has an incredibly strong ecosystem in AI, but we don’t see a preferential relationship with Nvidia as a particularly strong investment case for the neo-clouds over the long term. We’d rather invest in companies that own great assets themselves, as opposed to a great relationship, which may or may not last.
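As a sketch of that portability point: standard PyTorch code is device-agnostic, and ROCm builds of PyTorch expose AMD GPUs through the same torch.cuda interface, so a snippet like the following runs unchanged on Nvidia or AMD hardware. This is a minimal illustration, not a performance claim:

```python
# Device-agnostic PyTorch: on ROCm builds of PyTorch, AMD GPUs are
# surfaced through the torch.cuda API, so "cuda" below can resolve to
# either an Nvidia or an AMD device.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(8, 4096, device=device)

with torch.no_grad():
    y = model(x)

print(f"Forward pass of shape {tuple(y.shape)} ran on {device}")
```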
Another potential weakness in their business model is that according to this analysis from Deutsche, most of the NPV for CoreWeave only comes in years 5 and 6 of renting out GPUs:
"This past week, CoreWeave hosted an update call on its financing strategy with sell- side analysts. In this note, we review the illustrative contract example laid out by the company this past week and also revisit the approach we took to estimating unit economics in our initiation report. At a high level, our conclusion from both are consistent in that the company at least fully recoups its investment within an initial 4-year contract period (including cost of capital) and that a majority of potential NPV attributable to equity holders based on the current financing mix is captured in years 5-6 of GPUs useful life."
Two remarks here. Firstly, GPUs remain in short supply, so price normalization is likely once the market becomes more balanced; this will put pressure on rental pricing over time and worsen the economics. Secondly, GPUs can start failing after just three years, and there’s also an incentive to move to new hardware given the massive performance leaps between GPU generations. This is a manager at Amazon Web Services via Tegus:
“NVIDIA utilization rates are very high. Even if you have money, small enterprise, you may not be able to get a hold on the NVIDIA instances, either at AWS or any other hyperscaler. It's still very hard to get a hold on. A traditional server's useful life now is about five, six years, accounting and technical useful life. For NVIDIA A100, we see it's probably only three years from deployment because it's too costly to maintain them after three years. They are occupying a very precious spot in the data center. We rather swap them out with Blackwell. A lot of A100s are being swapped out soon, and now, if we have more Blackwells coming in. That's the typical useful life for AI chips we see at this moment. It's about three years.”
It’s really important that CoreWeave’s economics work out, as the current financials are jaw-dropping: a $77 billion market cap, roughly 28x trailing revenues, for a company that generated $2.7 billion in revenues over the last twelve months, burned $7.6 billion of cash over the same period (largely on capex), and carries $12 billion of debt.
Overall, from a cloud engineering standpoint, we don’t see the neo-clouds as having a particularly strong position, given the lack of breadth of services on their platforms. Instead, we see GPU shortages at the large clouds as the main reason these neo-clouds have been able to deliver such strong revenue growth. In addition, from a business model standpoint, we’re not sure their economics will work out long term; in our view, neo-clouds will always have to discount GPU pricing heavily to attract AI workloads away from the big clouds. For long-term investors, we see the two dominant big clouds as much more attractive given their strong market positioning, scale advantages, customer stickiness due to data gravity and high switching costs in the cloud, and healthy economics.
Next, for premium subscribers, we’ll dive into a smaller semi name that is winning orders from the large players in AI accelerators.