The Token Explosion
Inference is just getting started
Over the last few quarters it has become clear that AI consumption is exploding at the hyperscalers. Here is Sundar Pichai on Alphabet’s Q2 earnings call in July:
“At IO in May, we announced that we processed 480 trillion monthly tokens across our surfaces. Since then, we have doubled that number, now processing over 980 trillion monthly tokens, a remarkable increase. The Gemini app now has more than 450 million monthly active users, and we continue to see strong growth in engagement with daily requests growing over 50% from Q1.”
And Satya Nadella reported even stronger growth rates at Microsoft:
“Now on to AI platform and tools. Foundry is the agent and AI app factory. It's now used by developers at over 70,000 enterprises and digital natives to design, customize and manage their AI apps and agents. We processed over 100 trillion tokens this quarter, up 5x year-over-year, including a record 50 trillion tokens last month alone. And 4 months in, over 10,000 organizations have used our new agent service to build, deploy and scale their agents.”
Token consumption will continue to grow at very steep rates for the foreseeable future, as token-intensive AI agents are only now being introduced. At the same time, AI is being applied to solve ever more problems across a wide variety of software, from simple examples such as AI tracking how many calories you consume in a day (CalAI) to novel applications such as robotaxis and humanoid robotics.
Because the major clouds can’t keep up with this demand, recent weeks have brought huge contracts with some of the secondary clouds such as Nebius and Oracle. Microsoft signed a $19.4 billion five-year deal with Nebius, and this was dwarfed a few days later by the $300 billion five-year deal between OpenAI and Oracle. This is Oracle’s CEO on the recent call:
“We have signed significant cloud contracts with the who's who of AI, including OpenAI, xAI, Meta, NVIDIA, AMD and many others. At the end of Q1 remaining performance obligations (RPO) are now $455 billion. This is up $317 billion from the end of Q4. I expect we will sign additional multibillion-dollar customers and that RPO will likely grow to exceed $0.5 trillion. The enormity of this RPO growth enables us to make a large upward revision to the cloud infrastructure portion of our financial plan. We now expect Oracle Cloud Infrastructure will grow 77% to $18 billion this fiscal year and then increase to $32 billion, $73 billion, $114 billion and $144 billion over the following 4 years.”
The obvious risk here is that OpenAI’s growth disappoints long term: its revenue this year will be around $13 billion, and to pay Oracle roughly $60 billion per year from 2027 onward, it would need to grow revenue more than fourfold in about two years. SoftBank is part of the deal as well, but the company is drowning in debt (9.9x net debt/EBITDA), so it isn’t the most reliable ally.
Both Nebius and CoreWeave carry another type of customer risk: most of their revenues will come from Microsoft, and it’s not certain that Microsoft will renew these contracts next time around. If, five years from now, the leading cloud has built out sufficient data center scale of its own, there is obviously no need anymore to outsource this capacity to Nebius and CoreWeave. At some stage, Microsoft will look to pocket this margin itself. After all, it’s Microsoft that owns the end customers. Enterprises large and small connect to the Azure cloud via APIs, and these customers are not even aware that their GPU workloads are being offloaded to Nebius and CoreWeave data centers behind the scenes. With AI demand currently exploding, contracts like these are a convenient way for Microsoft to swiftly increase capacity.
Oracle’s founder Larry Ellison provided some color on what’s going on behind the scenes:
“People are running out of inferencing capacity. The company that called us said ‘we'll take all the capacity you have that's currently not being used anywhere in the world. We don't care.’ And I've never gotten a call like that. That's a very unusual call. That was for inferencing, not training. There is a huge amount of demand for inferencing. And if you think about it, in the end, all this money we're spending on training is going to have to be translated into products that are sold, which is all inferencing.”
Overall, these contracts are obviously a big win for the secondary clouds in the coming years. However, long-term investors looking for solid compounder-type stocks will be much better positioned in the leading clouds such as Microsoft Azure and Amazon AWS. We see the secondary clouds as more of interest to shorter-term investors who want beta exposure to AI momentum. Clearly, if AI demand continues to outstrip supply, the secondary clouds will see the strongest momentum. Long term, however, both the sustainability of these contracts and GPU pricing are big risk factors for the neo-clouds. The major clouds offer a rich ecosystem of services on top of their infrastructure, letting you deploy all your workloads and apps, manage your data, and so on. At the neo-clouds, basically, you can rent GPUs.
Tencent is just one example of where this boom in token consumption is coming from. LLMs are being used to produce visual content, write code, improve ad and content targeting, and more. Goldman recently caught up with the company’s CSO:
“The company sees deployment of AI driving sizable content production efficiency gains (e.g. shifting virtual outfit production to LLMs to drive higher revenue and profitability) and AI-enhanced virtual NPCs to improve the experience. AI is strengthening Tencent’s evergreen games portfolio with more seamless updates; e.g., Tencent recently upgraded Valorant to Unreal Engine 5 seamlessly. Mini Games revenue growth has been sustained above 20% yoy over the past few years.
Aggressively deploying AI in advertising amid Tencent’s 1/3 of China Internet time spent share and relatively under-monetized advertising business at 1/8 revenue share of industry, through leveraging vertical models in analyzing user behaviors and on user orders/purchase intent, facilitated by its growing eCommerce business with Mini Shops’ SKU-level data. AI‐led adtech upgrades could sustain above‐industry ad growth through improving click-through rates and conversion across Tencent’s multiple ad inventory.
Leading AI model benchmark performances, focusing on multi-modal opportunities where the company has earlier gone through a re-architecture of its AI model strategy, with Hunyuan models now reaching state-of-the-art (SOTA) in China, narrowing the gap with global SOTA models. The company just launched its HunyuanWorld-Voyager 3D model, while multi-modal Hunyuan 3D model has reached leading rankings on Hugging Face. The company sees potential to further integrate AI into the broader Weixin ecosystem (such as Mini Shops), QQ browser, with native AI applications driving direct consumer interaction.
We come out of the session with continued confidence in Tencent, with the company working/investing to further deepen its moat, while being the key AI application beneficiary in China Internet, given AI empowers long runways of growth across all of its major business lines. We also see upcoming potential agentic AI functionalities for its Weixin super-app, growing closed-loop eCommerce transaction capabilities, and Tencent Cloud’s growth potential as a Top 3 China public cloud player by scale. We forecast 13%/13% revenue growth yoy and 18%/17% EPS growth for 3Q25E/2025E.”
One of the big debates in AI currently is the future market split between the merchant GPUs of Nvidia (and, to a lesser extent, AMD) and the application-specific ICs (ASICs) that the major clouds and leading AI players are looking to deploy. Google has been at the forefront here, with its AI workloads running on its custom TPUs and the JAX + XLA software stack, which are basically Google’s answer to the open-source PyTorch library and Nvidia’s CUDA computing platform. The consensus view is that XPUs (custom accelerators) will take share over time. And it’s easy to see why, as we’ve already seen this scenario play out in the cloud CPU market. The image below illustrates how the cloud CPU market has been commoditized by intermediary software abstraction layers. Basically, in modern cloud workloads, apps are packaged into containers with tooling such as Docker and then orchestrated by software such as Kubernetes. These containers can be deployed on any underlying hardware, from Intel and AMD x86-based CPUs to Amazon’s custom ARM-based Graviton CPUs.
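The same dynamic is visible one layer down in the AI stack. With JAX, model code is written once and XLA compiles it for whatever accelerator the runtime detects, much like a container image landing on x86 or ARM. Below is a minimal sketch of the idea (our own toy model with made-up shapes, not anything from Google’s production stack):

```python
# A minimal JAX sketch of the abstraction-layer dynamic described above.
# The two-layer model, shapes and parameter values are made up for
# illustration; the point is that XLA compiles this code for whichever
# backend (CPU, Nvidia GPU or Google TPU) it finds at runtime.
import jax
import jax.numpy as jnp

@jax.jit  # XLA traces and compiles this for the detected backend
def predict(params, x):
    """Toy two-layer MLP forward pass."""
    w1, b1, w2, b2 = params
    h = jax.nn.relu(x @ w1 + b1)
    return h @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2, kx = jax.random.split(key, 3)
params = (
    jax.random.normal(k1, (128, 256)),  # w1
    jnp.zeros(256),                     # b1
    jax.random.normal(k2, (256, 10)),   # w2
    jnp.zeros(10),                      # b2
)
x = jax.random.normal(kx, (32, 128))    # a batch of 32 toy inputs

print(jax.devices())             # e.g. [CpuDevice(id=0)], or GPU/TPU devices
print(predict(params, x).shape)  # (32, 10), identical on any backend
```

Moving this script from a CPU laptop to an Nvidia GPU box or a TPU slice requires only installing a different jaxlib backend; the model code itself doesn’t change. A hardware-agnostic layer like this is exactly what commoditized cloud CPUs, and it’s the core of the argument that custom accelerators will take share.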
For premium subscribers, we’ll continue our dive into current developments in AI and the AI stack, including a number of stock picks.



