The Wrong Denominator

The market has been measuring AI in the wrong units. What the shift from tokens to cost per completed task means for budgets, valuations, and your operating model.

THE CLAIM

Token consumption is the metric that repriced the semiconductor complex and underpins a near-trillion-dollar IPO runway. It measures revenue to the labs, not value to the buyer. Measured per completed task, the cost of intelligence is collapsing. Both facts are true at once, and the gap between them is where the next 18 months of corporate strategy and market repricing will play out.

Apollo co-president John Zito, June 10: tokenmaxxing is “a lot of BS, honestly… prices are collapsing per unit of IQ.” Goldman’s desk landed on the same frame five days earlier: useful task completion per watt and per dollar. The cheapest GPT-4-level model fell from $37.50 to $0.175 per million tokens in 23 months (Exhibit 2).
The bills are exploding anyway. Average business token spend is up about 13x since January 2025 (Ramp), and a typical agent job burns about 96,000 tokens (SemiAnalysis). Flat-rate pricing ended June 1, when GitHub moved every Copilot plan to usage billing. Anthropic filed its confidential S-1 the same morning. The spending index that doubled from December printed 1.75 on June 11, 12% below its May 20 peak of 1.99.
Nobody is cutting AI. In UBS’s enterprise checks, about 60% call token costs a real issue, and effectively none are retreating. They cut around AI instead: other IT spend, cloud, and headcount growth (Exhibit 3).
We did the math ourselves this issue: big-five hyperscaler cash capex hit $412 billion in 2025, and 2026 runs from about $660 billion on company guidance to $805 billion on Morgan Stanley’s lease-inclusive estimate (Exhibit 4). The IT-investment categories carried 85% of Q1 2026 GDP growth (Exhibit 5). The one-trade economy is in the public data, not just the sell-side decks.
For operators: govern cost per completed task, not adoption. Token budgets now work the way headcount budgets always have: capped, forecast, and fought over.

01 · WHAT HAPPENED

Seventy days that repriced AI

For two years the AI trade ran on a simple proxy: more tokens means more intelligence consumed, which means more value created. That proxy broke this spring, because agents changed the unit economics of usage. An agent does not chat; it loops, plans, retries, and reviews, and a typical agent job burns about 96,000 tokens (SemiAnalysis). Providers responded by ending the subsidy. OpenAI moved Codex to usage billing on April 2, Google followed with Gemini on May 19, and on June 1 GitHub converted every Copilot plan to metered “AI credits,” with unlimited use surviving only for code completions. Sam Altman’s own summary this week: token costs went from a non-issue in January to “a huge issue” and a meme. In seventy days the AI cost line went from rounding error to board topic (Exhibit 1).

Companies reacted fast, and unevenly. Amazon, which had run an internal leaderboard (KiroRank) that rewarded engineers for token consumption, removed it after busywork flooded through agents to climb the rankings. Uber’s roughly 5,000 engineers burned the company’s full-year 2026 budget for agentic coding tools in four months, with individual bills reported between $150 and $2,000 a month; the company now caps spend at $1,500 per engineer per month, per tool, with an exception process (Bloomberg, June 2). Walmart capped its internal AI agent after demand ran too hot (Bloomberg, June 1). Microsoft cancelled most of its Claude Code subscriptions, partly on cost (The Verge). Axios reported one enterprise spending $500 million on Claude in a single month with no usage limits; that figure is single-sourced and unconfirmed by Anthropic, and we weight it accordingly. The Wall Street Journal’s summary: “the free-money period for AI is definitively over.”

The market’s own price gauge moved the same way. The Silicon Data LLM Token Expenditure Index, a usage-weighted measure of what the market actually pays per million tokens, roughly doubled from its December base to an all-time high of 1.99 on May 20, then rolled over: Citadel’s June 10 note counts six straight down days through June 9, the longest streak since January, and the index’s public page printed 1.75 on June 11, about 12% below the peak. The daily series itself is licensed, so we will not chart it; Citadel’s published note carries the full chart. The print is ambiguous, and the ambiguity is the story: a falling expenditure index can mean demand rolling over, or it can mean deflation arriving in the spend line, the same work bought cheaper. Citadel’s June 10 desk note leans to the second, reading the decline as a shift toward cheaper models and adoption becoming “less about what frontier models can do in principle and more about the price and scarcity of the inputs required to make AI operational at scale.” Note the messenger: Citadel is one of the two firms Zito names as gladly paying anything for the frontier, and it is their desk describing everyone else trading down.

02 · THE REFRAME

The measurement error: tokens price throughput, not intelligence

Some of the bluntest pushback is coming from AI’s own backers. John Zito, co-president of Apollo, at the Morgan Stanley US Financials Conference on June 10: “I think tokenmaxxing and token talk is - it’s a lot of BS, honestly. Like, if you look at per unit of knowledge and cost per unit of knowledge, prices are collapsing. Prices are collapsing per unit of IQ, if you did it that way.” The point is more than rhetorical. A token is a unit of throughput, and nobody buys throughput. His diagnosis of why the bills explode anyway: firms point frontier models at tasks that do not justify the compute. “Our IQs are so low that we’re actually using [AI tools] to check out the recipe for, you know, French toast.” Goldman partner Rich Privorotsky had landed on the correct metric five days earlier: useful task completion per watt and per dollar. When two institutions of that weight converge on one framework inside a week, a consensus is forming in real time.

The data supports the denominator (Exhibit 2). In Epoch AI’s price data, the cheapest model clearing GPT-4-level general knowledge fell from $37.50 to $0.175 per million tokens in 23 months; GPT-3.5-level fell from $20.00 to $0.07, the Stanford AI Index’s 280x collapse in 18 months; GPT-4-level coding fell about 100x in 16 months. Across capability thresholds Epoch measures declines of 9x to 900x per year, median 50x, with the fastest declines all starting after January 2024. Open-weight and Chinese models now deliver near-frontier capability at a tenth to a twenty-fifth of frontier prices (Citrini), and Cursor’s latest model is reported to match frontier coding performance at roughly a tenth of the cost per task. Priced per unit of capability, the deflation has no modern precedent.
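The fold-declines quoted above can be reproduced from the endpoint prices alone. A minimal sketch, using only the figures cited in this section; the function names and the constant-rate annualization are our own illustrative assumptions, not Epoch's method.

```python
def fold_decline(start_price: float, end_price: float) -> float:
    """How many times cheaper the end price is than the start price."""
    return start_price / end_price


def annualized_fold(start_price: float, end_price: float, months: float) -> float:
    """Fold-decline per 12 months, assuming a constant rate of decline (our assumption)."""
    return fold_decline(start_price, end_price) ** (12.0 / months)


# Dollars per million tokens, as quoted in the text.
gpt4_level = fold_decline(37.50, 0.175)             # roughly 214x over 23 months
gpt35_level = fold_decline(20.00, 0.07)             # roughly 286x over 18 months
gpt4_per_year = annualized_fold(37.50, 0.175, 23)   # roughly 16x per year

print(f"GPT-4-level: {gpt4_level:.0f}x total, {gpt4_per_year:.1f}x per year")
print(f"GPT-3.5-level: {gpt35_level:.0f}x total")
```

The GPT-3.5-level endpoint ratio lands near 286x, in line with the rounded 280x figure, and the annualized GPT-4-level rate of roughly 16x sits inside the 9x to 900x range quoted above.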

The CFOs are right about the numerator, though. Gartner forecasts that inference on a trillion-parameter model will cost over 90% less by 2030 than in 2025, and still expects enterprise bills to rise, because agentic workloads burn 5 to 30 times the tokens of a chatbot exchange and consumption grows faster than prices fall. Its analyst Will Sommer’s warning: “Chief Product Officers should not confuse the deflation of commodity tokens with the democratization of frontier reasoning.” A unit cost down 90% times a unit count up thirtyfold is still a bigger bill. The resolution of the Zito-CFO standoff: token expenditure is a useless productivity metric and a decisive revenue metric. It says nothing about value created. It says a great deal about whether the revenue under roughly $3 trillion of AI-linked infrastructure investment is durable. That is why the index rollover matters even if Zito is right about everything else.
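The bill arithmetic is simple enough to check directly: the bill is unit price times unit count, so a price cut only lowers it when volume grows more slowly than the inverse of the cut. A minimal sketch, assuming the 90% decline and the 5x to 30x token multiplier can be applied to the same workload (a simplification on our part; the function names are ours).

```python
def bill_multiple(price_decline_pct: float, volume_multiple: float) -> float:
    """Ratio of the new bill to the old bill when unit price falls and unit count rises."""
    price_multiple = 1.0 - price_decline_pct / 100.0
    return price_multiple * volume_multiple


def breakeven_volume_multiple(price_decline_pct: float) -> float:
    """Volume growth at which the bill is exactly flat."""
    return 1.0 / (1.0 - price_decline_pct / 100.0)


low_end = bill_multiple(90, 5)          # about 0.5: the bill halves
high_end = bill_multiple(90, 30)        # about 3.0: the bill triples
flat_at = breakeven_volume_multiple(90)  # about 10x volume keeps the bill flat

print(f"5x tokens: {low_end:.1f}x bill; 30x tokens: {high_end:.1f}x bill; flat at {flat_at:.0f}x")
```

Under these assumptions the bill rises only once token volume grows more than tenfold, so where a workload sits inside the 5x to 30x range decides which side of the standoff it supports.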

03 · THE CORPORATE RESPONSE

Rationing without retreat: the capital swap

The most counterintuitive finding in the UBS enterprise checks is what enterprises are not doing. About 60% of the IT executives polled call token costs a real issue (“this is not a made-up media story,” in the bank’s words). One described the Copilot pricing change in a single word: “chaos.” Another opened the first real AI bill and heard leadership say “we don’t have the money for this.” Yet not one check showed a company slamming the brakes. The dominant behavior is guardrails: caps, alerts, model downshifting, pooled token budgets. Several explicitly refused to throttle usage and are cannibalizing other line items to make room, starting with external IT services, cloud spend, and, most notably, headcount growth. One sourcing caveat we owe you: the checks reached us through coverage of the Zito interview, and we could not locate a standalone UBS report, so the survey numbers carry that flag. The pattern they describe is corroborated by every public channel we track.

The substitution shows up wherever someone asks (Exhibit 3). Gartner’s CFO survey has expected headcount growth collapsing from 6% to 2% for 2026 while 75% of CFOs raise technology budgets, 48% by double digits, and about 60% plan double-digit increases in AI investment specifically. The Atlanta Fed’s business survey has executives expecting +2.25% productivity from AI over three years against a 1.2% headcount reduction, with firm AI spend per employee jumping from $1,358 in 2025 to $2,068 this year, roughly $280 billion in aggregate (the Fed’s flagged ballpark). Walmart froze its 2.1 million-person workforce for three years while revenue grows. The swap runs through hiring plans rather than pink slips, it compounds, and flat headcount plus AI is becoming the default planning template. We call it the capital swap.

Structurally, software is turning from a fixed cost into a variable cost. Exponential View’s survey work found over 70% of companies blew through their 2025 AI budgets. SemiAnalysis’s Doug O’Laughlin argues “automated intelligence” is simply a permanent new opex category, and that firms will swing between overspend and underspend until each finds its own labor-to-compute ratio. Boards will start asking for that ratio, not an adoption percentage, by this time next year.

04 · THE MACRO STAKES

A one-trade economy, and this time we did the math

The denominator debate would be academic if the numbers were small. They are not, and for this issue we computed them from the filings rather than quoting a bank deck. Cash capex at the big five hyperscalers ran $158 billion in 2022, $154 billion in 2023, $239 billion in 2024, and $412 billion in 2025, nearly tripling in two years; 2026 company guidance sums to about $660 billion, and Morgan Stanley’s lease-inclusive estimate is $805 billion (Exhibit 4). Amazon alone plans roughly $200 billion and sold C$14 billion of Canadian-dollar bonds on June 8 to help fund it. Apollo’s chief economist Torsten Slok calculates hyperscalers now devote about 60% of operating cash flow to capex and that the top ten S&P 500 names are more richly valued than at the 1999 peak. The bubble thesis and the deflation thesis now coexist inside the same firm, which tells you the debate is real.
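For readers who want to check the growth rates behind those capex figures, a short sketch using only the numbers quoted above. The variable names are ours, and the lease-inclusive estimate is kept separate because it is not like-for-like with cash capex.

```python
# Big-five hyperscaler cash capex in $ billions, as quoted in the text.
capex_bn = {2022: 158, 2023: 154, 2024: 239, 2025: 412}
guidance_2026_bn = 660          # sum of company guidance
lease_inclusive_2026_bn = 805   # Morgan Stanley estimate; not comparable to cash capex


def multiple(later: float, earlier: float) -> float:
    """Ratio of a later figure to an earlier one."""
    return later / earlier


two_year_multiple = multiple(capex_bn[2025], capex_bn[2023])
yoy_growth = {
    year: multiple(capex_bn[year], capex_bn[year - 1]) - 1.0
    for year in sorted(capex_bn)
    if year - 1 in capex_bn
}
implied_2026_growth = multiple(guidance_2026_bn, capex_bn[2025]) - 1.0

for year, growth in yoy_growth.items():
    print(f"{year}: {growth:+.0%}")
print(f"2023 to 2025: {two_year_multiple:.2f}x; 2026 guidance implies {implied_2026_growth:+.0%}")
```

On these inputs the two-year multiple is about 2.7x, which is the "nearly tripling" in the text, and company guidance alone implies roughly 60% further growth in 2026.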

Then we tested the strongest macro claim on the tape (Exhibit 5). Morgan Stanley estimates AI-related investment drove about 75% of Q1 2026 GDP growth. The national accounts cannot isolate “AI,” but they publish the categories where AI equipment and software live: information processing equipment contributed 0.87 points and software 0.49 points of the quarter’s 1.6% annualized growth, a combined 1.36 points, or 85% of the total. Against the pair’s combined 2017 to 2024 average of 0.31 points, the surge above trend is about 1.05 points, two thirds of all growth; strip the surge out and the economy grew roughly 0.5%. So the 75% is not hype. It is roughly what the accounts allow, provided nearly all of the IT surge is AI. The honest caveats run both directions: the measure is gross of imports (equipment contributed +0.89 points in Q1 2025 while GDP shrank 0.6%, the tariff quarter), it misses data centers and power, and it includes non-AI IT spend. Goldman’s import adjustment and UBS’s shallow-adoption objection live in exactly those gaps. Our calculation bounds the claim; it does not settle it.
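The bound above is plain arithmetic on contribution-to-growth figures, and it can be laid out step by step. A sketch using only the numbers in this section; the variable names are ours, and the inputs are rounded, so the outputs are approximate.

```python
# Contributions to annualized real GDP growth, in percentage points (Q1 2026, from the text).
equipment_pts = 0.87     # information processing equipment
software_pts = 0.49      # software
gdp_growth_pts = 1.6     # headline annualized growth
trend_pts = 0.31         # combined 2017 to 2024 average for the two categories

it_pts = equipment_pts + software_pts            # combined IT contribution
it_share = it_pts / gdp_growth_pts               # share of headline growth
surge_pts = it_pts - trend_pts                   # contribution above trend
surge_share = surge_pts / gdp_growth_pts         # surge as a share of growth
ex_surge_growth = gdp_growth_pts - surge_pts     # growth with the surge stripped out

print(f"IT contribution: {it_pts:.2f} pts ({it_share:.0%} of growth)")
print(f"Surge above trend: {surge_pts:.2f} pts ({surge_share:.0%} of growth)")
print(f"Growth excluding the surge: {ex_surge_growth:.2f}%")
```

Morgan Stanley's 75% falls between the surge share (about two thirds) and the full IT share (85%), which is why the calculation bounds the claim rather than confirming it.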

05 · SPILLOVER

The narrative is doing the firing

Meanwhile the cost panic is leaking into the labor market through the narrative channel, ahead of the capability channel. Challenger’s May report makes AI the No. 1 stated reason for US job cuts for the third consecutive month: 38,579 cuts citing AI, 40% of the month’s total, up from 7% in January, 25% in March, and 26% in April (Exhibit 6). Year-to-date AI-attributed cuts, 87,714, already run 1.6 times all of 2025, and May’s 97,006 total was the highest for the month since 2020. This is employer self-attribution, though, and Challenger itself flags that AI may be a socially acceptable cover for ordinary cost cuts. HBR’s Davenport and Srinivasan document that the wave is anticipatory: firms are cutting ahead of demonstrated AI capability because investors reward the story. One CEO put it more sharply: some companies may be citing AI to cover cuts made to pay for AI.

Recall the floor argument from our first briefing: substitution stops where the marginal cost of compute exceeds the marginal cost of labor for the task. This spring’s token-cost panic made that floor visible, faster than we expected. When an agent job costs real money, “automate everything” becomes a budget line someone must defend. The displacement question itself is genuinely contested in the data: the NY Fed now attributes most of the rise in new-graduate unemployment to remote work, not AI, and that fight is the subject of our next issue. What is not contested: the story of AI is restructuring organizations faster than the technology is.
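The floor can be made concrete with the figures already in this issue: about 96,000 tokens per agent job, and per-million-token prices that ranged from $37.50 down to $0.175. A rough sketch follows. The single blended token price, the success-rate adjustment (independent retries at a constant rate) and the $2.00 labor cost per task are hypothetical assumptions for illustration, not data.

```python
TOKENS_PER_AGENT_JOB = 96_000  # SemiAnalysis figure quoted in the text


def cost_per_completed_job(price_per_million: float,
                           tokens_per_attempt: int = TOKENS_PER_AGENT_JOB,
                           success_rate: float = 1.0) -> float:
    """Expected compute cost of one completed job.

    Failed attempts still burn tokens, so the cost scales with 1 / success_rate.
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    cost_per_attempt = tokens_per_attempt / 1_000_000 * price_per_million
    return cost_per_attempt / success_rate


def compute_beats_labor(price_per_million: float, labor_cost_per_task: float,
                        success_rate: float = 1.0) -> bool:
    """True when compute is the cheaper way to get the task completed."""
    return cost_per_completed_job(price_per_million, success_rate=success_rate) < labor_cost_per_task


LABOR_COST = 2.00  # hypothetical dollars per task, for illustration only

for price in (37.50, 0.175):
    cost = cost_per_completed_job(price)
    verdict = "compute" if compute_beats_labor(price, LABOR_COST) else "labor"
    print(f"${price}/M tokens: ${cost:.4f} per job, cheaper option: {verdict}")
```

At the old price the job costs about $3.60 and the hypothetical labor benchmark wins; at the new one it costs under two cents. The floor moves with price and with success rate, which is why it has to be measured per workflow rather than assumed.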

06 · WHERE THE DOLLARS POOL

Good enough will do, and the IPO tension

If the denominator wins, if buyers optimize cost per completed task, the demand curve does not shrink. It migrates. Simple work routes to local and open models. Hard tasks go to the cloud. The frontier is reserved for users whose ROI is provable; Zito names Citadel and Jane Street as gladly paying anything for it, and Citadel’s own note expects frontier-intensive AI to concentrate in “a narrower set of firms with the balance sheets to absorb the compute cost.” Citrini’s rotation map points the same way: value moves to smart routing, observability, edge inference, and good-enough models. Microsoft used Build this week to pitch MAI models it says burn 60% fewer tokens on coding. Volume keeps growing while the dollars and margins pool somewhere other than where today’s valuations assume.
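The migration described above amounts to a routing rule: send each task to the cheapest model that clears its quality bar, and reserve the frontier for cases with documented ROI. A minimal sketch of such a ladder; the tier names, prices and quality scores are hypothetical placeholders, and the quality score is assumed to come from your own evaluation set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    price_per_million: float  # hypothetical blended price, dollars per million tokens
    quality: float            # hypothetical 0 to 1 score on an in-house evaluation set


# Illustrative three-tier ladder; every number here is a placeholder.
LADDER = (
    Tier("local", 0.10, 0.70),
    Tier("mid", 1.00, 0.85),
    Tier("frontier", 10.00, 0.95),
)


def route(quality_bar: float, roi_documented: bool = False,
          ladder: tuple[Tier, ...] = LADDER) -> Tier:
    """Return the cheapest permitted tier whose quality clears the bar."""
    for tier in sorted(ladder, key=lambda t: t.price_per_million):
        if tier.name == "frontier" and not roi_documented:
            continue  # the frontier is gated behind documented ROI
        if tier.quality >= quality_bar:
            return tier
    raise ValueError("no permitted tier clears the quality bar")


print(route(0.60).name)                       # simple work stays on the cheapest tier
print(route(0.80).name)                       # harder work escalates one step
print(route(0.90, roi_documented=True).name)  # frontier only with documented ROI
```

Raising an error when nothing permitted clears the bar is a deliberate choice in this sketch: it forces an explicit exception request instead of a silent upgrade to the most expensive model.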

That makes the IPO timing awkward. Anthropic filed its confidential S-1 on June 1 at a reported $47 billion run-rate, up from about $10 billion a year earlier, following a Series H at a $965 billion post-money valuation; press reports put the IPO ambition near $1.8 trillion. The filing landed the same morning flat-rate pricing ended, and a day before its largest customers began rationing its product. OpenAI’s single largest customer now burns 100 billion tokens a month, per Altman. The labs are monetizing token volume at the exact moment their customers are learning to minimize it. Underwriters will call that growth. Buyers should ask how much of it survives the routing layer.

WHAT WE’D DO · THE OPERATOR’S PLAYBOOK

1. Re-denominate your metrics. Instrument cost per completed task for your top five AI workflows. Retire “AI usage” KPIs; Amazon’s leaderboard bought token burn and busywork, not output.
2. Write a routing policy. Default to the cheapest model that clears your quality bar and escalate to frontier models only on documented ROI. A three-tier ladder (local, mid, frontier) captures most of the deflation Zito is describing.
3. Govern tokens like headcount. Caps, alerts, and dashboards (Uber’s $1,500-a-month template), paired with workflow redesign so the caps don’t freeze adoption at its least productive configuration.
4. Re-underwrite vendor contracts. Anything signed under flat-rate assumptions needs stress-testing at 2x to 3x usage growth. Annual Copilot plans roll onto metered billing as they renew; know your renewal dates before the meter does.
5. Set your labor-to-compute ratio deliberately. The capital swap is happening either way (Exhibit 3). Decide what share of next year’s capacity growth is hired versus computed, and own the trade-off explicitly.
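Items 1 and 3 of the playbook can be prototyped from a simple run log before any tooling is bought. The sketch below is illustrative only: the `Run` record, the field names, the sample data and the 80% alert threshold are our assumptions, and the $1,500 default simply borrows the Uber figure cited earlier. It assumes the log has already been filtered to one month and one tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    workflow: str
    engineer: str
    cost_usd: float
    completed: bool  # True only if the output was accepted, however you define that


def cost_per_completed_task(runs: list[Run], workflow: str) -> float:
    """Total spend on a workflow, failed runs included, per accepted result."""
    in_scope = [r for r in runs if r.workflow == workflow]
    completed = sum(1 for r in in_scope if r.completed)
    spend = sum(r.cost_usd for r in in_scope)
    return spend / completed if completed else float("inf")


def cap_status(runs: list[Run], monthly_cap_usd: float = 1500.0,
               alert_fraction: float = 0.8) -> dict[str, str]:
    """Classify each engineer's month-to-date spend as ok, alert, or over."""
    spend: dict[str, float] = defaultdict(float)
    for r in runs:
        spend[r.engineer] += r.cost_usd
    status: dict[str, str] = {}
    for engineer, total in spend.items():
        if total > monthly_cap_usd:
            status[engineer] = "over"
        elif total >= alert_fraction * monthly_cap_usd:
            status[engineer] = "alert"
        else:
            status[engineer] = "ok"
    return status


# Hypothetical sample log.
runs = [
    Run("code-review", "ana", 400.0, True),
    Run("code-review", "ana", 900.0, False),
    Run("code-review", "raj", 200.0, True),
    Run("test-gen", "raj", 1400.0, True),
]
print(cost_per_completed_task(runs, "code-review"))
print(cap_status(runs))
```

Counting failed runs in the numerator is the point of the metric: a workflow that retries often looks cheap per token and expensive per completed task.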

WATCHLIST · WHAT WOULD CHANGE OUR MIND

Silicon Data token index: already 12% off its May 20 peak as of June 11. Whether the slide is deflation in the spend line or demand destruction is the cycle’s central question; a sustained break lower would force the answer.
Challenger June report (early July): does AI attribution hold above 40% of cuts, or does the narrative cool and the un-attributing begin?
Q2 earnings (late July): hyperscaler capex guidance against softening token expenditure. The first quarter where the two series can visibly diverge.
BEA Q2 advance estimate (July 30): our Exhibit 5 bound updates mechanically. If the IT-investment contribution holds near a point while ex-IT growth stays near zero, the one-trade economy thesis hardens.
Anthropic roadshow: how much revenue is committed spend versus meterable usage that customers are actively learning to reduce. A sustained index break plus a guidance cut would flip our read from deflation to demand destruction, and we would say so.

NEXT ISSUE · THE ENTRY-LEVEL PUZZLE

Stanford’s payroll data says AI is displacing workers aged 22 to 25 in exposed occupations, a 16% relative employment decline. The NY Fed says 64% of the rise in graduate unemployment comes from remote work, not AI. Yale says the occupational mix isn’t changing at all. Someone is measuring wrong. Next week we show you who, and what it means for your hiring pipeline.

SELECTED SOURCES

Bloomberg interview with John Zito, Morgan Stanley US Financials Conference (Jun 10, 2026) · Citadel Securities, “Tokenomics,” F. Flight (Jun 10, 2026) · Citrini Research, “State of the Themes” (Jun 8, 2026) · D. Thompson with D. O’Laughlin (May 29, 2026) · Axios, “AI sticker shock” (May 28, 2026) · WSJ, “Corporate America Is Starting to Ration AI” (May 2026) · Bloomberg on Uber caps (Jun 2) and Walmart’s agent (Jun 1) · Business Insider on Amazon’s leaderboard (May 2026) · The Verge on Microsoft and Claude Code · GitHub changelog (Jun 1, 2026) · UBS enterprise checks via press coverage (Jun 2026; see caveat in 03) · Silicon Data Token Market Pulse · Epoch AI / Artificial Analysis price data (CC-BY) · Stanford AI Index 2025 · Gartner press releases (Oct 15, 2025; Feb 10 and Mar 25, 2026) · Atlanta Fed macroblog (May 6, 2026) · Challenger, Gray & Christmas (Jun 4, 2026) · Apollo 2026 Outlook (T. Slok) · Morgan Stanley estimates via press reports (May 2026) · SEC EDGAR XBRL company filings · BEA via FRED (May 28, 2026 vintage) · Fortune on the Anthropic S-1 (Jun 1, 2026) · BIS WP 1179 · G. Gopinath, Odd Lots (May 29, 2026) · HBR, Davenport & Srinivasan (Jan 2026).