Gemini was my preferred model for research a few months ago. So was Grok. Recently, I asked one of them for serious research, accompanied by blunt exhortations about how it had been disappointing me repeatedly of late, along with examples of the quality of answers I needed. The answers that came back fell short of even the lowest expectations. The model rated its own answer nearly useless when asked to judge it. Attempts to push deeper were met with deflection. When asked directly whether upgrading to a higher subscription tier would restore the analytical depth, the system said something remarkable: for the kind of deep research being attempted, it admitted that even an upgrade to a more premium plan might not help. It suggested reconsidering whether the subscription was worth keeping at all. An AI product recommending its own cancellation is not a glitch. It is a confession.
The models have genuinely improved and will continue to do so. And yet for the majority of paying users, the experience is moving in the opposite direction. The conversation among practitioners has shifted entirely. A year ago, the question was what AI could do. Today, the question is how quickly the session will end. Engineers managing serious workloads spend real time timing queries to avoid peak-hour throttling, switching platforms as limits reset, and calculating whether a task is worth consuming a week's allocation in one session. The technology has delivered an anxiety it never advertised: the slow creep of the progress bar reaching its ceiling.
OpenAI’s Sora, perhaps the most appreciated video generator, was shut down within months of its public launch because of high compute costs. Google's flagship AI platform cut its free tier by up to 92% overnight in late 2025, massively impacting its popular image-generation service. Doubao even had to suspend its regular video-calling feature mid-holiday when server demand outran available infrastructure. These are early days, and the AI industry has already begun abandoning the masses.
The throttling is not just perceived; it is increasingly measured and documented. Some providers have removed Deep Research buttons outright, while the length of answers is being steadily cut across platforms. Regular conversations have a new topic: the enforced waiting time before one can resume using AI. A forensic analysis of nearly 18,000 reasoning sessions found a 67% decline in model reasoning depth following a provider update explicitly designed to reduce compute consumption.
This was not the historical template. The BlackBerry was a corporate status symbol before it was a commodity. Email was a workplace privilege before it was a utility. Broadband was an office luxury before it reached every household. In each case, the arc was the same: early exclusivity, rapid commoditization, eventual ubiquity. Access expanded. Inequality narrowed. The AI industry made the same implicit promise at launch, with generous free tiers and broadly accessible capabilities. That promise is being broken in plain sight, and unlike those prior technologies, the constraint driving the reversal will make it far worse, with severe consequences beyond the ability to generate images, videos, or deep research.
Token inequality is the result. But token inequality is not only a story of scarcity. It is equally a story of abundance, deliberately constructed. Those with resources are ostentatiously moving in the opposite direction to corner the advantages of new technologies that are no longer in doubt. While most users are discovering that their quotas shrink and their outputs thin, a separate, loudly enthusiastic cohort is being told something entirely different: burn more, spend more; the only failure is leaving tokens on the table.
The Race to Burn: Token Consumption is the New Capex
Sixty trillion tokens. One company. Thirty days. By some measures, that could be as much as 4% of estimated global daily token consumption, coming from 85,000 people who represent approximately 0.001% of the world's population.
Inside the largest technology firms, the directive is the inverse of everything the mass market is experiencing. The instruction is not to conserve. It is to consume as aggressively as possible, and to treat any unspent allocation as evidence of failure. Internal leaderboards rank engineers by token volume. Performance reviews incorporate AI usage metrics. Token budgets are being packaged into compensation alongside salary and equity. The vocabulary has shifted: tokenmaxxing has moved from ironic shorthand into earnest corporate strategy. At another firm, a single employee burned two hundred and eighty-one billion tokens in the same month.
Business token spend across corporate America rose thirteen times between January 2025 and early 2026. The enterprises driving that figure are not doing the same tasks as those priced out of the conversation. They are running autonomous agents across million-line codebases, spawning parallel research loops that process hundreds of sources simultaneously, and submitting pull requests that the model wrote, tested, debugged, and verified without a human in the loop. An engineer with an uncapped agentic budget is not a faster version of an engineer without one. They are operating in a categorically different professional reality.
The productivity case is real. But what is being constructed in this race is not merely efficiency. It is a permanent structural moat, assembled quietly, during a supply shock, at a pace the market cannot yet see clearly. Every autonomous workflow completed, every codebase refactored without a human bottleneck, every research synthesis produced in minutes rather than days, adds to an institutional knowledge base that no amount of talent can replicate from behind a rate limit. The gap being built is not about who has the better engineer. It is about who handed their engineer an automated factory and who handed theirs a calculator with a daily usage cap.
Token inequality does not create a spectrum. It creates a fork. In the corporate world, those with the resources to invest will be able to harness the technology to compound their strengths, materially and progressively, in ways not yet widely recognized.
Cause: The Lights Are Still On
Of course, the widening gap between the AI Haves and AI Have Nots has its root in the new cognition not being available to all: not just because of the much-discussed, enormous costs of developing it, but because of its limited availability at any price, now that its utility is no longer in doubt. Identifying the real cause is important to understanding where the future might be headed.
Pick up any newspaper covering artificial intelligence infrastructure, and the story is the same: power. Gigawatt-scale data centers. Grid bottlenecks. Transformer lead times measured in half-decades. The projected shortfall in US power availability is expected to reach an unfathomable 50GW by 2028. The International Energy Agency estimates that data centers will consume the equivalent of Japan's entire annual electricity output this year. Yet while the popular narrative has settled on energy as the binding constraint, this is largely a future problem, with time to solve it.
The present reality is that data centers are not yet going dark. The lights are on. The models are running. The constraint that is actually rationing intelligence today is not the power flowing into the building. It is the silicon inside it.
As we detailed in Silicon Shock and numerous other articles, despite every announced capital expenditure commitment, the physical hardware pipeline cannot be accelerated to match demand that is growing faster than semiconductor manufacturing plants can respond. Already, GPU lead times are running 36 to 52 weeks. Advanced semiconductor packaging capacity is fully booked for multiple quarters. High-bandwidth memory supply is tight enough that hyperscalers have locked up nearly half of global DRAM supply through multi-year contracts, physically denying it to smaller competitors. Spot prices for the most advanced AI chips rose between 20 and 48% in the first months of 2026 alone. Power shortages make for a relatable narrative. Silicon is where the evidence shows up in real money terms.
The downstream consequences are visible as well. Cloud companies are raising rental rates. They are also demanding multi-year lock-in contracts from smaller customers while reserving priority allocation for their largest accounts. The CFO of one of the largest AI providers stated publicly that she spends much of her time hunting for near-term compute capacity and making difficult decisions about which projects to shelve.
When demand exceeds physical supply at any price, the market does not clear cleanly. It allocates administratively, through tiers, contracts, and preferential routing. The organizations that secured compute commitments earliest, that signed the longest contracts, that had the balance sheets to pay the premium, are the ones whose queries get answered fully, whose agents run unthrottled, whose reasoning goes unredacted. Everyone else joins the queue. Token inequality begins not at the billing page but at the allocation desk, and the allocation desk currently has a very clear view of who matters.
Demand Has Gone Vertical on the Want Alone
The internet made the small dangerous. A two-person startup with a broadband connection and a clever idea could reach a global market, outmaneuver a legacy institution, and build a billion-dollar business before the incumbent understood what had happened. Cloud computing deepened this dynamic. The barrier to infrastructure disappeared. Talent mattered more than capital. For two decades, technology systematically favored the nimble over the entrenched.
That dynamic has inverted. Completely.
A serious agentic development environment now costs an estimated $4,000 per developer per month in API tokens alone. For a well-funded enterprise engineering team, that is a rounding error on a quarterly budget. For an early-stage startup burning through a seed round, it is an existential line item that forces a choice between experimentation and survival, and that is before factoring in that, for a growing number, availability itself is not guaranteed against a giant willing to spend far more. One developer community calculated that running the kind of autonomous coding workflows now standard at large technology firms would consume a startup's entire monthly infrastructure budget in seventy-two hours. The response in many developer forums is resignation.
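The seventy-two-hour figure is easy to sanity-check with back-of-the-envelope arithmetic. Every number below is an illustrative assumption, not vendor pricing: a plausible burn rate per autonomous agent, a blended per-million-token price, and a hypothetical seed-stage infrastructure budget.

```python
# Back-of-the-envelope sketch: how fast enterprise-style agentic workflows
# exhaust a startup budget. All figures are illustrative assumptions.
TOKENS_PER_AGENT_HOUR = 5_000_000         # assumed burn rate per autonomous agent
PRICE_PER_MILLION = 15.00                 # assumed blended price, $ per 1M tokens
AGENTS = 10                               # assumed count of parallel workflows
STARTUP_MONTHLY_INFRA_BUDGET = 50_000.00  # assumed seed-stage monthly budget, $

# Hourly cost of running all agents at full tilt.
hourly_cost = AGENTS * TOKENS_PER_AGENT_HOUR / 1_000_000 * PRICE_PER_MILLION

# How long the monthly budget survives at that rate.
hours_until_exhausted = STARTUP_MONTHLY_INFRA_BUDGET / hourly_cost

print(f"${hourly_cost:,.0f}/hour -> budget gone in {hours_until_exhausted:.0f} hours")
```

Under these assumed inputs the budget lasts roughly three days of continuous operation, which is in the same ballpark as the developer community's seventy-two-hour estimate; the enterprise simply absorbs the same rate as a line item.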
This is the rupture. The enterprises with compute allocations secured years in advance are running hundreds of autonomous workflows simultaneously, building vector knowledge bases, refactoring legacy codebases at speed, and accumulating institutional intelligence that compounds daily. The startup founder is calculating whether a single deep research session justifies the cost. One is building with the technology. The other is negotiating with it. Industry analysts now project an 80% failure rate for AI-native startups by the end of 2026, driven not by poor ideas or weak teams but by the arithmetic of compute costs versus venture timelines.
The historical parallel inverts painfully. When Salesforce launched as a startup in 1999, it did not need to outspend Oracle on servers to compete. When Google was built in a garage, the internet did not charge more per search for smaller organizations. The infrastructure was indifferent to scale. Token pricing is not indifferent to scale. It is structured, deliberately or otherwise, in a way that makes sustained experimentation a privilege of the capitalized. The giant can afford to fail at agentic workflows a hundred times before finding what works. The startup cannot afford to fail twice.
The implications of a rapidly rising cost of doing business are beyond the fate of individual companies. The popular history of technology innovation, one dominated by the experiences of the previous Internet era, is disproportionately a history of small actors disrupting large ones. The conditions that made that possible are precisely what token inequality is eroding. What replaces disruption from the edges is consolidation at the center, and the center is currently locking in advantages at a pace that structural challengers may find permanently unreachable.
The want argument, however pressing, is not the sharpest edge of token inequality. The sharpest edge is what comes next, when the question is no longer who gets to innovate faster, but who gets to remain secure at all.
From Want to Need: The Vulnerability Ratchet
For years, the cryptography community warned that quantum computing posed an existential threat to the encryption standards protecting global digital infrastructure. The argument was that a sufficiently powerful quantum machine could break the mathematical assumptions underlying public-key cryptography, rendering decades of security architecture vulnerable overnight. Governments funded post-quantum cryptography research. Standards bodies convened. The threat was taken seriously precisely because the capability, once it existed, would be irreversible.
That kind of threat has now arrived, through a different door, on a faster timeline, and without the decade of visible warning.
For a while now, we have known that the most capable AI models will identify software vulnerabilities at a scale and depth without precedent in security research history. A model capable of ingesting a million lines of code, reasoning across interdependencies, and surfacing exploitable weaknesses in minutes could invalidate the piecemeal, incremental security assumptions on which the world's software was built. Suddenly, what had been hypothetical is a clear and present danger, and likely far larger than anyone imagined.
Claude Mythos Preview, Anthropic's most advanced model, autonomously discovered and exploited zero-day vulnerabilities in every major operating system and web browser during internal testing. It found a 27-year-old bug in OpenBSD (a system built primarily around its security reputation) and a 16-year-old vulnerability in a widely deployed video codec. Where its predecessor model produced working exploits on Firefox vulnerabilities twice out of several hundred attempts, Mythos Preview succeeded 181 times. Anthropic has not released this model publicly. Instead, it launched Project Glasswing, committing 100 million dollars in usage credits to a consortium of technology companies to identify and patch vulnerabilities in critical software before the capability becomes broadly available. The launch triggered high-profile meetings between members of the Trump administration, technology executives, and bank chief executives about the security risks of powerful AI models.
The restraint is admirable. It is also temporary, and everyone involved knows it. Anthropic has stated its goal is to eventually deploy Mythos-class models at scale. Other labs are developing comparable capabilities on their own timelines. The window between a model existing and being broadly accessible is measured in months. Every financial institution, every hospital network, every critical infrastructure operator, every organization running software built over the last two decades is now on a clock they did not set.
Who gets early access, and the time to reduce the risks, is token inequality in miniature. The world is certainly safer, overall, with the largest players preparing in advance; spare a thought for everyone else still in the dark.
To emphasize: Mythos will not be the only model to develop the capabilities that have caught everyone by surprise. Numerous others will get there soon, and many will be in the public domain without the same restraints. All will likely get exponentially better. Every entity globally will need to begin preparing defenses, and those defenses will require an enormous number of tokens, irrespective of current views on AI utility, budgets, and business development plans.
The response to this threat, assuming the threat is real, could reshape the global landscape in ways that are impossible to forecast. Fighting it will require tokens going against tokens: continuous frontier model access, autonomous agents running persistent vulnerability scans across live codebases, deep reasoning applied to legacy system architectures built when no one imagined this threat existed. The organizations that can afford that infrastructure are building security moats in real time. Those that cannot are accumulating exposure at exactly the rate their better-resourced competitors are reducing theirs. Unlike productivity advantages, which compound gradually, security gaps may resolve catastrophically and instantaneously. Token inequality in this dimension is not about who innovates faster. It is about who survives the next serious attack. The want has become a need, the need is token-intensive, and the supply is constrained. The inequality does not merely disadvantage. At this frontier, it threatens existence.
Conclusion: AI is Our Economics; AI is Our Politics
Every technology of consequence in the last half-century was deflationary. The internet reduced the cost of information distribution to near zero and destroyed margins across media, retail, and financial services while creating vast consumer surplus. Mobile phones made global communication nearly free and opened markets where none had existed. Cloud computing eliminated the capital expenditure of server infrastructure and allowed a two-person startup to deploy at the scale of a multinational. In each case, the technology was a force that lowered costs, distributed its benefits broadly, and made the world more competitive by lowering barriers to entry. The economist's summary was always the same: new technology increases total output, reduces price, and spreads capability.
Token inequality makes generative AI the first major technology of the modern era to operate in the exact opposite direction. Not deflationary but inflationary. Not distributing the capability but concentrating it. Every layer of the economy feels this differently, but none escapes it.
Token inequality's next phase will include moats built on cornering compute, at least until the supply shortages ease. As discussed above, this is not just an intertemporal issue: in the face of new security risks, those without the means or the access could suffer disproportionately in the periods ahead.
At the national level, the inequality is starker and the consequences longer-lasting. The countries that control the silicon, the power grid headroom, and the data center infrastructure are not distributing those advantages. They are monetizing them. Already, a software developer in Cairo querying an AI system in Arabic spends more tokens for equivalent output than a developer in California asking the same question in English; the tokenization architecture penalizes non-Latin scripts by factors of two to six. A startup in Jakarta or Lagos that needs the same agentic infrastructure as its competitor in San Francisco faces a cost that represents multiples of its entire payroll.
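The script penalty has a simple mechanical explanation. BPE tokenizers trained predominantly on English text merge common English words into single tokens, but tend to fragment scripts they saw rarely, in the worst case falling back toward one token per UTF-8 byte. The sketch below uses that byte-level fallback as a rough upper bound; the sentences, the one-token-per-word estimate for English, and the per-byte estimate for Arabic are all illustrative assumptions, not measurements of any particular tokenizer.

```python
# Rough sketch of why non-Latin scripts can consume more tokens.
# Assumption: an English-heavy BPE vocabulary gives ~1 token per common
# English word, but degrades toward 1 token per UTF-8 byte for scripts
# it rarely saw during training (each Arabic letter is 2 bytes in UTF-8).
en = "The quick brown fox jumps over the lazy dog"
ar = "الثعلب البني السريع يقفز فوق الكلب الكسول"  # illustrative Arabic sentence

en_est = len(en.split())          # optimistic estimate: one token per word
ar_est = len(ar.encode("utf-8"))  # pessimistic estimate: byte-level fallback

print(f"English ~{en_est} tokens, Arabic up to ~{ar_est}: "
      f"{ar_est / en_est:.1f}x inflation in the worst case")
```

Real tokenizers land between these bounds, since some byte pairs do get merged, which is consistent with the two-to-six-times penalty reported above: the user pays the same per-token price for materially more tokens per sentence.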
Of course, governments worldwide have responded with sovereign compute initiatives, but, as we have seen with other festering inequalities, these initiatives are unlikely to have sufficient mitigating power. Sovereign compute buys access to yesterday's capability at the pace of government procurement cycles. The frontier moves faster than that. And these conclusions hold before one even brings in the far more serious availability and security issues discussed above.
The conclusion that follows from all of this is not comfortable, and it should not be dressed up as anything other than what it is. Token inequality is not a phase in the adoption curve of a new technology. It is the defining architecture of the intelligence economy. The constraints that drive it do not respond to goodwill, pricing commitments, or policy statements. Already, AI spend is altering global spending baskets, and those who do not have the dollars to spare risk falling materially behind. If, until now, embarking on AI initiatives depended on one's intentions, on whether one really believed in the benefits, it is now a function of one's capabilities.