The Second Death of SaaS

‍Silicon Shock, Token Inequality, and Techflation are no longer fringe theories floated on these pages. They have become the invisible gravity warping the global economy, rewriting policy, and keeping politicians awake at night. When we called the Silicon Shock an era-defining event in January, we braced for a massive reckoning. What we did not expect, perhaps reflecting our own naive optimism, was the market's infinite capacity for distraction.

The market, with its usual gift for looking busy near the wrong fire, has chosen to ask whether the AI trade is already priced in. It is a fine question for a screen. It is a poor question for an economy. All the impatience around when technology product prices will begin to roll over massively simplifies the intricacies of hardware businesses’ moats and the realities of rising AI capabilities. We are studying a technological earthquake the way a gambler studies a scoreboard, fixated on the ticker while the ground itself reorganizes underneath the table. There are other less glamorous and more useful issues: what happens when software, the great fixed-price machine of the last thirty years, starts burning a variable-cost fuel every time it thinks?

Three years ago, we wrote that the traditional software-as-a-service model was dying. The killer then was capability, as software began to execute the work it had merely organized, causing the old logic of renting tools to humans to crack. That was the first death, a death by disintermediation, with value sliding off the software layer and down into the hardware that did the real thinking. It was the story of an earlier era, the era when computers learned to speak human language and programming itself changed shape. Those are still genuine forces, but the SaaS model now faces new pressure from the techflation themes.

Simply said: the first SaaS pressure came from what AI could do. The second is being caused by what AI costs.

Software companies suddenly have meaningful and fast-rising variable costs. The SaaS-type pricing worked when software companies effectively had 100% gross margins.

This note about SaaS’s second death uses a much-used term (unlike when we first used it in 2023), but has a larger body count. The turmoil caused by a new cost item, the AI compute cost, is creating ripples through the application layer, as companies from the largest to those starting grapple not only with costs but also how to attract customers with the right pricing, manage the rising customer anguish, and navigate the fast-changing competitive landscape not just due to products and features but also the innovations in pricing models. The changes in the ways software is priced and sold or what it is allowed to promise will likely remain unsettled for a while. For investors in these companies, it is not just about the implications on revenues and profits in the coming years, but in some cases about who will survive and who will suddenly be waylaid on account of sudden pricing-related business decisions.

The Barrel and the Refinery: Software as Hardware’s Downstream

We may still be in the minority of one, but to us, the ongoing Silicon Shock has multiple parallels with the 1970s Oil Shock. Of course, there are massive differences, but the term is used not just to draw attention but to repeatedly assert that techflation is not just about some technology companies making more or less money, but about its impact on everything macro globally for a prolonged period.

For the purpose of this note, the useful analogy is not the oil shock itself but what it did to the companies sitting downstream of the barrel. Refiners must buy a feedstock they do not control, convert it into something finished, and discover in a crisis that they have inherited all of the volatility and almost none of the pricing power. When crude quadrupled in 1973 and lurched again in 1979, the producers were largely insulated and frequently enriched. The converters downstream took the blow.

What is easy to forget is how much of the modern apparatus of that industry was built specifically to survive those years. Before 1973, the majors ran on posted prices, long contracts, and the comfortable assumption that a barrel bought today would cost roughly the same tomorrow. The shock retired that assumption permanently. Spot markets appeared where stable contracts had been, the Rotterdam cargo trade became a price-setter, and a generation of refiners learned to live and occasionally die on the crack spread. Petrochemical makers, one rung further down, found their margins compressing every time the feedstock jumped before their own prices could follow. The defensive habits that followed - hedging desks, feedstock pass-through clauses, flexible plants that could switch inputs, take-or-pay contracts that moved the risk around - were not strategy in any ambitious sense. They were the scar tissue of an industry that had been taught it no longer owned its own cost base.

Software now sits in precisely that downstream seat, one rung below a silicon shock instead of an oil one. The token is the barrel. The model labs and the semi manufacturers are the producers, raising prices and signing capacity contracts and banking the spread with the serenity of people who own the well.

For software companies, it is far worse in some ways because of their history and the simultaneous forces of disruption caused by GenAI. The decades-long miracle of SaaS, the flat subscription, was only ever possible because the feedstock was free. One cannot overstate the shock of moving to the world of meaningful variable costs.

The Buffet Closes

For three decades, software peddled a rather exquisite miracle. The initial iteration may or may not have demanded a fortune to construct, yet every subsequent replica cost precisely nothing. That singular property of near-zero marginal cost engineered the entire illusion of the all-inclusive buffet. Vendors could comfortably offer unlimited access for a fixed fee and retain nearly every dollar, simply because refills were effectively free.

Artificial intelligence irrevocably shattered that delicate arrangement by injecting a relentless metabolism into the code. Now, every computational response incinerates metered compute, and the foundational feedstock lurches unpredictably. The all-you-can-eat paradigm simply collapses the moment the kitchen faces real material costs for every plate served, especially when those costs are dictated by upstream monopolies. Software no longer leases a quiet place to sit; it vends the raw electricity required to think.

The entities positioned closest to the consumer are naturally the first to falter under this new physics, loudly discovering they cannot pass the shock through cleanly. Consider the developer revolt in real time when GitHub transitioned every Copilot plan to metered, token-based billing at the start of June, abruptly retiring the beloved premium-request model. Users who treated the tool as a silent utility suddenly reported incinerating eight percent of a monthly allowance in two hours, while others watched a single query rack up a six-dollar toll. Microsoft subsequently retreated, pulling premium models from lower tiers and freezing signups to survey the wreckage.

What died was not a price, but a massive subsidy. The twenty-dollar unlimited plan was merely venture-funded customer acquisition, currently being clawed back with the exact ruthless efficiency of an Uber surge or a Netflix hike.

When the most aggressive Chinese discounters in the market begin stripping away the unlimited subscriptions once deployed as irresistible bait, the era of the endless buffet has conclusively ended. A software seat historically assumed the hundredth click cost the vendor precisely what the first click did. When an autonomous agent executes a thousand actions before morning coffee cools, that foundational assumption completely shatters. This structural collapse forces an entirely new vocabulary upon the industry, because the unit of billing inevitably colonizes the unit of thought. The sector organized itself exclusively around human attention, counting seats, calculating logins, and optimizing retention. The token measures something else entirely, tracking cognition in discrete fragments akin to a municipal power grid dispensing kilowatt-hours. Submitting an invoice measured in tokens acts as a quiet, definitive confession regarding the fragility of an underlying cost structure. The industry is not converging on a civilized replacement for the subscription; it is actively fracturing into a dizzying portfolio of commercial primitives, desperately searching for a new grammar to obscure the spinning meter.

The Grammar of the Meter

The reason for this new variable cost is simple enough, which is always a warning that the consequences will be complicated. Most AI features do not run inside the customer’s own machine, at least not yet in any serious way. They run on centralized, shared, expensive hardware owned or rented by someone else, and for now the software company is the one standing between the user’s appetite and the upstream bill. The app may look like software. The cost base behaves like infrastructure. This is an awkward costume change, and the stitching is visible.

The clearest measure of how fast this is reordering the industry is not in any margin table. It is in the language. A business that rewrites its vocabulary every few months is a business that has not found its footing, and software has spent the past year minting units at a pace that would embarrass a venture deck. The seat is being retired, and in its place has arrived a small menagerie, each unit carrying a different theory of who should absorb the uncertainty.

The seat counted a human. The token counts machine cognition in small fragments, closer to a power meter than a membership card. A token is a tiny unit of thought with an invoice attached. And with that as the unit, the entire moral economy of software changes.

This deliberate lack of interchangeability transforms the pricing landscape into a labyrinth, and the credit reigns supreme as the preferred instrument of structural obfuscation. Stripped of the polished marketing veneer, the cleverness of this pivot begins to feel slightly embarrassing. A credit functions exactly like an airline mile, operating as a private scrip whose underlying exchange rate is controlled entirely by the issuer and routinely devalued in the dark. It naturally sounds substantially cleaner than a raw compute token during a polite sales pitch, right up until the realization dawns that it is simply fiat currency minted by a vendor, holding value exclusively within their own walled kingdom. Salesforce currently meters Agentforce in Flex Credits, Microsoft restricts Copilot Studio to Copilot Credits, Adobe burns through Generative Credits, while SAP issues its own proprietary AI Units.

The architecture of the credit brilliantly hides the true underlying compute cost while simultaneously manufacturing immense switching friction for the trapped user. Crucially, it allows an enterprise to execute a vicious price hike merely by altering how many credits a routine task consumes, raising the real financial burden without ever laying a finger on the published sticker rate. For all practical purposes, this is aggressive monetary policy conducted by a software company, complete with a regime of silent, unannounced inflation that the captive customer is entirely uninvited to audit.

Token pricing, with or without credits, protects the vendor because the user pays for every ounce of compute consumed, including wasted compute. Action pricing says the machine did something measurable, so someone should pay. Outcome pricing says the machine achieved something useful, so payment should depend on success. That final model is the most elegant and the most dangerous for the vendor, because it turns software into labor with performance risk.

The industry is therefore not moving from one old model to one new model. It is splitting into dialects. Customer support wants resolutions and cost control. Sales wants qualified leads. Developer tools want tokens, requests, or model usage. Enterprise agents want actions. Creative tools want credits. Data platforms want spend caps. Model providers want cached-token discounts, batch discounts, and faster lanes at higher rates. And, the management is additionally struggling with new cost items. The same word, “AI,” now contains a dozen cost structures wearing one overworked suit.

There is a reason the new grammar feels unstable. No one yet knows the right unit of AI work. A prompt is too crude. A token is too technical. A seat is too human. A task is too vague. A resolution is too hard to verify. An agent-hour sounds satisfyingly industrial but may reward slow machines. A credit hides the complexity and then quietly inherits all of it. The market is trying to price cognition before it has agreed what cognition is, which is bold, but then so was selling unlimited intelligence for twenty dollars.

The direction, however, is clear. The old software invoice measured access. The new invoice measures exertion. Once that change is accepted, the rest follows with the dull inevitability of plumbing.

The Price Sheet Will Not Sit Still

For most of the history of enterprise software, the commercial department was the quietest room in the building. A pricing committee might convene once every two years to weigh a five percent increase on a premium tier, usually to justify a feature nobody had requested, and then return to its slumber. Price was the most stable thing a software company owned. It had the rhythm of a ground lease.

That room no longer sleeps. The old price sheet was a document. The new one is a confession, revised by committee and engineering and finance and legal and whoever in infrastructure has just seen the cloud bill. Across the five hundred largest names in software, one tracker counted more than 1,800 pricing and packaging changes in 2025, against 339 the year before across a sample seven times larger. The leaders are now revising how they charge several times a day between them, and the revisions are not the genteel kind. Lovable, the vibe-coding firm, rewrote its pricing roughly once a month through the year, launching a team plan in the spring and retiring it by summer. Replit offers the fuller education. It began with a flat quarter of a dollar per checkpoint, moved to effort-based pricing in the summer with a rollout it later apologized for, then shipped a more autonomous agent in September that quietly turned a 200-dollar monthly habit into something users reported running past 1,000 dollars in a week, then sunset its team plan entirely the following February.

The strange part is that the arrows point in opposite directions at once. For example, Google cut its premium tier from 250 to 200 dollars in May, then halved its entry plan to 4.99 a month in June while doubling the storage attached to it. At the top of the consumer market some stickers are falling, and read quickly that looks like ordinary deflation, the soothing kind that lets an investor picture a clean productivity wave passing through the system. Read slowly it is less comforting, because the same vendor cutting the headline is often complicating the meter underneath it in the same season, trading a simple monthly promise for rolling windows, weekly caps, and quota arithmetic that needed a small constitutional reform within weeks of arriving. A genuine price cut does not usually require an apology and a patch.

The enterprise side is more unsettled still, because there the vendors are abandoning the seat itself. One now provisions developers with no fixed monthly fee at all, which is close to heresy in the church of subscription software. Another retired its old request model after conceding that a quick question and a multi-hour autonomous session had been priced too alike. A third bills a single task as a composite of the model it used, the context it retrieved, the tools it called, and the seconds it ran. The customer is no longer buying software so much as a small live auction among models, wrapped in a familiar interface.

The deepest tell is that the largest vendors have stopped trying to choose at all. Salesforce sells the same agent through several incompatible costumes at once, and the revealing part is not that one of them is wrong. It is that all of them are plausible, which means none has won. The revenue arrived regardless, past half a billion dollars and growing quickly, yet only around eight percent of its own customers had agreed to any version, which is the sound of a market declining to answer a question the seller could not answer first.

None of this is version churn. A faster sidebar is a product change. Rewriting the unit of sale four times in eighteen months, while cutting the headline and complicating the meter in the same quarter, is an industry trying to discover in public what the thing it sells actually is. The collective answer, delivered with the straight face of a pricing page, is that nobody yet knows. A normal industry changes its prices when the product matures. This one changes its prices because the product is not yet sure what it is.

The Chinese Turn

There is an older version of this story, and it ran through container ports rather than data centers. For two generations the reflex for defending a margin was to send the work East, where it could be done for less, and an entire architecture of global trade was built on that one arbitrage. The generative era has revived the reflex for a different reason. The constraint is no longer cheap hands. It is cheap thought.

The premise underneath it is unglamorous and correct. Most tasks do not need the newest, most capable, most expensive model. A great deal of routine software work runs perfectly well on a cheaper engine, and the Chinese models have become precisely that engine. Built under export controls, trained and served on more mature silicon without the latest interconnects or the most abundant high-bandwidth memory, they were pushed toward thrift by the very constraints meant to hold them back. Scarcity did not make them weak. It made them frugal, which is worse news for anyone selling intelligence by the spoonful.

When the meter is running, good enough and cheap beats excellent and dear, and that single fact is quietly reorganizing the application layer. The hardest question in software is no longer what the product can do. It is which engine to call for which task, and how little can be spent clearing the bar. The vendor becomes less a cathedral builder than a dispatcher, choosing which machine takes the job before the customer notices a choice was made. Much to the chagrin of policymakers still fighting the previous war over trade balances and tariffs, the cost gravity points one way, and the invoice keeps winning arguments the speeches cannot.

The turn is no longer fringe. Cursor built its in-house model on a Chinese open foundation and was caught only when a developer read the model's own identifier in its traffic, an executive later conceding it had been a miss not to say so from the start. The embarrassment is the instructive part. The product was sold as homegrown. The cost structure remembered otherwise. The most uncomfortable case sits at the largest American software company of all, reportedly weighing a self-hosted Chinese model inside its enterprise agent to keep the price tolerable, a sentence that contains the whole decade: an American giant may need a Chinese model, wrapped in an American cloud, to make an American product affordable.

The irony writes itself, though it would surely prefer a cheaper model to do the drafting. We spent a decade fearing that software would be disintermediated by intelligence. We are watching it re-intermediate itself instead, as a buyer and broker of the cheapest adequate intelligence it can source. The application layer is not dying quietly. It is becoming a routing layer, a bargaining layer, a place where the margin is defended one model choice at a time. Its new advantage is not owning the smartest model. It is knowing when not to reach for it.

There is so much more to say on this topic. We are sure to have full-blown pieces on the subject in the coming months.

The Cost of Repricing Trust

As I write this section, with active interventions from my Claude Cowork, I overhear my partner Venky complaining about how his Claude has again stopped working for being overloaded. I have no heart to flaunt the privileges of my higher tier access. In every group, token inequality is causing new relationship dynamics.

A price is also a promise, and an industry rewriting its prices every quarter is rewriting its promises at the same pace. On every side of the software company at once, customers, employees, and the people who fund it, the old understandings are loosening, and the evidence is piling up faster than the commentary.

The customer relationship is fraying first because that is where the meter is most visible. When GitHub moved Copilot to token billing in June, developers reported burning through monthly allowances in hours and called the change a joke in the company's own forums. When Cursor restructured its pricing in 2025, the founder conceded the rollout was not communicated clearly and offered refunds. When Google tightened Gemini's limits, the complaints were public enough that it walked several of them back within weeks. The pattern repeats so often it has become a sequence: ship a price, absorb the revolt, retreat. Refunds, grace periods, restored limits, and paused signups are now routine release notes rather than rare climbdowns, which is a fair measure of how often the first guess is wrong.

The complaints have also changed in kind. Customers used to dispute the size of a bill. Now they dispute its legibility, since credits, token counts, and silent model swaps make a charge hard to predict in advance and harder to verify after. In June that distrust reached a courtroom, with a proposed class action alleging that a premium AI plan sold as a multiple of usage delivered materially less than advertised and did not disclose the limits clearly. Whether the specific claim holds is for the lawyers. The signal is that pricing opacity has moved from a forum grievance to a legal one.

The strain runs inward, into territory pricing never used to touch. The request for more compute has become something escalated to finance, and access to the better model is now a managed resource rather than a default. Clearly, who has access and who does not is giving rise to new hierarchies within organizations.

Capital feels it from the other direction, and the friction is sharpest in private hands. The investor who backed software for its near-perfect gross margin now owns something that no longer behaves that way. ICONIQ's 2026 survey found AI product builders expecting average gross margins near 52 percent, against the 70 to 80 percent that defined the category, which means the thing being funded has quietly changed shape after the check was written. The conversations have changed with it. Founders report sailing through the first meeting and then being held over the coals in the second on inference cost, gross margin, and contract length, with later rounds stretching out and bridge extensions becoming the default while backers wait for proof the unit economics work. The awkward part is that the spend keeps climbing for reasons the original plan did not budget for.

Techflation: Tech Inflation or Tech Conflation?

Three years ago, when we first wrote about the pressure on the software subscription model, the cause was singular and of a different kind. It lay in the changing nature of software development itself. That argument is well understood now, and most observers have taken a firm position on one side of it or the other.

Two further causes have since been added to the pile. The first we set out in our tollbooth article, which noted the growing absurdity of charging by the human seat when autonomous agents do the bulk of the work across the server. The second, the subject of this piece, is the more serious. Software companies have quietly taken compute onto their own books, wrapping pure hardware cost inside a digital interface and discovering the unfamiliar discomfort of a variable input. A business built on the miracle of near-zero marginal cost does not easily survive the arrival of a bill of materials.

Management teams accordingly find themselves on unfamiliar ground, without the predictability that once defined their quarters. The work is no longer only about designing good software. It is also about sourcing the intelligence underneath it at a price that leaves a margin, and the choices are genuinely hard. Whether to lean on cheaper open-weight models or pay for the frontier. How far to commit to a single cloud, at a moment when inference is beginning to diverge between providers because each routes it through its own custom silicon, so that a model's behavior and cost now depend on whose chips it runs on. That every competitor faces the same dilemma is not a comfort but a complication, since each rival's move forces a response, and the responses feed the very uncertainty everyone is trying to escape. The lack of visibility that long defined the hardware business has arrived in software, in its own form, and the people running these companies are navigating it the way one drives through fog, by watching the taillights of the car ahead.

None of this retires the deeper forces still rewiring the application layer. From the management of people to the durability of a moat, the difficulties are shared across every company that built its business on top of someone else's models. Among them, the management of compute cost, the token budget, is becoming the one that organizes all the others. It shapes not only profit but the speed of product development and the temper of the team. Access to the better model is now something to be allocated, and where it is allocated quietly redraws the hierarchy inside a firm. Two years ago, the idea that office politics might turn on who holds which API key would have been unintelligible. It is now a normal feature of working life, and a small but real instance of the token inequality we have written about before.

For the many companies still building, still spending more than they earn, the new cost line is a heavy charge levied well before the older questions about shifting capability have been settled, and before business plans that keep getting rewritten by each advance in capability have had time to set. For the analysts who cover the great listed software franchises, the temptation will be to watch the topline alone, reassured by revenue that hardware bundling helps inflate, and to leave the cost base for another day. That would be a mistake. The unraveling of the SaaS pricing model, the arrival of a gross margin where there was never one to speak of, and the unbundling of the features that margin used to conceal, have more of their consequences ahead of them than behind. The refinery has only just read the meter.