DeepSeek makes a 75% price cut permanent — and turns cheap inference into a weapon
DeepSeek's flagship V4-Pro now costs a quarter of last week's price — and a fraction of what US labs charge. Permanent, not promotional. The AI price war just reset the floor for inference, right as OpenAI and Anthropic price their IPOs on the opposite assumption.
DeepSeek has made its 75% discount on V4-Pro permanent. As of May 31, the Chinese lab's flagship model is priced at $0.435 per million input tokens and $0.87 per million output tokens, down from list prices of $1.74 and $3.48. The cut is paired with a further 90% reduction in cached-input pricing across its API. What began as a limited promotion is now the standing rate, and it resets the floor for what frontier-grade inference costs.
The number that matters
Strip away the announcement language and the move is simple: DeepSeek just told every developer that a capable model now costs roughly a quarter of what it did a week ago, and a fraction of what the American labs charge for comparable work. Cached inputs, the repeated context that dominates real production workloads, drop to fractions of a cent per million tokens. For an enterprise pushing billions of tokens a month, the gap between the old and new rates can be the difference between a rounding error and a real line item, one measured in millions of dollars a year at the largest volumes.
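To put rough numbers on that, here is a minimal back-of-envelope sketch in Python. The per-token prices are the ones quoted above. The workload (100 billion input tokens and 20 billion output tokens a month) is purely an assumption chosen for illustration, and cached-input pricing is left out because no exact figure is given for it.

```python
from decimal import Decimal

# Prices in USD per million tokens, as quoted in the article.
LIST_PRICE = {"input": Decimal("1.74"), "output": Decimal("3.48")}
NEW_PRICE = {"input": Decimal("0.435"), "output": Decimal("0.87")}

# Hypothetical monthly workload, in millions of tokens (an assumption,
# not a figure from DeepSeek): 100B input tokens and 20B output tokens.
WORKLOAD_M_TOKENS = {"input": Decimal(100_000), "output": Decimal(20_000)}


def monthly_cost(prices, workload):
    """Monthly bill in USD: per-million price times millions of tokens."""
    return sum(prices[kind] * workload[kind] for kind in workload)


def discount_fraction(old, new):
    """Fraction of the old price removed by the cut (0.75 means 75% off)."""
    return (old - new) / old


list_bill = monthly_cost(LIST_PRICE, WORKLOAD_M_TOKENS)
new_bill = monthly_cost(NEW_PRICE, WORKLOAD_M_TOKENS)
annual_saving = (list_bill - new_bill) * 12

if __name__ == "__main__":
    for kind in ("input", "output"):
        cut = discount_fraction(LIST_PRICE[kind], NEW_PRICE[kind])
        print(f"{kind} price cut: {cut:.0%}")
    print(f"monthly bill at list price: ${list_bill:,.2f}")
    print(f"monthly bill at new price:  ${new_bill:,.2f}")
    print(f"annual saving:              ${annual_saving:,.2f}")
```

Under those assumed volumes the monthly bill falls from $243,600 to $60,900, a saving of about $2.19 million a year. Scale the workload down to a few billion tokens a month and the saving shrinks proportionally, so the size of the line item depends heavily on volume.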
This isn't a loss-leader stunt that expires. Making it permanent is the point: DeepSeek is converting price from a tactic into a structural position, the same way budget airlines made "cheapest seat wins" the whole business rather than a sale.
Our read
The price war is the strategy, and it's aimed squarely at the most expensive assumption in AI: that frontier capability commands frontier pricing. Chinese labs have decided the fastest way to win developers is to make inference so cheap that the decision stops being about the model and starts being about the bill. Commoditize the layer your rivals depend on for margin, and you reset the market on your terms.
The timing is the tell. OpenAI and Anthropic are walking toward public listings priced on the premise that they can hold both market share and pricing power — the OpenAI IPO math and Anthropic's first operating profit both lean on the idea that inference stays a premium good. A permanent 75% cut from a credible competitor is an argument that it won't. Every step down DeepSeek takes is a question mark over those forward revenue multiples.
Here's the catch, and it's the thing the price tag can't fix: cheap only wins where trust isn't the product. Plenty of enterprises will keep paying a premium for governed data, predictable reliability, support, and a vendor their legal team is comfortable with, especially one not headquartered in China. DeepSeek's pricing pressures the commodity tier of inference, the high-volume, low-stakes calls. It does much less to the high-trust workloads where switching cost and accountability matter more than the per-token rate. The real question is how big each tier turns out to be, because that split decides whether the American labs' margins are defensible or merely temporary.
Watch two things. First, whether OpenAI, Google, and Anthropic respond with their own cuts or hold the line and cede the price-sensitive market — each choice tells you how much pricing power they think they actually have. Second, whether "good enough and 75% cheaper" is enough to move serious production traffic, or whether the gap between a demo and a dependency keeps enterprises paying up. The cost of intelligence is falling faster than almost anyone modeled a year ago. The companies whose valuations assume otherwise are the ones to watch.