A few weeks ago I ran the numbers on the token cost panic. I took the scariest figure in legal AI, the finding that agentic workflows burn a thousand times more tokens than a chat query, and followed it all the way down to a dollar amount on a real deal. The panic did not survive the arithmetic. The piece is here if you want the full walk-through.
This is not that piece. The panic has moved on since I wrote it, and the new versions are smarter than the old one. The thousand-times number has quietly retired, because a thousand times almost nothing is still almost nothing. In its place are three fresher anxieties, and they deserve a real answer. The first says the model makers have a monopoly now, the price of a token is climbing, and it will climb forever, so you had better lock in a flat rate or build your own models before it does. The second says forget the price of a token, watch the meter: every time the AI reads your contract it ticks, and a long agentic session reads your contract over and over and over. The third does not bother with an argument at all. It just points at a number. One company spent five hundred million dollars on AI in a single month, and the number is so large it does the panicking for you.
All three are wrong. They are wrong in more interesting ways than the original, which is the only reason I am writing this down instead of linking to the first piece again. But underneath the new costumes it is the same body. Every version of this panic makes the same mistake and reaches the same conclusion. So let us stop swatting the individual numbers and name the thing that keeps generating them.
The Mistake Underneath All of It
Here is the error, stated once, because everything below is a variation on it.
A token is the unit a model uses to bill you. It is not the unit your work is measured in, it is not the unit your client pays for, and it is not the unit anything you care about is denominated in. It is a meter reading. The entire genre of token panic consists of staring at the meter reading as though it were the fare, the destination, and the quality of the ride all at once.
It is not any of those things. It is the meter. And a meter, by itself, tells you nothing about whether you are getting a good deal. A taxi meter reading of forty dollars is a bargain to the airport and a robbery around the block. The number on the meter is the least informative number in the entire transaction, because it means nothing until you put it next to what the ride was worth. Every piece in this genre forgets that, and forgets it in a slightly different way. Let me take them in turn.
“Prices Only Go Up”
Start with the monopoly story, because it has a real fact inside it. Yes, the newest frontier model costs more per token than last year’s newest model. That part is true. What the story does with it is the problem.
It draws a line through two dots and calls it a trend. Frontier prices up, therefore prices up forever, therefore lock in a flat rate before the meter eats you. But you are watching the wrong number. The price of a frontier token is not your cost. Your cost is what it takes to finish a task, and the cost of finishing a given task has been in freefall for two straight years. The same capability that ran on the most expensive model available in 2022 runs today on something on the order of two hundred and eighty times cheaper. Last year’s frontier is this year’s mid-tier is next year’s free default. The token at the very tip of the frontier gets a little pricier each release; everything behind the tip collapses in price behind it. Gartner expects another ninety percent drop in inference cost by 2030.
Watching the frontier price and concluding that AI is getting more expensive is reading the thermometer and announcing a fever, while ignoring that you are holding the thermometer over a candle. The evidence that the baseline is getting cheaper often sits right there in the same articles raising the alarm, quoted from the experts and then left unaddressed. You do not build a cost strategy on the one number in the system that is engineered to always be the highest.




There is a growing chorus of voices in legal AI telling you to be very, very worried about the cost of tokens. 









