There’s an old rule in construction – measure twice, cut once.
I like that principle because it’s simple, and it applies to AI just as much as it applies to lumber.
Recently, I was scrolling through LinkedIn, and right now, of course, AI is everywhere. Every other post is someone trying to sound profound about agents, tokens, automation, productivity, or the future of software engineering.
Some of it is useful, but a lot of it is noise.
And then I ran across a clip that made me stop…
It was from Jensen Huang, the founder and CEO of Nvidia. So, not some random guy trying to farm engagement. This is the guy running one of the most important companies in the AI infrastructure race, and Nvidia’s chips are a huge part of what powers the AI economy right now.
In the clip, Huang talked about a software engineer or AI researcher making $500,000 a year, and said he would be “deeply alarmed” if that person didn’t consume at least $250,000 worth of AI tokens in a year. The quote was covered by Tom’s Hardware, and TechCrunch also covered the larger idea of tokens becoming part of engineering compensation.
My first reaction was not exactly gentle – hard pass.
Because I think this is where the AI conversation starts getting dangerous. Not because AI is bad, and not because companies should avoid spending money on AI. That’s not my argument.
My argument is this: Stop paying premium prices for messy thinking.
Token spend is not the scoreboard
There’s a difference between using a lot of AI because you’re doing a lot of valuable work, and using a lot of AI because your process is sloppy.
The problem is that those two things can look the same on a token bill, and plenty of people treat token usage as a measure of performance. It’s almost like a game to see who can use the most tokens.
But token use does not reflect how much work actually gets done.
If an engineer burns through $250,000 in tokens and produces $5 million in useful output, fine. That’s a business conversation. But if an engineer burns through $250,000 in tokens because every task turns into a wandering swarm of agents, retries, vague prompts, bloated context, and “try again” loops, that’s not productivity.
That’s just expensive confusion.
The metric shouldn’t be how many tokens you used – it should be how much useful work got done with the tokens you did use.
In a moment I’ll share my strategy for getting more work done with less token use, but first I need to address something important – the market and the economics.
The incentives matter
This is also where people need to pay attention to incentives.
The AI economy is being built around compute, inference, tokens, agents, and usage, and a lot of companies in this space have massive valuations, huge revenue projections, and incredible momentum.
But to date, no major AI frontier lab has publicly demonstrated sustained profitability, despite billions in annualized revenue.
That word publicly matters. Some of these labs are private, and some sit inside larger profitable tech companies. But the labs themselves have not shown the market a durable, repeatable profit profile yet. Even The Atlantic recently noted that flagship AI companies like OpenAI and Anthropic are bringing in billions in annualized revenue, but are not yet profitable.
This isn’t me saying AI is fake. I’m very obviously not anti-AI.
What I’m saying is that the business model matters.
Epoch AI estimated that, across three AI companies where it could make estimates, R&D and inference compute together made up 54% to 62% of costs. In other words, compute is not some minor side expense. It’s the engine room of the whole thing.
So when the industry starts treating token burn like a badge of honor, I think the right response is to slow down and ask a better question:
Who benefits when token usage becomes the measure of productivity?
The better way to think about it
I am not saying, “Don’t spend money on AI.” I am saying, “Stop paying premium prices for messy thinking.”
That’s a completely different argument.
AI should absolutely increase your output. I use it every day. I use agents, I use browser-based AI, and I use coding assistants. I use it for planning, writing, development, debugging, research, and workflow design.
In fact, I am an AI systems architect who has built a cognitive intelligence framework that operates on top of the LLM. I am very pro-AI.
My point is that the more experienced you become, the more intentional your usage should become.
You should be building systems, you should be reusing workflows, you should be tightening prompts, you should be learning where agents waste time, you should be reducing context bloat, and you should be getting more precise, not more chaotic.
That means, over time, the amount of useful work per token should improve.
If your AI usage keeps getting more expensive but your workflow is not getting cleaner, that’s not maturity. That’s drift.
Where the tokens actually go
Most token waste doesn’t come from one big task – it comes from the retry loop.
It’s like this:
- You start with a vague prompt.
- The model gives you something close, but not quite right, so you correct it.
- Then you remember a constraint you forgot to mention.
- Then you change the format.
- Then you realize the original goal was not clear enough.
- Then the agent runs again.
- Then another agent checks it.
- Then another one rewrites it.
And it feels like progress because things are happening, but behind the scenes, you’re paying the AI to think through the same mess repeatedly.
That’s where the cost hides.
It’s not in the execution. It’s in the unclear iteration.
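To make that retry-loop cost concrete, here is a back-of-envelope sketch in Python. Every price and token count below is a hypothetical placeholder I made up for illustration, not real vendor pricing; the point is the multiplier, not the numbers.

```python
# Back-of-envelope: one clean run vs. a retry loop.
# All prices and token counts here are hypothetical placeholders.

PRICE_PER_1K_TOKENS = 0.01  # assumed blended price in USD, not real vendor pricing

def run_cost(context_tokens: int, output_tokens: int) -> float:
    """Cost of a single model call at the assumed price."""
    return (context_tokens + output_tokens) / 1000 * PRICE_PER_1K_TOKENS

# One clean run: a tight 2,000-token brief producing 1,000 tokens of output.
clean = run_cost(2_000, 1_000)

# A six-pass retry loop: the context grows each pass because the model
# re-reads the transcript of every failed attempt.
loops = sum(run_cost(2_000 + i * 3_000, 1_000) for i in range(6))

print(f"clean run:  ${clean:.2f}")
print(f"retry loop: ${loops:.2f} ({loops / clean:.1f}x the clean run)")
```

Even with generous assumptions, the loop costs a multiple of the clean run, because each retry pays again for all the context that came before it.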
The strategy I actually use
One of the simplest ways I cut down on token waste is this:
I do the messy thinking in the browser first, then send the agent a cleaner prompt.
That sounds small, but it changes the whole workflow.
If I have an idea that is still forming, I don’t immediately throw an expensive agent at it. I talk it out first. I brainstorm. I let the idea take shape in a lower-friction environment. I ask questions. I push on the weak spots. I figure out what I actually mean.
Then, once the idea is clearer, I have the browser session help me turn it into an agent-ready prompt.
That prompt usually includes:
- the exact outcome
- the relevant context
- the constraints
- what to avoid
- the expected output format
- where the agent should be decisive
- where it should ask before acting
Now the agent is not being paid to wander around the idea with me. It’s being paid to execute from a clearer brief.
That’s the difference.
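If it helps to see that brief as a structure, here is a minimal Python sketch. The field names and layout are my own illustration of the checklist above, not a standard format or any particular tool’s API.

```python
# A reusable template for the agent-ready brief described above.
# Field names and layout are illustrative, not a standard format.
from dataclasses import dataclass, field

@dataclass
class AgentBrief:
    outcome: str                # the exact outcome
    context: str                # the relevant context
    constraints: list[str]      # hard constraints
    avoid: list[str]            # what to avoid
    output_format: str          # the expected output format
    decide_on: list[str] = field(default_factory=list)   # be decisive here
    ask_before: list[str] = field(default_factory=list)  # ask before acting here

    def render(self) -> str:
        """Turn the brief into one complete prompt string."""
        def bullets(items):
            return "\n".join(f"- {x}" for x in items) or "- (none)"
        return (
            f"OUTCOME:\n{self.outcome}\n\n"
            f"CONTEXT:\n{self.context}\n\n"
            f"CONSTRAINTS:\n{bullets(self.constraints)}\n\n"
            f"AVOID:\n{bullets(self.avoid)}\n\n"
            f"OUTPUT FORMAT:\n{self.output_format}\n\n"
            f"BE DECISIVE ABOUT:\n{bullets(self.decide_on)}\n\n"
            f"ASK BEFORE ACTING ON:\n{bullets(self.ask_before)}"
        )

brief = AgentBrief(
    outcome="Refactor the payment retry logic into a single module.",
    context="Python 3.12 service; retries currently live in three files.",
    constraints=["No new dependencies", "Keep the public API unchanged"],
    avoid=["Rewriting unrelated modules"],
    output_format="One diff per file, with a one-line summary each",
)
prompt = brief.render()
```

The value of a template like this is that the empty fields are visible. If you can’t fill in the outcome or the constraints, you’re not ready to pay an agent yet.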
Browser first, agent second
For me, the workflow usually looks like this:
- Brainstorm in browser. Use the conversation to shape the idea, test the angle, and get the messy thinking out.
- Turn the idea into a prompt. Have the browser AI create a tight instruction set for the agent.
- Send the agent one complete brief. Give it context, constraints, the outcome, and the expected format up front.
- Review once, then refine. Don’t let the workflow turn into an endless correction loop.
- Save what worked. If the prompt or process worked, turn it into a reusable workflow so the next run is cheaper and cleaner.
That’s how you start getting more done with fewer tokens: Not by avoiding AI, but by using the right AI layer for the right stage of the work.
The 5-step method before a serious AI run
Before I run a serious prompt, spin up an agent, or start building a workflow, I try to get five things clear.
- Define the outcome. What exactly needs to be done?
- List the components. What sections, files, steps, assets, data, or requirements are involved?
- Clarify the behavior. What should it do, and just as important, what should it avoid?
- Structure the output. What should the final answer, file, report, page, or workflow look like?
- Build one complete prompt. Then run it with enough context for the AI to execute cleanly.
This isn’t complicated, but it is the part people skip because the AI makes skipping it feel painless.
… at least until the bill shows up.
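One way to make the method harder to skip is a tiny pre-flight gate: refuse to build the final prompt until the first four steps are actually filled in. This is a sketch under my own labels for those steps, not part of any tool.

```python
# Pre-flight gate for the 5-step method: don't build the one complete
# prompt (step 5) until steps 1-4 have real content. Illustrative only;
# the field names are my own labels for the four steps.

REQUIRED_PARTS = ["outcome", "components", "behavior", "output_format"]

def preflight(draft: dict[str, str]) -> list[str]:
    """Return the names of any parts that are missing or empty."""
    return [part for part in REQUIRED_PARTS if not draft.get(part, "").strip()]

draft = {
    "outcome": "Generate a migration plan for the billing tables.",
    "components": "Schema files, seed data, rollback steps.",
    "behavior": "",  # skipped: what it should do, and what it should avoid
    "output_format": "Numbered checklist with an owner per step.",
}

missing = preflight(draft)
if missing:
    print(f"Not ready to spend tokens yet. Missing: {missing}")
```

Ten lines of discipline up front is cheaper than a retry loop later.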
The real test
The question is not, “How many tokens did you burn?”
The question is: What did those tokens produce?
Did they create something useful?
Did they save time?
Did they reduce rework?
Did they improve the system?
Did they move the business forward?
Did they create a repeatable process you can use again?
If yes, great. Spend the tokens.
If not, you’re not investing in AI. You’re paying for motion.
One final thought
I’m not impressed by big token numbers. I’m impressed by clean workflows, clear systems, repeatable output, and measurable results.
AI isn’t expensive because it’s powerful. It gets expensive when it’s unfocused.
So before you run the prompt, spin up the agent, or start the workflow, take a minute and get clear.
Measure twice, then let AI cut once.
Quick Q&A: how to reduce AI token cost
If you came here looking for the practical version, this is the short answer.
How do you reduce AI token cost?
The fastest way to reduce AI token cost is to stop using expensive runs for unclear thinking. Define the outcome first, trim unnecessary context, give the AI complete instructions up front, and reuse prompts or workflows that already worked.
What causes high AI token usage?
High AI token usage usually comes from vague prompts, oversized context, repeated corrections, agents re-reading the same material, and workflows where the model has to rediscover the goal over and over again.
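A quick way to spot context bloat before a run is to estimate the size of each chunk you are about to paste in. The sketch below uses the rough rule of thumb of about four characters per token; real tokenizers vary by model, so treat the numbers as coarse estimates, not billing figures.

```python
# Coarse context-bloat check before a run, using the rough
# "~4 characters per token" heuristic. Estimates only; real
# tokenizers vary by model.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

context_chunks = {
    "task brief": "Summarize the Q3 incident review into five action items.",
    "full transcript": "x" * 40_000,  # stand-in for a pasted 40k-char log
    "style guide": "y" * 2_000,
}

# Print the biggest chunks first: that's where the bloat hides.
for name, chunk in sorted(context_chunks.items(),
                          key=lambda kv: -estimate_tokens(kv[1])):
    print(f"{name:>16}: ~{estimate_tokens(chunk):,} tokens")
```

When one chunk dwarfs the rest, that is usually the material the model keeps re-reading on every retry, and the first thing to trim.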
Should you avoid AI agents to save tokens?
No. AI agents can be worth the cost when the task is clear and the output matters. The mistake is using an agent as the brainstorming room, the project manager, the researcher, the editor, and the executor all at once before you know what you actually want.
What is token efficiency?
Token efficiency is the amount of useful work you get from the tokens you spend. A smaller token bill is not always better, and a bigger token bill is not always worse. The question is whether the spend produced something valuable, repeatable, or measurable.
What is the best prompt structure for lowering token waste?
Use a prompt that includes the outcome, context, constraints, output format, success criteria, and anything the AI should avoid. That one complete brief is usually cheaper than five vague prompts and five rounds of corrections.
