There is a new pathology spreading through engineering organizations. I have started hearing people call it tokenmaxxing: the practice of maximizing AI token consumption to signal productivity to leadership.
It works like this. An engineer gets evaluated partly on how much they are "leveraging AI." So they route everything through the largest model available. Code reviews, documentation, boilerplate: tasks that do not need a frontier model and never did. The token count goes up. The dashboard looks impressive. The actual output is indistinguishable from what a smaller model, or no model at all, would have produced.
CFOs are starting to notice. AI compute spend is climbing, but the productivity metrics that were supposed to justify it are flat. The gap between cost and value is widening, and it is widening because the incentive structure rewards consumption over precision.
The inverse move
The most valuable engineers I work with in 2026 are doing the opposite. They are practicing what I call inference minimization — achieving the same outcome with the fewest tokens possible.
That means reaching for a 128-token call to a small language model before reaching for a 10,000-token prompt to a frontier model. It means knowing when the right tool is a regex, a SQL query, or a deterministic script instead of an LLM. It means treating every inference call as a cost decision, not a convenience.
This is not about being cheap. It is about being precise. A well-scoped prompt to a small model that solves the problem in one pass is worth more than a sprawling conversation with a frontier model that takes five rounds to land on the same answer.
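Concretely, the discipline looks like an escalation ladder: cheapest tool first, frontier model last. The sketch below is illustrative only; the two model helpers are placeholders for whatever clients you actually run, and the order-ID format is invented for the example.

```python
import re

# Hypothetical stand-ins for whatever model clients you actually use.
def call_small_model(prompt: str) -> str:
    raise NotImplementedError("wrap your small-model client here")

def call_frontier_model(prompt: str) -> str:
    raise NotImplementedError("wrap your frontier-model client here")

def extract_order_id(ticket_text: str) -> str:
    """Cheapest tool first: a regex, then a small model, then the frontier model."""
    # Tier 0: deterministic. Free, instant, auditable.
    # (The ORD-123456 format is invented for illustration.)
    match = re.search(r"\bORD-\d{6}\b", ticket_text)
    if match:
        return match.group(0)

    # Tier 1: a tightly scoped, ~100-token prompt to a small model.
    answer = call_small_model(
        f"Return only the order ID from this ticket, or NONE:\n{ticket_text}"
    ).strip()
    if answer != "NONE":
        return answer

    # Tier 2: the frontier model, reserved for the genuinely ambiguous cases.
    return call_frontier_model(
        f"Identify which order this customer is referring to:\n{ticket_text}"
    )
```

Most calls never get past tier zero, which is exactly the point.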
Why this matters at the product level
In Field Notes 002 I wrote about the COGS crisis — AI features that lift engagement while crushing margins. Tokenmaxxing is the same disease at the engineering layer instead of the product layer. It inflates costs without inflating value.
When your engineers are tokenmaxxing, your internal AI costs grow linearly with headcount. That is the exact scaling curve you are supposed to be avoiding by using AI in the first place. You wanted AI to make ten engineers as productive as twenty. Instead you made ten engineers as expensive as twenty.
The fix is not to restrict access. It is to change what you measure. Stop measuring token consumption. Start measuring outcome per token. How many tasks completed per dollar of compute? How many production issues resolved per inference call? What is the cost-per-resolution of your AI-assisted support pipeline?
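Here is a rough sketch of what that dashboard computes. The field names and the numbers in the example are made up; map them to whatever your billing and ticketing systems actually expose.

```python
from dataclasses import dataclass

@dataclass
class UsagePeriod:
    """Illustrative fields; map them to your own billing and ticket data."""
    tasks_completed: int
    issues_resolved: int
    total_tokens: int
    compute_spend_usd: float

def outcome_metrics(p: UsagePeriod) -> dict:
    # Outcome per dollar and per token: the numbers that belong on the dashboard
    # in place of raw consumption.
    return {
        "tasks_per_dollar": p.tasks_completed / p.compute_spend_usd,
        "cost_per_resolution_usd": p.compute_spend_usd / max(p.issues_resolved, 1),
        "tokens_per_task": p.total_tokens / max(p.tasks_completed, 1),
    }

# Example with made-up numbers.
print(outcome_metrics(UsagePeriod(
    tasks_completed=420,
    issues_resolved=95,
    total_tokens=18_000_000,
    compute_spend_usd=1_300.0,
)))
```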
The right model for the job
Most organizations default to a single model for everything. That is like using a sledgehammer to hang a picture frame. Different tasks have different precision requirements and different cost profiles.
Classification, routing, and extraction tasks rarely need a frontier model. A fine-tuned small model handles them faster, cheaper, and often more reliably. Structured data transformation is almost always better served by deterministic code than by an LLM. Natural language generation, complex reasoning, and ambiguous tasks — that is where the frontier model earns its cost.
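One way to make that taxonomy explicit is a routing table the whole team can argue about in review. The categories below are the ones from this section; where the boundaries fall is a judgment call for your own workloads, not a universal rule.

```python
from enum import Enum, auto

class Tool(Enum):
    DETERMINISTIC_CODE = auto()   # regex, SQL, a plain script
    SMALL_MODEL = auto()          # fine-tuned or off-the-shelf small model
    FRONTIER_MODEL = auto()       # reserved for work that earns its cost

# Illustrative mapping only; adjust the boundaries to your own workloads.
ROUTING_TABLE = {
    "classification": Tool.SMALL_MODEL,
    "routing": Tool.SMALL_MODEL,
    "extraction": Tool.SMALL_MODEL,
    "structured_transformation": Tool.DETERMINISTIC_CODE,
    "open_ended_generation": Tool.FRONTIER_MODEL,
    "complex_reasoning": Tool.FRONTIER_MODEL,
    "ambiguous_task": Tool.FRONTIER_MODEL,
}

def pick_tool(task_type: str) -> Tool:
    # Unknown task types fall through to the frontier model, since genuine
    # ambiguity is the one case where it earns its cost.
    return ROUTING_TABLE.get(task_type, Tool.FRONTIER_MODEL)
```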
The engineers who understand this taxonomy — who know which tool to reach for before they reach for anything — are the ones whose teams have sustainable AI economics. Everyone else is paying frontier prices for commodity work.
The cost discipline mindset
This connects directly to gates one and two of the Intent Stack. Before you build, you ask: what will this cost, and should we build it at all? Those questions apply at every level — feature level, architecture level, and individual prompt level.
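At the prompt and feature level, gate one can be a back-of-the-envelope calculation you run before writing any code. The traffic figures and per-token rates below are placeholders, not quotes; plug in your own forecasts and contracted prices.

```python
def monthly_inference_cost(calls_per_day: float,
                           tokens_per_call: float,
                           usd_per_million_tokens: float) -> float:
    """Back-of-the-envelope gate-one estimate: what will this cost to run?"""
    return calls_per_day * 30 * tokens_per_call * usd_per_million_tokens / 1_000_000

# Placeholder numbers -- substitute your own traffic and contracted rates.
frontier = monthly_inference_cost(calls_per_day=50_000, tokens_per_call=3_000,
                                  usd_per_million_tokens=10.0)
small = monthly_inference_cost(calls_per_day=50_000, tokens_per_call=400,
                               usd_per_million_tokens=0.20)
print(f"frontier: ${frontier:,.0f}/mo  small model: ${small:,.0f}/mo")
```

If the estimate does not survive contact with the value the feature is supposed to generate, gate two has already answered itself.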
The teams that will win in the next two years are not the ones with the biggest AI budgets. They are the ones with the tightest cost discipline. They know what every inference costs, they know what value it generates, and they can defend both numbers to anyone who asks.
The best prompt is the one you did not have to send.
Tokenmaxxing is a symptom. The disease is measuring effort instead of outcome. Cure the measurement problem and the token problem solves itself.