Field Notes 002 · March 2026

The COGS Crisis Nobody Budgeted For

Here is a pattern I keep seeing: an AI feature ships, engagement goes up, the team celebrates, and six weeks later finance calls an emergency meeting because gross margins dropped four points.

Nobody budgeted for the cost of success.

In Field Notes 001 I wrote about the 3x rule — the idea that an AI feature should generate at least three times its compute cost in measurable value. That piece was about running the numbers before you build. This one is about what happens when you skip that step and ship anyway.

The margin dilution problem

Traditional software has near-zero marginal cost. You build it once, you serve it to a million users, and COGS barely moves. AI broke that model. Every inference costs money. Every token has a price. And that price scales linearly with usage.
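To make "scales linearly" concrete, here is the arithmetic in miniature. The per-token prices below are placeholders, not any provider's actual rates; the shape is what matters: double the usage, double the COGS.

    # Illustrative only: the per-token prices here are placeholders,
    # not any provider's actual rates.
    PRICE_PER_1K_INPUT = 0.0025    # dollars per 1,000 input tokens
    PRICE_PER_1K_OUTPUT = 0.0100   # dollars per 1,000 output tokens

    def cost_per_interaction(input_tokens, output_tokens):
        return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

    # The line that breaks the old model: COGS moves with usage.
    per_call = cost_per_interaction(2000, 500)   # $0.01 per interaction
    for interactions in (10_000, 100_000, 1_000_000):
        print(f"{interactions:>9,} interactions -> ${per_call * interactions:,.2f}")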

When a B2B SaaS product adds an AI-powered feature — summarization, chat, automated analysis — it is adding a variable cost line that did not exist before. If that feature is bundled into the existing subscription price, you are subsidizing every interaction. The more users love it, the more it costs you.

I have watched this play out in real time. A platform ships "AI-assisted search" as a differentiator. Adoption is strong — thirty percent of active users within the first month. Leadership is thrilled. Then the AWS bill arrives. Per-user AI cost is eighteen dollars a month. The average subscription is twenty-five dollars. The feature that was supposed to drive retention is eroding the unit economics that make retention worth having.
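Spelled out with the numbers from that story, the damage is easy to quantify:

    # The unit economics from the example above.
    subscription = 25.00       # average monthly revenue per user, dollars
    ai_cost = 18.00            # monthly AI cost per adopting user, dollars
    adoption = 0.30            # share of active users on the feature

    # For an adopter, AI alone consumes 72 percent of their revenue.
    print(f"adopter: {ai_cost / subscription:.0%} of revenue")

    # Blended across the whole base, that is $5.40 per active user,
    # a 21.6-point hit to gross margin before any other cost of goods.
    blended = ai_cost * adoption
    print(f"blended: ${blended:.2f}/user, {blended / subscription:.1%} of revenue")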

Why teams miss this

Three reasons. First, most product teams do not have visibility into per-feature compute costs. They see an aggregate cloud bill. They do not see that Feature X costs twelve cents per interaction while Feature Y costs three cents. Without that granularity, you cannot make informed decisions about what to build, what to optimize, and what to kill.
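What closes that gap is metering at the feature level. A minimal sketch of the idea (the ledger shape here is mine, not a standard API; in production this would be an event stream feeding your warehouse):

    from collections import defaultdict

    # Toy per-feature cost ledger: tag every inference call with the
    # feature that triggered it, then roll up cost and call counts.
    ledger = defaultdict(lambda: {"cost": 0.0, "calls": 0})

    def record_inference(feature, cost_dollars):
        ledger[feature]["cost"] += cost_dollars
        ledger[feature]["calls"] += 1

    record_inference("ai_search", 0.12)
    record_inference("summarize", 0.03)
    record_inference("ai_search", 0.12)

    for feature, row in sorted(ledger.items()):
        print(f"{feature}: ${row['cost']:.2f} total, "
              f"${row['cost'] / row['calls']:.2f} per interaction")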

Second, the cost curve moves. Model prices drop. But usage patterns shift too. A feature that costs two cents per query at launch might cost eight cents per query once users figure out they can chain five follow-up questions. The optimistic projection from the proof of concept rarely survives contact with real behavior.
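The mechanism behind that drift is usually context accumulation: each follow-up resends the growing conversation as input tokens, so later turns cost more than the first. A sketch, again with placeholder prices:

    # Placeholder prices; the growth pattern is the point.
    PRICE_PER_1K_INPUT = 0.0025
    PRICE_PER_1K_OUTPUT = 0.0100

    context_tokens = 500            # prompt and context on the first turn
    for turn in range(1, 7):        # one query plus five follow-ups
        cost = (context_tokens / 1000) * PRICE_PER_1K_INPUT \
             + (400 / 1000) * PRICE_PER_1K_OUTPUT
        print(f"turn {turn}: ${cost:.4f}")
        context_tokens += 900       # question and answer join the context

By the sixth turn, the per-query cost is more than triple the first, and nothing about the feature changed. Only the behavior did.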

Third, nobody owns it. Product owns engagement. Engineering owns infrastructure. Finance owns the P&L. But nobody owns the unit economics of a specific AI feature end to end. That gap is where margin dilution lives.

The AI Product Operator

The teams that are getting this right have created a function I have started calling the AI Product Operator. It is not necessarily a new hire. It is a responsibility: sometimes held by a product manager, sometimes by a staff engineer, occasionally by a dedicated role. The job is to make the unit economics of every prompt visible and defensible to the CFO.

That means: cost per interaction, cost per user per month, cost per feature, and the value each of those costs generates. Not as a quarterly report. As a live dashboard that product and finance review together.

When you have that visibility, decisions get sharper. You can see that Feature A costs nine cents per interaction and drives fourteen percent higher retention. Worth it. Feature B costs eleven cents per interaction and has no measurable impact on any business metric. Kill it. Feature C costs twenty-two cents per interaction but only for power users who pay three times the average subscription. Segment it.

Circuit breakers for cost

Visibility is one half. The other half is protection. If you are running agentic workflows — systems where an AI agent can execute multi-step tasks autonomously — you need circuit breakers that trigger on cost, not just errors.

A traditional circuit breaker stops execution when the error rate spikes. An agentic circuit breaker stops execution when the spend threshold is hit. I recommend two tiers:

At ten steps or fifty cents of compute, the agent pauses and surfaces a partial progress report. The user decides whether to continue. This is your soft limit — it catches runaway loops before they get expensive and keeps the human in the loop.

At twenty steps or two dollars, the system kills the process, rolls back any state changes, and logs a logic drift incident for engineering review. This is your hard limit — the non-negotiable ceiling that protects your infrastructure from a single bad reasoning chain burning through your monthly budget.
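Here is a sketch of those two tiers in code. The runtime hooks (run_step, rollback, ask_user, log_incident) are stand-ins for whatever your agent framework provides; only the thresholds come from the tiers above.

    SOFT_STEPS, SOFT_SPEND = 10, 0.50    # pause and ask the user
    HARD_STEPS, HARD_SPEND = 20, 2.00    # kill, roll back, log an incident

    def run_with_breaker(task, run_step, rollback, ask_user, log_incident):
        steps, spend, soft_tripped = 0, 0.0, False
        while not task.get("done"):
            spend += run_step(task)      # one agent step; returns its cost
            steps += 1
            # Hard limit first, so a single expensive step cannot slip past.
            if steps >= HARD_STEPS or spend >= HARD_SPEND:
                rollback(task)           # undo any state changes
                log_incident("logic drift", steps=steps, spend=spend)
                return None
            # Soft limit: surface partial progress once, let the human decide.
            if not soft_tripped and (steps >= SOFT_STEPS or spend >= SOFT_SPEND):
                soft_tripped = True
                if not ask_user(task, steps, spend):
                    return task.get("partial")
        return task.get("result")

In practice the thresholds belong in configuration, not constants, and the spend counter should include retries. A retry is just another step that costs money.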

This is not theoretical. I have seen agentic systems burn through hundreds of dollars in a single runaway loop because nobody built a cost-based kill switch. The retry logic worked perfectly. It just retried a failing operation forty-seven times at a dollar twenty per attempt.

Where the Intent Stack fits

Gates one and two of the Intent Stack exist precisely for this. Gate one asks: what will this cost? Not in the abstract. Per interaction, per user, per month, at projected scale. Gate two asks: should we build this at all? That question is unanswerable without the cost model from gate one.
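Gate one is just arithmetic once you write the assumptions down. Every number in this sketch is a placeholder you would replace with a measured or defended figure:

    # Gate one as arithmetic. Every input is an assumption to defend.
    cost_per_interaction = 0.09        # dollars, measured from a prototype
    interactions_per_user_month = 40   # projected usage pattern
    adopting_users = 50_000            # users at projected scale

    per_user_month = cost_per_interaction * interactions_per_user_month
    monthly_cogs = per_user_month * adopting_users
    print(f"${per_user_month:.2f}/user/month, ${monthly_cogs:,.0f}/month at scale")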

Most teams skip straight to gate three — what exactly are we building? — and figure out the economics later. "Later" usually means "after finance notices."

The margin dilution crisis is not an AI problem. It is a sequencing problem. You are doing the right work in the wrong order. Run the economics first. Make every prompt defensible. Then build.

The most dangerous AI feature is the one that works beautifully and costs more than it earns.