How to budget for AI spend

Budgeting for LLM spend feels harder than budgeting for SaaS because it is. A SaaS line item is a fixed number you negotiate once a year. LLM spend is variable, per-request, and driven by code paths that change every week. But the discipline is not exotic — it is the same envelope-and-forecast loop finance already runs for cloud compute, applied to a different cost driver. This is a starter framework for teams setting their first real AI budget.

Why a flat number doesn't work

The instinct is to set one company-wide AI budget — "we'll spend $40K a month on OpenAI" — and watch the total. It fails for the same reason a single cloud budget fails: when the number is breached, you have no idea which workload caused it, so you have no idea what to cut. A useful AI budget is not one number. It is a set of envelopes, one per workload, each with a named owner.

Step 1: Attribute before you budget

You cannot budget spend you cannot trace. The prerequisite for every step below is cost attribution — the ability to see spend broken out by feature, team, environment, and customer. A budget set on unallocated spend is a guess, and a guess cannot be enforced. If you do nothing else first, get attribution working.
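As a minimal sketch of what attribution can look like, assume every LLM request is logged with a few tags and a computed cost. The `UsageRecord` schema, the tag values, and the dollar amounts below are hypothetical, not a real provider's export format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical log schema: one row per LLM request, tagged at call time.
    feature: str
    team: str
    environment: str
    customer: str
    cost_usd: float


def spend_by(records, dimension):
    """Sum cost per distinct value of one tag dimension (e.g. "feature")."""
    totals = defaultdict(float)
    for record in records:
        totals[getattr(record, dimension)] += record.cost_usd
    return dict(totals)


# Illustrative data only.
records = [
    UsageRecord("search", "discovery", "prod", "acme", 12.50),
    UsageRecord("search", "discovery", "staging", "internal", 1.25),
    UsageRecord("summarize", "docs", "prod", "acme", 30.00),
    UsageRecord("summarize", "docs", "prod", "globex", 6.25),
]

by_feature = spend_by(records, "feature")
by_environment = spend_by(records, "environment")
```

The same records can be sliced by `team` or `customer` without changing the logging, which is the reason to tag at request time rather than reconstruct attribution from the invoice.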

Step 2: Forecast from token shapes, not last month's total

To project next quarter's spend, build it up from the cost drivers rather than extrapolating the invoice. For each significant workload, estimate:

Multiply those out and you have a forecast that responds to reality: if volume doubles, you know the cost; if a feature moves to a cheaper model, you can see the saving before you ship it. Tie the forecast to a business metric so finance can sanity-check it against the plan.
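A bottom-up forecast of this kind might be sketched as follows. The sketch assumes the drivers are monthly request volume, input and output tokens per request, and a per-million-token price for the chosen model. The field names, token counts, and prices are placeholders for illustration, not real provider rates.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class WorkloadForecast:
    # All fields are illustrative assumptions.
    name: str
    requests_per_month: int
    input_tokens_per_request: int
    output_tokens_per_request: int
    input_price_per_million: float   # USD per 1M input tokens (placeholder)
    output_price_per_million: float  # USD per 1M output tokens (placeholder)

    def cost_per_request(self) -> float:
        input_cost = self.input_tokens_per_request * self.input_price_per_million
        output_cost = self.output_tokens_per_request * self.output_price_per_million
        return (input_cost + output_cost) / 1_000_000

    def monthly_cost(self) -> float:
        return self.cost_per_request() * self.requests_per_month


baseline = WorkloadForecast(
    name="summarize",
    requests_per_month=200_000,
    input_tokens_per_request=2_000,
    output_tokens_per_request=500,
    input_price_per_million=2.0,
    output_price_per_million=8.0,
)

# What-if scenarios: change one driver and recompute.
doubled_volume = replace(baseline, requests_per_month=400_000)
cheaper_model = replace(
    baseline, input_price_per_million=0.5, output_price_per_million=2.0
)

for scenario in (baseline, doubled_volume, cheaper_model):
    print(f"{scenario.requests_per_month:>7} req/mo -> ${scenario.monthly_cost():,.2f}")
```

Because each scenario is the baseline with one driver changed, the effect of a volume change or a model switch can be read off before anything ships.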

Step 3: Set an envelope per workload

Give each workload a monthly envelope derived from the forecast plus a deliberate margin for growth. The envelope is not aspirational — it is the number the owning team is accountable to. Express it in dollars, but track it in the underlying drivers (cost per request × volume) so a breach can be diagnosed, not just noticed.
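One way to make a breach diagnosable is to split the overrun into a volume effect and a cost-per-request effect. The sketch below uses a standard price/volume variance split. The function names, the 20% margin, and the figures are assumptions for illustration.

```python
def envelope_usd(forecast_usd: float, growth_margin: float) -> float:
    """Monthly envelope: the forecast plus a deliberate margin (0.2 means 20%)."""
    return forecast_usd * (1 + growth_margin)


def diagnose_overrun(planned_cpr, planned_volume, actual_cpr, actual_volume):
    """Split (actual - planned) spend into two effects that sum to the total.

    cpr is cost per request in USD.
    """
    volume_effect = (actual_volume - planned_volume) * planned_cpr
    rate_effect = (actual_cpr - planned_cpr) * actual_volume
    return {
        "total_variance": actual_cpr * actual_volume - planned_cpr * planned_volume,
        "volume_effect": volume_effect,  # more (or fewer) requests than planned
        "rate_effect": rate_effect,      # each request cost more (or less)
    }


envelope = envelope_usd(forecast_usd=1600.0, growth_margin=0.2)
result = diagnose_overrun(
    planned_cpr=0.008, planned_volume=200_000,
    actual_cpr=0.010, actual_volume=250_000,
)
```

A large volume effect usually points at traffic growth, while a large rate effect points at longer prompts, a model change, or a caching regression; the two call for different responses.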

Step 4: Choose soft and hard thresholds

A budget without a response to a breach is just a chart. Define two thresholds per envelope:

The point of the hard threshold is not to punish the team that exceeded it. It is to make sure a runaway cost — a broken cache, a spike, an agent loop — is bounded by design instead of discovered at month-end.
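A threshold check can be as small as the sketch below. It assumes the soft threshold triggers an alert and the hard threshold triggers degradation (for example, routing to a cheaper model or rate-limiting). The 80% and 100% ratios and the `Action` names are hypothetical defaults, not values prescribed here.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    ALERT = "alert"      # soft threshold: notify the envelope owner
    DEGRADE = "degrade"  # hard threshold: bound the spend automatically


def check_envelope(spend_usd, envelope_usd, soft_ratio=0.8, hard_ratio=1.0):
    """Map month-to-date spend against an envelope to a response."""
    if envelope_usd <= 0:
        raise ValueError("envelope_usd must be positive")
    if not 0 < soft_ratio <= hard_ratio:
        raise ValueError("expected 0 < soft_ratio <= hard_ratio")
    used = spend_usd / envelope_usd
    if used >= hard_ratio:
        return Action.DEGRADE
    if used >= soft_ratio:
        return Action.ALERT
    return Action.OK


status_early = check_envelope(spend_usd=500.0, envelope_usd=1920.0)
status_warning = check_envelope(spend_usd=1700.0, envelope_usd=1920.0)
status_breach = check_envelope(spend_usd=2100.0, envelope_usd=1920.0)
```

Running a check like this on a schedule, rather than at month-end, is what turns the hard threshold into a bound on runaway cost.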

Step 5: Reconcile, then adjust

At month-end, compare your internal estimate to the actual provider invoice. Reconciliation does two things: it keeps your forecast honest (a recurring variance usually means a stale price assumption or a misclassified token type), and it feeds the next cycle's envelopes with real numbers. A budget that is never reconciled drifts away from the invoice until no one trusts it.
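The reconciliation itself is simple arithmetic. The sketch below compares an internal estimate to the invoiced amount and flags the month for review when the gap exceeds a tolerance. The 5% tolerance and the dollar figures are assumptions chosen for illustration.

```python
def reconcile(estimated_usd: float, invoiced_usd: float, tolerance: float = 0.05):
    """Compare an internal estimate to the provider invoice.

    variance_pct is relative to the invoice; positive means the estimate ran low.
    """
    if invoiced_usd <= 0:
        raise ValueError("invoiced_usd must be positive")
    variance = invoiced_usd - estimated_usd
    variance_pct = variance / invoiced_usd
    return {
        "variance_usd": variance,
        "variance_pct": variance_pct,
        "needs_review": abs(variance_pct) > tolerance,
    }


close_month = reconcile(estimated_usd=9_800.0, invoiced_usd=10_000.0)
drifted_month = reconcile(estimated_usd=9_000.0, invoiced_usd=10_000.0)
```

Tracking `variance_pct` month over month makes a recurring variance visible: a gap that keeps the same sign and size is the pattern that suggests a stale price assumption or a misclassified token type.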

Step 6: Give the budget an owner at three levels

Budgets fail organizationally more often than technically. The pattern that holds up has three explicit roles:

Smaller teams can collapse the first two roles into one FinOps-minded engineer, but the sponsor is non-negotiable: without one, budget breaches become political fights instead of operational decisions.

A realistic first 30 days

  1. Week 1: get attribution working — tag traffic by feature and environment, even crudely.
  2. Week 2: measure current token shapes and model mix per workload; build the bottom-up forecast.
  3. Week 3: set envelopes and soft thresholds; wire up alerts.
  4. Week 4: add hard-threshold degradation on the highest-spend workloads; run a first reconciliation against the latest invoice.

That is enough to turn "we'll find out at month-end" into "we know where we are today, and the worst case is bounded." Everything after that is refinement.
