How to budget for AI spend
Budgeting for LLM spend feels harder than budgeting for SaaS because it is. A SaaS line item is a fixed number you negotiate once a year. LLM spend is variable, per-request, and driven by code paths that change every week. But the discipline is not exotic — it is the same envelope-and-forecast loop finance already runs for cloud compute, applied to a different cost driver. This is a starter framework for teams setting their first real AI budget.
Why a flat number doesn't work
The instinct is to set one company-wide AI budget — "we'll spend $40K a month on OpenAI" — and watch the total. It fails for the same reason a single cloud budget fails: when the number is breached, you have no idea which workload caused it, so you have no idea what to cut. A useful AI budget is not one number. It is a set of envelopes, one per workload, each with a named owner.
Step 1: Attribute before you budget
You cannot budget spend you cannot trace. The prerequisite for every step below is cost attribution — the ability to see spend broken out by feature, team, environment, and customer. A budget set on unallocated spend is a guess, and a guess cannot be enforced. If you do nothing else first, get attribution working.
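As a rough illustration of what "broken out by feature, team, environment, and customer" can mean in practice, the sketch below sums spend per tag value and measures how much spend is still unallocated. The record layout and field names (feature, team, environment, cost_usd) are assumptions made for the example, not a format any particular gateway or provider emits.

```python
from collections import defaultdict

# Hypothetical usage records, as a gateway or logging layer might emit them.
# The field names and values are illustrative assumptions.
records = [
    {"feature": "search", "team": "growth", "environment": "prod", "cost_usd": 12.50},
    {"feature": "search", "team": "growth", "environment": "staging", "cost_usd": 1.25},
    {"feature": "summarize", "team": "docs", "environment": "prod", "cost_usd": 30.00},
    {"feature": None, "team": None, "environment": "prod", "cost_usd": 6.25},
]


def spend_by(records, dimension):
    """Sum cost per value of one tag; untagged records land in 'unallocated'."""
    totals = defaultdict(float)
    for record in records:
        key = record.get(dimension) or "unallocated"
        totals[key] += record["cost_usd"]
    return dict(totals)


def unallocated_share(records, dimension):
    """Fraction of total spend that carries no value for the given tag."""
    totals = spend_by(records, dimension)
    total = sum(totals.values())
    return totals.get("unallocated", 0.0) / total if total else 0.0
```

Tracking the unallocated share over time is one way to tell whether attribution is good enough to budget against: the closer it gets to zero, the less of the budget is a guess.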
Step 2: Forecast from token shapes, not last month's total
To project next quarter's spend, build it up from the cost drivers rather than extrapolating the invoice. For each significant workload, estimate:
- Request volume — how many calls per month, and how it tracks with a business driver (active users, transactions, documents processed).
- Token shape — average input, output, cache-read, and reasoning tokens per request. This is where the cost actually lives. (See how providers charge.)
- Model mix — which model tiers handle the traffic, since the per-token price varies by more than 10× across the ladder.
Multiply those out and you have a forecast that responds to reality: if volume doubles, you know the cost; if a feature moves to a cheaper model, you can see the saving before you ship it. Tie the forecast to a business metric so finance can sanity-check it against the plan.
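The multiplication above can be sketched in a few lines. Everything specific here is an assumption for illustration: the model names, the prices (placeholders, not any provider's rate card), and the simplification that every model tier sees the same token shape. Reasoning tokens get their own price field so the sketch stays general; where a provider bills them at the output rate, set the two prices equal.

```python
from dataclasses import dataclass


@dataclass
class TokenShape:
    """Average tokens per request for one workload."""
    input: float
    output: float
    cache_read: float = 0.0
    reasoning: float = 0.0


@dataclass
class ModelPrice:
    """USD per million tokens. Values used below are placeholders."""
    input: float
    output: float
    cache_read: float
    reasoning: float


def cost_per_request(shape, price):
    """Price one average request by multiplying each token type by its rate."""
    per_million = (
        shape.input * price.input
        + shape.output * price.output
        + shape.cache_read * price.cache_read
        + shape.reasoning * price.reasoning
    )
    return per_million / 1_000_000


def monthly_forecast(requests_per_month, shape, model_mix, prices):
    """Volume x token shape x model mix. model_mix maps model name to traffic share."""
    if abs(sum(model_mix.values()) - 1.0) > 1e-9:
        raise ValueError("model mix shares must sum to 1")
    return sum(
        requests_per_month * share * cost_per_request(shape, prices[model])
        for model, share in model_mix.items()
    )


# Hypothetical workload: 100,000 requests a month, 25% on a large model.
prices = {
    "large": ModelPrice(input=10.0, output=30.0, cache_read=1.0, reasoning=30.0),
    "small": ModelPrice(input=1.0, output=3.0, cache_read=0.1, reasoning=3.0),
}
shape = TokenShape(input=2000, output=500, cache_read=1000)
forecast = monthly_forecast(100_000, shape, {"large": 0.25, "small": 0.75}, prices)
```

With these placeholder numbers the forecast comes to about $1,170 a month. Moving all traffic to the small tier (a mix of `{"small": 1.0}`) drops it to about $360, which is the kind of saving the forecast lets you see before shipping.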
Step 3: Set an envelope per workload
Give each workload a monthly envelope derived from the forecast plus a deliberate margin for growth. The envelope is not aspirational — it is the number the owning team is accountable to. Express it in dollars, but track it in the underlying drivers (cost per request × volume) so a breach can be diagnosed, not just noticed.
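One way to make a breach diagnosable is to split the gap between plan and actual into a volume effect and a cost-per-request effect. The sketch below assumes a 20% growth margin purely as an example, and the function names are hypothetical.

```python
def envelope(forecast_usd, growth_margin=0.20):
    """Monthly envelope: the forecast plus a deliberate margin for growth."""
    return forecast_usd * (1 + growth_margin)


def diagnose(planned_cpr, planned_volume, actual_cpr, actual_volume):
    """Split the spend gap into its two drivers.

    volume_effect: extra requests, priced at the planned cost per request.
    rate_effect:   change in cost per request, applied to the actual volume.
    The two effects always add up to the total gap.
    """
    volume_effect = (actual_volume - planned_volume) * planned_cpr
    rate_effect = (actual_cpr - planned_cpr) * actual_volume
    total = actual_cpr * actual_volume - planned_cpr * planned_volume
    return {
        "volume_effect": volume_effect,
        "rate_effect": rate_effect,
        "total": total,
    }


# Hypothetical month: planned $0.010 x 100,000 requests,
# actual $0.012 x 120,000 requests.
gap = diagnose(0.010, 100_000, 0.012, 120_000)
```

In this example the $440 overrun splits into roughly $200 from extra volume and $240 from a higher cost per request. Those call for different responses: the first may simply be growth, while the second points at a changed token shape or model mix.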
Step 4: Choose soft and hard thresholds
A budget without a response to a breach is just a chart. Define two thresholds per envelope:
- Soft threshold (e.g. 80% of envelope) — alert the owning team and trigger a review. Nothing changes automatically; a human decides.
- Hard threshold (e.g. 100%) — the system degrades behaviour rather than letting spend run unbounded: drop to a cheaper model, reduce retrieval depth, move non-urgent work to a batch API, or queue lower-priority traffic.
The point of the hard threshold is not to punish the team that exceeded it. It is to make sure a runaway cost — a broken cache, a spike, an agent loop — is bounded by design instead of discovered at month-end.
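A minimal sketch of the two-threshold check, assuming month-to-date spend and the envelope are both available in dollars. The 80% and 100% defaults mirror the examples above; the action names and the model-fallback helper are hypothetical, and a real degradation path might equally reduce retrieval depth or queue traffic.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"            # under the soft threshold: nothing to do
    ALERT = "alert"      # soft threshold reached: notify the owner, a human decides
    DEGRADE = "degrade"  # hard threshold reached: switch to bounded behavior


def budget_action(spend_usd, envelope_usd, soft=0.80, hard=1.00):
    """Map month-to-date spend against an envelope to the response it triggers."""
    if envelope_usd <= 0:
        raise ValueError("envelope must be positive")
    used = spend_usd / envelope_usd
    if used >= hard:
        return Action.DEGRADE
    if used >= soft:
        return Action.ALERT
    return Action.OK


def choose_model(action, default_model, fallback_model):
    """One possible degradation path: route to a cheaper model past the hard threshold."""
    return fallback_model if action is Action.DEGRADE else default_model
```

Running the check on every request, or on a short timer, is what turns the hard threshold into a bound rather than a month-end finding. A check that only runs daily still lets a runaway loop spend for up to a day.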
Step 5: Reconcile, then adjust
At month-end, compare your internal estimate to the actual provider invoice. Reconciliation does two things: it keeps your forecast honest (a recurring variance usually means a stale price assumption or a misclassified token type), and it feeds the next cycle's envelopes with real numbers. A budget that is never reconciled drifts away from the invoice until no one trusts it.
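The comparison can be as simple as the sketch below. The 2% tolerance and the three-month window are arbitrary assumptions for illustration; pick values that match how precise your own estimates are expected to be.

```python
def reconcile(estimated_usd, invoiced_usd, tolerance=0.02):
    """Compare the internal estimate to the provider invoice for one month."""
    if invoiced_usd <= 0:
        raise ValueError("invoice total must be positive")
    variance = estimated_usd - invoiced_usd
    variance_pct = variance / invoiced_usd
    return {
        "variance_usd": variance,
        "variance_pct": variance_pct,
        "within_tolerance": abs(variance_pct) <= tolerance,
    }


def recurring_variance(history, tolerance=0.02, months=3):
    """True if the last `months` closes all missed tolerance in the same direction.

    history is a list of (estimated_usd, invoiced_usd) pairs, oldest first.
    """
    recent = history[-months:]
    if len(recent) < months:
        return False
    pcts = [reconcile(est, inv, tolerance)["variance_pct"] for est, inv in recent]
    all_out = all(abs(p) > tolerance for p in pcts)
    same_direction = all(p > 0 for p in pcts) or all(p < 0 for p in pcts)
    return all_out and same_direction
```

A single month out of tolerance can be noise. Several months out of tolerance in the same direction is the recurring variance described above, and it suggests checking price assumptions and token classification before the next cycle's envelopes are set.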
Step 6: Give the budget an owner at three levels
Budgets fail organizationally more often than technically. The pattern that holds up has three explicit roles:
- A finance owner (FP&A or a FinOps lead) who owns the model, the forecast, and the monthly close.
- A platform owner (the engineer who runs the gateway) who owns the tagging that makes attribution reliable and enforces the hard thresholds.
- An executive sponsor (CFO or CTO) who chairs the review and resolves disputes when a team wants more envelope.
Smaller teams collapse the first two roles into one FinOps-minded engineer, but the sponsor is non-negotiable: without one, budget breaches become political fights instead of operational decisions.
A realistic first 30 days
- Week 1: get attribution working — tag traffic by feature and environment, even crudely.
- Week 2: measure current token shapes and model mix per workload; build the bottom-up forecast.
- Week 3: set envelopes and soft thresholds; wire up alerts.
- Week 4: add hard-threshold degradation on the highest-spend workloads; run a first reconciliation against the latest invoice.
That is enough to turn "we'll find out at month-end" into "we know where we are today, and the worst case is bounded." Everything after that is refinement.
Related
- LLM budget governance — budgets, quotas, and degradation paths in depth.
- What is LLM cost attribution? — the prerequisite for any budget.
- How do LLM providers charge? — the inputs to a bottom-up forecast.
- Invoice reconciliation for AI bills — keeping the budget tied to reality.