LLM token tracking
LLM token tracking is the foundation for AI cost governance. If you cannot separate input tokens, output tokens, cached tokens, retries, and fallback calls, you cannot explain why the bill changed or which engineering choice caused it.
Token tracking should happen at the request level, not only as a daily aggregate. Teams need enough detail to answer whether cost increased because users asked more questions, because the product sent more context, because a route policy changed, or because a bug triggered duplicate work.
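To make the decomposition concrete, here is a minimal sketch of per-call cost broken down by token class. The prices and class names are placeholders for illustration, not any provider's actual rates:

```python
# Hypothetical per-1M-token prices; real rates vary by provider and model.
PRICES = {
    "input": 3.00,
    "output": 15.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
}

def request_cost(tokens: dict[str, int], prices: dict[str, float] = PRICES) -> float:
    """Cost of one model call, summed over token classes.

    `tokens` maps a token class (input, output, cache_read, cache_write)
    to the count reported for that call. Retries and fallback calls are
    separate calls and should be costed individually, then summed.
    """
    return sum(count * prices[cls] / 1_000_000 for cls, count in tokens.items())
```

Because retries and fallback calls are separate model calls, a single logical user request can map to several costed records; summing them per request is what keeps the bill explainable.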
Minimum fields to capture
- Timestamp, request id, feature, environment, and team owner.
- Provider, model, route policy, and fallback status.
- Input and output token counts, plus any cache-read or cache-write token classes.
- Latency, retries, and final status.
- Business context such as tenant, workflow, or successful-task marker where allowed.
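A minimal record type covering these fields might look like the following sketch. The field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LlmCallRecord:
    # Identity and ownership
    timestamp: datetime
    request_id: str
    feature: str
    environment: str            # e.g. "prod", "staging"
    team_owner: str

    # Routing
    provider: str
    model: str
    route_policy: str
    is_fallback: bool

    # Token classes
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0

    # Execution
    latency_ms: float = 0.0
    retry_of: str | None = None  # request_id of the original attempt
    status: str = "ok"

    # Business context (only where policy allows)
    tenant: str | None = None
    workflow: str | None = None
    task_succeeded: bool | None = None
```

Storing `retry_of` as a pointer to the original request id preserves the retry chain, so duplicate work stays attributable to one logical request.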
Why token counts alone are not enough
Token counts explain volume, but not value. A workflow with high token usage may still be efficient if it closes expensive support cases or automates a finance process. That is why strong teams track both token consumption and business outcomes. The more useful metric is cost per successful task, with token data as the diagnostic layer underneath.
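As a concrete sketch, cost per successful task can be computed over a window of the hypothetical records above, reusing the `request_cost` helper from earlier:

```python
def cost_per_successful_task(records: list[LlmCallRecord]) -> float:
    """Total spend divided by the number of tasks marked successful.

    Token data stays available underneath: if this ratio moves, you can
    drill into which token class, model, or feature drove the change.
    """
    total_cost = sum(
        request_cost({
            "input": r.input_tokens,
            "output": r.output_tokens,
            "cache_read": r.cache_read_tokens,
            "cache_write": r.cache_write_tokens,
        })
        for r in records
    )
    successes = sum(1 for r in records if r.task_succeeded)
    if successes == 0:
        raise ValueError("no successful tasks in this window")
    return total_cost / successes
```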
Tracking patterns that help in production
Join application logs, model gateway data, and invoice totals into a single reporting model. Keep retries linked to the original request. Separate user-facing traffic from batch and eval jobs. Most importantly, do not let each provider define your reporting shape. Normalize records so multi-provider routing can still be analyzed in one view.
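As one example of normalization, the sketch below maps two differently shaped usage payloads onto the shared token classes. The payload field names mirror common provider response formats (OpenAI-style `prompt_tokens`/`completion_tokens`, Anthropic-style `input_tokens`/`cache_read_input_tokens`), but treat the exact shapes as assumptions to verify against your own gateway's logs:

```python
def normalize_usage(provider: str, usage: dict) -> dict[str, int]:
    """Map a provider-specific usage payload onto shared token classes.

    The reporting model owns the shape; each provider gets an adapter.
    Field names here are assumptions based on common API responses.
    """
    if provider == "openai":
        return {
            "input": usage.get("prompt_tokens", 0),
            "output": usage.get("completion_tokens", 0),
            "cache_read": usage.get("prompt_tokens_details", {}).get("cached_tokens", 0),
            "cache_write": 0,  # not reported separately in this shape
        }
    if provider == "anthropic":
        return {
            "input": usage.get("input_tokens", 0),
            "output": usage.get("output_tokens", 0),
            "cache_read": usage.get("cache_read_input_tokens", 0),
            "cache_write": usage.get("cache_creation_input_tokens", 0),
        }
    raise ValueError(f"no usage adapter for provider: {provider}")
```

With an adapter per provider, the downstream reporting model never changes shape when a new provider or route policy is added.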
Related reading: LLM cost monitoring, LLM cost attribution, and AI FinOps for production teams.