Provider arbitrage
Provider arbitrage means buying equivalent model capability from the cheapest provider that can actually deliver it. It is one of the most direct ways to reduce LLM spend, but it only works when you separate “same model name” from “same useful output.” A provider may offer a lower nominal price and still end up costing more once latency, error rates, routing overhead, and quality regression are included.
The basic idea is simple: identify workloads that do not depend on the proprietary behavior of one provider, benchmark equivalent models across vendors, and move stable traffic to the option with the better price-performance ratio. The hard part is proving equivalence. Many teams discover that a workload does not behave the same after it is ported, because system prompts, tool schemas, safety filters, or output formatting differ between providers.
What to compare
Compare price, quality, latency, reliability, and operational complexity. Price alone is not enough. A cheap provider with poor uptime or poor routing support may increase retries and hidden engineering cost. A slightly more expensive provider with better throughput and fewer edge cases can be cheaper per accepted result. Equivalent capability should be measured on the task you actually run, not on a generic benchmark alone.
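To make the retry and quality effects concrete, here is a minimal sketch of an effective-cost comparison. The `ProviderQuote` fields, the prices, and the rates are illustrative assumptions, not real provider figures. The sketch also assumes that failed attempts are billed in full and retried independently, which is a pessimistic simplification.

```python
from dataclasses import dataclass


@dataclass
class ProviderQuote:
    """Hypothetical per-provider measurements for one workload."""
    name: str
    input_price_per_mtok: float   # USD per million input tokens
    output_price_per_mtok: float  # USD per million output tokens
    failure_rate: float           # fraction of attempts that error out and are retried
    pass_rate: float              # fraction of completed responses that pass the task's acceptance check


def cost_per_accepted_result(q: ProviderQuote, input_tokens: int, output_tokens: int) -> float:
    """Expected USD cost of one accepted result.

    Assumes failed attempts are billed in full and retried until one succeeds,
    and that rejected responses are discarded and regenerated.
    """
    if not (0 <= q.failure_rate < 1) or not (0 < q.pass_rate <= 1):
        raise ValueError("failure_rate must be in [0, 1) and pass_rate in (0, 1]")
    attempt_cost = (
        input_tokens * q.input_price_per_mtok
        + output_tokens * q.output_price_per_mtok
    ) / 1_000_000
    expected_attempts = 1 / (1 - q.failure_rate)  # mean of a geometric distribution
    return attempt_cost * expected_attempts / q.pass_rate


# Illustrative numbers only: the second provider is 20% cheaper per token.
incumbent = ProviderQuote("incumbent", 1.00, 2.00, failure_rate=0.0, pass_rate=1.0)
candidate = ProviderQuote("candidate", 0.80, 1.60, failure_rate=0.2, pass_rate=0.9)

for quote in (incumbent, candidate):
    cost = cost_per_accepted_result(quote, input_tokens=1000, output_tokens=500)
    print(f"{quote.name}: ${cost:.6f} per accepted result")
```

With these made-up inputs the nominally cheaper provider ends up costing more per accepted result, which is the situation the comparison is meant to catch.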
For open-weight models, compare hosted options across providers and regions. For closed models, compare both the models themselves and the infrastructure around them. In both cases, include egress, logging, and gateway overhead in the comparison. A model that is 15% cheaper at the token level but requires a complicated adapter layer may not be worth switching to unless the spend is large enough for the savings to repay the engineering cost.
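One way to judge whether the spend is material is a simple payback calculation. The sketch below is an assumption-laden illustration: the function name, the one-time adapter cost, and the added monthly overhead (egress, logging, gateway) are hypothetical inputs you would replace with your own estimates.

```python
from typing import Optional


def payback_months(
    monthly_spend: float,
    token_discount: float,
    one_time_cost: float,
    added_monthly_overhead: float = 0.0,
) -> Optional[float]:
    """Months needed for token savings to repay the migration cost.

    monthly_spend:          current monthly model spend in USD
    token_discount:         fractional price reduction, e.g. 0.15 for 15%
    one_time_cost:          estimated engineering cost of the adapter layer in USD
    added_monthly_overhead: new recurring costs (egress, logging, gateway) in USD

    Returns None when the switch never pays back.
    """
    net_monthly_savings = monthly_spend * token_discount - added_monthly_overhead
    if net_monthly_savings <= 0:
        return None
    return one_time_cost / net_monthly_savings


# Illustrative inputs: the same 15% discount at two very different spend levels.
print(payback_months(2_000, 0.15, one_time_cost=12_000, added_monthly_overhead=100))
print(payback_months(100_000, 0.15, one_time_cost=12_000, added_monthly_overhead=100))
```

Under these invented figures the same discount takes years to repay at the lower spend level and under a month at the higher one.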
When arbitrage works best
Arbitrage works best on stable, high-volume traffic with well-defined output structure. Extraction, classification, templated summarization, and background enrichment are good candidates. Highly creative or brand-sensitive generation usually needs more care. The more repeatable the task and the clearer the acceptance criteria, the more attractive the swap.
Migration discipline
Do not move all traffic at once. Start with a narrow endpoint and a strict rollback plan. Run both providers on the same evaluation set, then on a small percentage of live traffic. Track quality deltas and failure modes. Some teams save money by moving 70% of traffic and keeping the remaining 30% on the original provider because of edge-case behavior. That is fine if the blended cost still improves.
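A small-percentage rollout and the blended-cost check can be sketched as follows. This is a minimal illustration, not a production router: the `route` and `blended_cost` helpers, the request ID format, and the unit costs are assumptions. Hashing the request ID keeps each request on the same provider across retries, and rollback amounts to setting the share to zero.

```python
import hashlib


def route(request_id: str, candidate_share: float) -> str:
    """Deterministically assign a request to 'candidate' or 'incumbent'.

    candidate_share is the fraction of traffic (0.0 to 1.0) sent to the new provider.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "candidate" if bucket < candidate_share else "incumbent"


def blended_cost(incumbent_unit_cost: float, candidate_unit_cost: float, candidate_share: float) -> float:
    """Average unit cost when only part of the traffic moves."""
    return candidate_share * candidate_unit_cost + (1 - candidate_share) * incumbent_unit_cost


# Start with a 5% canary, using hypothetical request IDs.
assignments = [route(f"req-{i}", candidate_share=0.05) for i in range(10_000)]
print("canary fraction:", assignments.count("candidate") / len(assignments))

# A 70/30 split with illustrative unit costs of 1.00 (incumbent) and 0.60 (candidate).
print("blended unit cost:", round(blended_cost(1.00, 0.60, candidate_share=0.70), 4))
```

The partial migration is worth keeping only if the blended figure stays below the incumbent's unit cost after the extra routing overhead is counted.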
Provider arbitrage is most sustainable when it is managed as a normal engineering program rather than a one-time procurement swap. Prices drift, model versions change, and provider quality shifts over time. Re-run the benchmark periodically and keep the routing policy current.