Calculate your actual gross margin per customer and your Inference Efficiency Ratio (IER), then run three scenarios to find the optimization ceiling.

| IER | Gross Margin | Status | Action |
|---|---|---|---|
| <15% | >60% | Healthy | Maintain, scale with confidence |
| 15-23% | 50-60% | Acceptable | Monitor, implement caching if not done |
| 23-30% | 40-50% | Caution | Implement prompt caching + model routing immediately |
| >30% | <40% | Critical | Pricing or cost structure requires intervention |
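
As a minimal sketch, the thresholds in the table can be encoded as a simple lookup. It assumes IER is expressed as a percentage where lower is better, as the table implies. The function name `classify_ier` is hypothetical, and the handling of the shared boundary values (15, 23, 30) is an illustrative choice, since the table does not say which band they belong to.

```python
def classify_ier(ier_pct: float) -> tuple[str, str]:
    """Map an IER percentage to the (status, action) pair from the table.

    Boundary handling is an assumption: 15 falls in Acceptable,
    23 and 30 fall in Caution.
    """
    if ier_pct < 0:
        raise ValueError("IER cannot be negative")
    if ier_pct < 15:
        return "Healthy", "Maintain, scale with confidence"
    if ier_pct < 23:
        return "Acceptable", "Monitor, implement caching if not done"
    if ier_pct <= 30:
        return "Caution", "Implement prompt caching + model routing immediately"
    return "Critical", "Pricing or cost structure requires intervention"


if __name__ == "__main__":
    # Hypothetical IER values, one per band.
    for ier in (12.0, 19.5, 27.0, 34.0):
        status, action = classify_ier(ier)
        print(f"IER {ier:4.1f}% -> {status}: {action}")
```

The gross margin column is left out of the lookup on purpose: the table pairs each IER band with a margin band, so either figure identifies the row, but if the two disagree for a given customer it is worth checking what other costs sit in the margin calculation.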

Scenario B: Same plan price, but use the top-10% customer's inference cost.

Scenario C: Scenario A inference cost × 0.15 (90% input reduction from caching, combined with best-case routing).
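
The scenario arithmetic can be sketched in a few lines of Python. Several things here are assumptions rather than definitions from the text: IER is taken to be inference cost divided by plan price, gross margin is taken to be plan price minus inference cost and any other cost of goods sold, Scenario A is taken to be the average customer's inference cost, and all names (`Scenario`, `build_scenarios`, `other_cogs`) and sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    plan_price: float       # monthly revenue per customer
    inference_cost: float   # monthly inference spend per customer
    other_cogs: float = 0.0 # hypothetical: hosting, support, etc.

    @property
    def ier_pct(self) -> float:
        # Assumption: IER = inference cost as a share of revenue.
        return 100.0 * self.inference_cost / self.plan_price

    @property
    def gross_margin_pct(self) -> float:
        cogs = self.inference_cost + self.other_cogs
        return 100.0 * (self.plan_price - cogs) / self.plan_price


def build_scenarios(plan_price, avg_inference, top10_inference, other_cogs=0.0):
    """Return baseline, heavy-user, and optimized scenarios at one plan price."""
    return [
        # Baseline (assumed to be the average customer).
        Scenario("A: baseline", plan_price, avg_inference, other_cogs),
        # Same plan price, top-10% customer's inference cost.
        Scenario("top-10% customer", plan_price, top10_inference, other_cogs),
        # Baseline inference cost multiplied by 0.15.
        Scenario("optimized (A x 0.15)", plan_price, avg_inference * 0.15, other_cogs),
    ]


if __name__ == "__main__":
    # Hypothetical inputs: $100 plan, $20 average and $45 top-10% inference cost.
    for s in build_scenarios(100.0, 20.0, 45.0, other_cogs=15.0):
        print(f"{s.name:<22} IER {s.ier_pct:5.1f}%  margin {s.gross_margin_pct:5.1f}%")
```

The gap between the baseline and the optimized scenario is the optimization ceiling: if the optimized margin still falls short of the target, caching and routing alone cannot close it, and pricing has to change.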