Who this is for: Founders building or operating AI-enabled SaaS or service products who have not yet built a per-customer P&L model, or who are surprised by lower-than-expected gross margins despite strong top-line revenue.
The problem
Traditional SaaS runs at 75 to 85% gross margins. These margins are so well established that SaaS investors, lenders, and valuation models are calibrated around them. When an AI product lands at a 52% gross margin, it appears to be underperforming, even when it is operating normally.
The problem is not underperformance. The problem is that AI products carry a cost that traditional SaaS never had: inference. Inference is the compute consumed every time the AI processes a request. Tokens in, tokens out, multiplied by every customer interaction, every day. This cost scales with usage, not seats.
Most founders underestimate this cost during pricing. The result is a product that looks profitable at the top line and leaks cash at the customer level, a problem that compounds invisibly until someone checks margins customer by customer.
Where Your Margin Actually Goes
The 52% gross margin baseline
AI SaaS gross margins average 52% as of 2026. This compares to 75 to 85% for traditional SaaS. The gap (23 to 33 percentage points) comes from the new COGS line that inference creates.
That 52% is a weighted average across many products and business models. Text-heavy AI products can reach 60 to 65% gross margins with careful cost management. Voice AI products face a harder floor: 40 to 50% at small scale due to the layered cost of the voice stack.
The 52% figure is not a failure benchmark. It is a calibration point. Building a pricing model that assumes 75% gross margins and then discovering 52% actual margins is a cash flow problem that compounds monthly.
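To make the compounding concrete, here is a minimal sketch of the gap between a plan built on 75% margins and a business actually running at 52%. The revenue figure and the function name are hypothetical, chosen only for illustration.

```python
def monthly_gross_profit(revenue: float, gross_margin: float) -> float:
    """Gross profit in dollars for a monthly revenue and a margin given as 0 to 1."""
    return revenue * gross_margin

# Hypothetical monthly recurring revenue, for illustration only.
monthly_revenue = 100_000

planned = monthly_gross_profit(monthly_revenue, 0.75)  # margin the pricing model assumed
actual = monthly_gross_profit(monthly_revenue, 0.52)   # margin the product actually runs at
shortfall = planned - actual

print(f"Planned gross profit: ${planned:,.0f}")
print(f"Actual gross profit:  ${actual:,.0f}")
print(f"Monthly shortfall:    ${shortfall:,.0f}")
print(f"Over 12 months:       ${shortfall * 12:,.0f}")
```

At this illustrative revenue level the plan overstates gross profit by about $23,000 every month, which is cash the budget assumed would exist.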
Inference costs: approximately 23% of revenue
Inference (the compute cost of model calls) averages approximately 23% of revenue for AI products. This means: for every $100 of subscription revenue, roughly $23 goes back out the door in API fees before any other cost is counted.
This percentage depends on:
- Which model you call. At 100 calls per day per customer, a frontier model costs $75 to $150/month in inference alone. The same volume on a smaller model costs $9 to $24/month. That is roughly a 6 to 8x difference in inference cost for the same product usage.
- Whether you have implemented cost controls. Prompt caching reduces input costs by up to 90%. Model routing (sending simple queries to smaller models and complex ones to larger) reduces overall inference spend by 45 to 85%. These are not optimizations; they are prerequisites for viable margins at scale.
- Average session length and query complexity. A product that generates short, simple outputs has fundamentally different inference economics than one that generates long-form documents or runs multi-step reasoning chains.
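The effect of model choice and routing can be sketched with the figures above. This is an illustration under stated assumptions, not a cost model for any specific provider: it assumes a 30-day month, uses the low end of each monthly range ($75 frontier, $9 smaller model) to back out a per-call cost, and assumes 85% of queries can be routed to the smaller model. All function and variable names are hypothetical.

```python
DAYS_PER_MONTH = 30  # assumption: 30-day billing month

def per_call_cost(monthly_cost: float, calls_per_day: int) -> float:
    """Back out an average cost per call from a monthly inference bill."""
    return monthly_cost / (calls_per_day * DAYS_PER_MONTH)

def routed_monthly_cost(calls_per_day: int, small_share: float,
                        small_call_cost: float, large_call_cost: float) -> float:
    """Monthly inference cost when a share of calls goes to the smaller model."""
    calls = calls_per_day * DAYS_PER_MONTH
    blended = small_share * small_call_cost + (1 - small_share) * large_call_cost
    return calls * blended

CALLS_PER_DAY = 100
frontier_call = per_call_cost(75.0, CALLS_PER_DAY)  # low end of the frontier range
small_call = per_call_cost(9.0, CALLS_PER_DAY)      # low end of the smaller-model range

all_frontier = routed_monthly_cost(CALLS_PER_DAY, 0.0, small_call, frontier_call)
routed = routed_monthly_cost(CALLS_PER_DAY, 0.85, small_call, frontier_call)
saving = 1 - routed / all_frontier

print(f"All frontier: ${all_frontier:.2f}/month")
print(f"85% routed:   ${routed:.2f}/month")
print(f"Reduction:    {saving:.0%}")
```

With these inputs the routed bill is about $18.90 against $75.00, a reduction of roughly 75%, which sits inside the 45 to 85% range quoted above. Your own split will depend on how many queries are genuinely simple.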
Why flat-rate bleeds money on heavy users
A flat-rate plan sets a fixed monthly revenue per customer regardless of usage. This creates a structural risk when customer behavior is non-uniform: the heavy users (who consume 5 to 10x the average inference) generate negative gross margin while the light users subsidize them.
The math is specific: a single 10x heavy user on a $299/month flat plan can generate -$59/month in gross profit. That means the product is actively losing money on that customer, every month, at the current plan price.
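A short sketch shows how a flat plan turns negative as usage multiplies. The cost breakdown behind the -$59 figure is not given here, so the average variable cost of $35.80 per customer is a hypothetical value chosen only because it reproduces that figure at 10x usage; substitute your own measured cost.

```python
def flat_plan_gross_profit(price: float, avg_variable_cost: float,
                           usage_multiple: float) -> float:
    """Gross profit on a flat plan when a customer uses N times the average."""
    return price - avg_variable_cost * usage_multiple

PRICE = 299.0
AVG_VARIABLE_COST = 35.80  # hypothetical: back-solved to reproduce -$59 at 10x

for multiple in (1, 5, 10):
    profit = flat_plan_gross_profit(PRICE, AVG_VARIABLE_COST, multiple)
    print(f"{multiple:>2}x usage: gross profit ${profit:,.2f}")

# Usage multiple at which the flat plan stops covering its variable cost.
break_even_multiple = PRICE / AVG_VARIABLE_COST
print(f"Break-even usage multiple: {break_even_multiple:.1f}x")
```

Under this assumption the plan breaks even at a little over 8x average usage, so any customer beyond that point is served at a loss.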
The countermeasures are concrete:
- Usage caps framed as "AI credits." Set a monthly usage threshold above which overages apply. This prevents the worst-case heavy-user drain without requiring a full model change.
- Hybrid pricing (base + overage). A flat floor provides predictable MRR; usage overages above the included allowance prevent margin destruction.
- Model routing. Moving 85% of queries to smaller, cheaper models at 95% quality retention dramatically changes the heavy-user margin calculation.
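The credits-plus-overage idea can be sketched as a small billing function. Every number below is a hypothetical placeholder (one credit per call, 3,000 included credits, $0.02 per overage credit, $0.012 cost per credit); the point is the shape of the calculation, not the values.

```python
def hybrid_invoice(base_price: float, included_credits: int,
                   credits_used: int, overage_price: float) -> float:
    """Monthly invoice: flat base plus a per-credit charge above the allowance."""
    overage_credits = max(0, credits_used - included_credits)
    return base_price + overage_credits * overage_price

# All values are hypothetical and for illustration only.
BASE_PRICE = 299.0
INCLUDED_CREDITS = 3_000
OVERAGE_PRICE = 0.02     # charged per credit above the allowance
COST_PER_CREDIT = 0.012  # what one credit costs you in inference

for label, used in (("light user", 3_000), ("heavy user", 30_000)):
    cost = used * COST_PER_CREDIT
    flat_profit = BASE_PRICE - cost
    hybrid_profit = hybrid_invoice(BASE_PRICE, INCLUDED_CREDITS, used, OVERAGE_PRICE) - cost
    print(f"{label}: flat ${flat_profit:,.2f}, hybrid ${hybrid_profit:,.2f}")
```

With these placeholder values the light user is billed the same either way, while the heavy user moves from a loss under the flat plan to a positive gross profit under the hybrid plan. The overage price must exceed the cost per credit for that to hold.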
Voice AI: a harder margin floor
If your product includes voice AI, the margin challenge is more acute. The all-in cost of a voice AI interaction runs $0.13 to $0.45 per minute across telephony, speech-to-text, the LLM, and text-to-speech. Advertised rates, by contrast, tend to focus only on the LLM layer.
| Component | Cost per minute |
|---|---|
| Telephony | $0.005 to $0.02 |
| Speech-to-text | $0.01 to $0.02 |
| Language model | $0.01 to $0.04 |
| Text-to-speech | $0.05 to $0.08 |
The TTS layer alone often exceeds the cost of the LLM. At even moderate call volumes, voice AI products face structural gross margins below 50% without active cost management.
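As a quick check on the table, the sketch below sums the component ranges and computes the share taken by text-to-speech. The dictionary keys are hypothetical labels. Note that the listed components add up to roughly $0.075 to $0.16 per minute, which is below the $0.13 to $0.45 all-in range quoted earlier; the remainder is not itemized here, so treat the table as a floor rather than a full bill.

```python
# Per-minute cost ranges (low, high) in dollars, copied from the table above.
VOICE_STACK = {
    "telephony": (0.005, 0.02),
    "speech_to_text": (0.01, 0.02),
    "language_model": (0.01, 0.04),
    "text_to_speech": (0.05, 0.08),
}

low_total = round(sum(low for low, _ in VOICE_STACK.values()), 3)
high_total = round(sum(high for _, high in VOICE_STACK.values()), 3)

tts_low, tts_high = VOICE_STACK["text_to_speech"]
tts_share_low = tts_low / low_total     # TTS share when every layer is at its low end
tts_share_high = tts_high / high_total  # TTS share when every layer is at its high end

print(f"Listed components: ${low_total:.3f} to ${high_total:.3f} per minute")
print(f"TTS share of listed cost: {tts_share_high:.0%} to {tts_share_low:.0%}")
```

On these figures text-to-speech accounts for half to two thirds of the itemized cost, which is why it is usually the first layer worth renegotiating or replacing.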
How to apply it
- Build a per-customer P&L for your median customer and your 10 highest-usage customers. The formula: gross profit = monthly price - inference cost - allocated hosting cost - allocated support cost. If gross profit on any cohort is negative, that cohort is actively burning cash.
- Calculate your Inference Efficiency Ratio (IER). IER = total monthly inference spend divided by total monthly revenue. Target: under 15%. If you are above 23%, immediate cost control intervention is warranted.
- Implement prompt caching if you have not already. This is a 1 to 2 day implementation that reduces input inference costs by up to 90%. It is the single highest-leverage cost reduction available.
- Audit your model routing. Are all queries going to your most expensive model by default? Routing simple queries to smaller models and escalating only complex ones to larger models produces 45 to 85% cost reduction.
- Add usage caps before your next pricing announcement. Frame them as "included AI credits." This is easier to implement before customers are on the plan than after.
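The first two steps can be sketched in a few lines. This is a minimal illustration, not a finished model: the class name, field names, status labels, and the two sample customers are all hypothetical, and the 15% and 23% thresholds are the ones stated in the list above.

```python
from dataclasses import dataclass

@dataclass
class CustomerMonth:
    """One customer's revenue and allocated costs for a single month."""
    name: str
    price: float
    inference_cost: float
    hosting_cost: float
    support_cost: float

    @property
    def gross_profit(self) -> float:
        # monthly price - inference - allocated hosting - allocated support
        return self.price - self.inference_cost - self.hosting_cost - self.support_cost

def inference_efficiency_ratio(customers: list[CustomerMonth]) -> float:
    """IER = total monthly inference spend / total monthly revenue."""
    total_inference = sum(c.inference_cost for c in customers)
    total_revenue = sum(c.price for c in customers)
    return total_inference / total_revenue

def ier_status(ier: float) -> str:
    """Classify an IER against the 15% target and the 23% intervention line."""
    if ier < 0.15:
        return "on target"
    if ier <= 0.23:
        return "above target"
    return "intervene now"

# Hypothetical sample data; replace with your own billing and API usage exports.
customers = [
    CustomerMonth("median", 299.0, 40.0, 10.0, 15.0),
    CustomerMonth("heavy", 299.0, 330.0, 10.0, 20.0),
]

for c in customers:
    flag = "BURNING CASH" if c.gross_profit < 0 else "ok"
    print(f"{c.name}: gross profit ${c.gross_profit:,.2f} ({flag})")

ier = inference_efficiency_ratio(customers)
print(f"IER: {ier:.1%} ({ier_status(ier)})")
```

Run it over your median customer and your highest-usage accounts first. A healthy aggregate IER can still hide individual customers with negative gross profit, which is why both views are needed.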
The most expensive mistake in AI pricing is not setting the price too low. It is not knowing your per-customer gross margin when you set the price. Build the per-customer P&L model before you scale marketing.
When to use: Before setting or changing your price. Fill in the brackets with your actual API costs and usage data. The output tells you whether your pricing is structurally sound.
When to use: When your actual margins are coming in lower than your model predicted. This surfaces the cost leaks that most founders miss in their initial P&L.
When to use: When you know your margins are compressed and want to quantify which optimization to implement first. The output ranks your options by cash impact.