How to implement usage-based pricing for an AI product
Decide between per-token, per-action, and hybrid pricing. Implement quotas, refunds, and Stripe meter sync. Step-by-step with code, from a developer who has shipped all three models.
Last updated: 2026-05-10
Usage-based pricing is one of the trickiest parts of monetizing an AI product. Get it wrong and you either bleed margin on power users or scare off casual ones with bills they cannot predict.
This guide covers the three viable pricing models, when each fits, and how to implement them with AIPricingLab + your billing system. We assume you are using Stripe, but the patterns generalize.
Step-by-step
1. Pick a pricing model: per-action, per-token, or hybrid
Per-action ("$X per render") is simplest to communicate; per-token ("$Y per 1M tokens") is fairest; hybrid ("$X/mo includes 100 renders, then $Y per overage") is what most mature AI products land on.
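To compare the three models on the same usage, a minimal sketch like the following can help. The function names and the prices in the usage note are illustrative assumptions, not part of any SDK; all money is kept in integer cents to avoid floating-point drift.

```typescript
// Hypothetical helpers for comparing pricing models; amounts are integer cents.
type Usage = { actions: number; tokens: number };

// Per-action: a flat price for every billable action (for example, one render).
function perActionBill(u: Usage, centsPerAction: number): number {
  return u.actions * centsPerAction;
}

// Per-token: price is quoted per 1M tokens and rounded up to the next cent.
function perTokenBill(u: Usage, centsPerMillionTokens: number): number {
  return Math.ceil((u.tokens * centsPerMillionTokens) / 1_000_000);
}

// Hybrid: a base fee covers an included quota; only actions beyond it are metered.
function hybridBill(
  u: Usage,
  baseCents: number,
  includedActions: number,
  overageCentsPerAction: number,
): number {
  const overage = Math.max(0, u.actions - includedActions);
  return baseCents + overage * overageCentsPerAction;
}
```

With a $20 base fee, 100 included renders, and $0.30 per extra render, a user who makes 120 renders pays `hybridBill({ actions: 120, tokens: 0 }, 2000, 100, 30)`, which is 2600 cents; a user who makes 80 renders pays the base fee only.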
2. Define limit groups for the billable units
For per-action: unit "count". For per-token: unit "tokens" or "cents". For hybrid: stack a fixed quota group with an overage group on the same plan.
3. Reserve, call AI, commit, refund
This is the same atomic reserve / commit / release pattern you would use for any quota. What is new for usage-based pricing is the refund: when you reserved 4k tokens but only used 800, you must track a refund event for the unused 3,200 so the user is not over-charged.
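// Reserve the worst case up front so concurrent requests cannot overshoot the quota.
const RESERVED_TOKENS = 4000;
const r = await vevee.reserve(userId, "llm.tokens", RESERVED_TOKENS, { model: "gpt-4o" });
if (!r.allowed) throw new LimitError();
try {
  const res = await openai.chat.completions.create(/* ... */);
  // Actual consumption as reported by the provider.
  const used = (res.usage?.prompt_tokens ?? 0) + (res.usage?.completion_tokens ?? 0);
  await vevee.commit(r.reservationId!);
  // The full reservation was committed, so record the unused part as a refund.
  if (used < RESERVED_TOKENS) {
    await vevee.track(userId, "llm.tokens.refund", RESERVED_TOKENS - used, {
      reservationId: r.reservationId!,
    });
  }
} catch (err) {
  // On any failure inside the block, release the reservation and rethrow.
  await vevee.release(r.reservationId!);
  throw err;
}
4. Sync usage to Stripe at period close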
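AIPricingLab tracks; Stripe invoices. At the end of each billing period (or in real time, depending on your Stripe plan setup), push the user's consumed quantity to Stripe as a meter event on the relevant meter. If you sync more than once per period, send only the increment since the previous sync: on a sum-aggregated meter every event adds to the total, so re-sending the running total would double-bill.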
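import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

// `vevee` is the same initialized client used in step 3.
export async function syncUsageToStripe(userId: string, stripeCustomerId: string) {
  const usage = await vevee.usage(userId);
  const tokens = usage.counters.find(c => c.label === "Billable tokens");
  // Fail loudly instead of crashing on an undefined counter.
  if (!tokens) throw new Error(`No "Billable tokens" counter for user ${userId}`);
  // event_name must match the event name configured on your Stripe meter.
  await stripe.billing.meterEvents.create({
    event_name: "llm_tokens",
    payload: {
      stripe_customer_id: stripeCustomerId,
      value: String(tokens.count),
    },
    timestamp: Math.floor(Date.now() / 1000),
  });
}
5. Surface live spend to the user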
Use a pk_live_ key in the browser to render "$3.42 used this month, $46.58 remaining on your $50 plan". The data comes from vevee.usage(userId), which is safe to expose.
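The display string itself is plain formatting. A minimal sketch, assuming you have already converted usage into integer cents (the helper names `formatCents` and `spendSummary` are hypothetical):

```typescript
// Format integer cents as dollars, e.g. 342 becomes "$3.42".
function formatCents(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

// Build the "used / remaining" line shown to the user.
function spendSummary(usedCents: number, planCents: number): string {
  // Never show a negative remainder, even if the user is over the plan amount.
  const remaining = Math.max(0, planCents - usedCents);
  // Whole-dollar plans read better without a trailing ".00".
  const plan = planCents % 100 === 0 ? `$${planCents / 100}` : formatCents(planCents);
  return `${formatCents(usedCents)} used this month, ${formatCents(remaining)} remaining on your ${plan} plan`;
}
```

For example, `spendSummary(342, 5000)` returns "$3.42 used this month, $46.58 remaining on your $50 plan".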
Margin math (the part nobody talks about)
If GPT-4o input costs you $5 per 1M tokens, do NOT charge users $5 per 1M tokens. You need margin for: GPU latency overhead, rate-limit headroom, support cost, your gross margin target, and surprise model price changes. A 50-100% markup is normal for AI.
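The arithmetic is worth writing down, because markup and gross margin are easy to confuse. A small sketch using the $5 per 1M figure from above (the function names are illustrative):

```typescript
// Price per 1M tokens after applying a markup to your own cost.
// markup is a fraction: 0.5 means 50%, 1.0 means 100%.
function priceWithMarkup(costCentsPerMillion: number, markup: number): number {
  return Math.round(costCentsPerMillion * (1 + markup));
}

// Gross margin on model cost alone: the share of the price that is not cost.
function grossMargin(costCents: number, priceCents: number): number {
  return (priceCents - costCents) / priceCents;
}

const cost = 500; // $5.00 per 1M tokens, in cents
const low = priceWithMarkup(cost, 0.5); // 750, i.e. $7.50 per 1M tokens
const high = priceWithMarkup(cost, 1.0); // 1000, i.e. $10.00 per 1M tokens
```

A 50% markup works out to a gross margin of about 33% and a 100% markup to 50%, and that is before support, retries, and the other overheads listed above are taken out.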
Pre-paid credits vs post-paid invoicing
Pre-paid: user buys $50 of credits, you decrement. Predictable revenue, no chargebacks, but blocks happy users mid-task. Post-paid: user uses, you bill. Better UX, but you carry credit risk and surprise bills are common. Most consumer AI is pre-paid; most B2B AI is post-paid; many products offer both.
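The pre-paid side can be sketched as a ledger that refuses to go negative. This is an in-memory illustration only; `CreditLedger` is a hypothetical class, and a real implementation would persist balances and make the debit atomic in the database.

```typescript
// In-memory pre-paid credit ledger; balances are integer cents.
class CreditLedger {
  private balances = new Map<string, number>();

  // Add purchased credits to a user's balance.
  topUp(userId: string, cents: number): void {
    if (!Number.isInteger(cents) || cents <= 0) {
      throw new Error("top-up must be a positive integer number of cents");
    }
    this.balances.set(userId, this.balance(userId) + cents);
  }

  // Deduct credits; returns false (and changes nothing) if the balance is too low.
  tryDebit(userId: string, cents: number): boolean {
    const current = this.balance(userId);
    if (cents > current) return false;
    this.balances.set(userId, current - cents);
    return true;
  }

  // Current balance, or 0 for a user who has never topped up.
  balance(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }
}
```

The `false` return from `tryDebit` is exactly the "blocks happy users mid-task" moment, so it is the natural place to trigger a top-up prompt instead of a bare error.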
Avoiding the "free tier was a mistake" trap
Free tiers for AI must have hard caps that break exactly when you intend them to. If your free tier covers 1k tokens but imprecise enforcement lets users reach 4k, your margin gets murdered. AIPricingLab's atomic reserve closes this gap.
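The reason a reservation enforces the cap exactly is that in-flight requests count against it before they finish. The following single-process sketch illustrates the idea; `QuotaCounter` is a hypothetical class written for this example and is not the AIPricingLab implementation, which has to do the same thing atomically across servers.

```typescript
// Single-process illustration of a hard cap with reservations.
class QuotaCounter {
  private readonly cap: number;
  private committed = 0; // usage that has been finalized
  private held = 0; // sum of all pending reservations
  private pending = new Map<number, number>();
  private nextId = 1;

  constructor(cap: number) {
    this.cap = cap;
  }

  // Returns a reservation id, or null if the request could exceed the cap.
  reserve(amount: number): number | null {
    if (this.committed + this.held + amount > this.cap) return null;
    const id = this.nextId++;
    this.pending.set(id, amount);
    this.held += amount;
    return id;
  }

  // Finalize a reservation, charging at most what was reserved.
  commit(id: number, actual: number): void {
    const reserved = this.pending.get(id);
    if (reserved === undefined) throw new Error(`unknown reservation ${id}`);
    this.pending.delete(id);
    this.held -= reserved;
    this.committed += Math.min(actual, reserved);
  }

  // Cancel a reservation without charging anything.
  release(id: number): void {
    const reserved = this.pending.get(id);
    if (reserved === undefined) return;
    this.pending.delete(id);
    this.held -= reserved;
  }

  used(): number {
    return this.committed;
  }
}
```

With a cap of 1,000 tokens, a second request for 400 tokens is refused while an 800-token reservation is still pending, which is the case a check-then-increment counter gets wrong under concurrency.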
Communicating price to the user
Variable bills make users anxious. Always show "this prompt will cost approximately X" before the call runs (use a local tokenizer for the estimate). Show "$X used this month, $Y remaining" prominently. The biggest predictor of churn for usage-based AI products is users feeling out of control.
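A pre-call estimate can be as simple as the sketch below. The token count here is a rough stand-in (about four characters per token for English text), used only to keep the example self-contained; in production you would swap `estimateTokens` for a real local tokenizer. The rates in the usage note are made-up example prices.

```typescript
// Rough stand-in for a real tokenizer: about 4 characters per token (assumption).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Worst-case cost of a call in (possibly fractional) cents:
// estimated input tokens plus the maximum output the call is allowed to produce.
function estimateCostCents(
  prompt: string,
  maxOutputTokens: number,
  inputCentsPerMillion: number,
  outputCentsPerMillion: number,
): number {
  const inputTokens = estimateTokens(prompt);
  return (
    (inputTokens * inputCentsPerMillion + maxOutputTokens * outputCentsPerMillion) / 1_000_000
  );
}
```

A 4,000-character prompt with a 500-token output limit, priced at $7.50 per 1M input tokens and $22.50 per 1M output tokens, estimates to 1.875 cents, which you would show as "approximately $0.02".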
Frequently asked questions
Should I expose token counts to my users?
Probably not. "Tokens" is a confusing internal unit. Convert to dollars or "messages" or "renders" before showing. Keep tokens in the backend for cost accounting.
How do I price for users who use 100x the average?
Hybrid: include a generous fixed quota in the plan, then meter overage. The 100x users self-select into a higher plan or pay overage; everyone else stays on the base plan with predictable bills.
How do I handle a price change mid-month?
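Existing counters keep their unit (tokens). Update your pricing conversion in code so that new events use the new rate while events already recorded keep the old one. Give users advance notice of the change (30 days is a common choice); the required notice period depends on your jurisdiction and your own terms of service.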
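One way to keep the conversion in code is a small rate schedule keyed by effective time, so each event is priced at the rate that was in force when it happened. This is a sketch under the assumption that you store a timestamp with every usage event; the types and function names are hypothetical.

```typescript
// A rate that applies from effectiveFrom (unix seconds) until the next rate starts.
type Rate = { effectiveFrom: number; centsPerMillion: number };
type UsageEvent = { timestamp: number; tokens: number };

// Pick the latest rate whose start time is not after the event.
function rateAt(rates: Rate[], timestamp: number): Rate {
  let current: Rate | undefined;
  for (const r of rates) {
    if (r.effectiveFrom <= timestamp && (!current || r.effectiveFrom > current.effectiveFrom)) {
      current = r;
    }
  }
  if (!current) throw new Error(`no rate in effect at ${timestamp}`);
  return current;
}

// Total price in cents, rounded up once at the end rather than per event.
function priceEvents(events: UsageEvent[], rates: Rate[]): number {
  let total = 0; // tokens * cents-per-million, divided out at the end
  for (const e of events) {
    total += e.tokens * rateAt(rates, e.timestamp).centsPerMillion;
  }
  return Math.ceil(total / 1_000_000);
}
```

Two events of 1M tokens each, one before and one after a change from $10 to $12 per 1M tokens, price to 2,200 cents: the counter stays in tokens and only the conversion changes.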
Can I use AIPricingLab without Stripe meters?
Yes - AIPricingLab tracks usage independent of Stripe. You can pull totals at month-end, generate invoices manually, or push to any other billing system (Lago, Lemon Squeezy, Paddle, custom).