Guide · 9 min read

How to implement usage-based pricing for an AI product

Decide between per-token, per-action, and hybrid pricing. Implement quotas, refunds, and Stripe meter sync. Step-by-step with code, from a developer who has shipped all three models.

Last updated: 2026-05-10

Usage-based pricing for AI is one of the trickiest parts of monetizing an AI product. Get it wrong and you either bleed margin on power users or scare off casual ones with bills they cannot predict.

This guide covers the three viable pricing models, when each fits, and how to implement them with Vevee + your billing system. We assume you are using Stripe, but the patterns generalize.

Step-by-step

1. Pick a pricing model: per-action, per-token, or hybrid

Per-action ("$X per render") is simplest to communicate; per-token ("$Y per 1M tokens") is fairest; hybrid ("$X/mo includes 100 renders, then $Y per overage") is what most mature AI products land on.

2. Define limit groups for the billable units

For per-action: unit "count". For per-token: unit "tokens" or "cents". For hybrid: stack a fixed quota group with an overage group on the same plan.

3. Reserve, call AI, commit, refund

Same atomic pattern as anywhere. The novelty for usage-based is the refund - when you reserved 4k tokens but only used 800, you must track a refund event so the user is not over-charged.

const r = await vevee.reserve(userId, "llm.tokens", 4000, { model: "gpt-4o" });
if (!r.allowed) throw new LimitError();

try {
  const res = await openai.chat.completions.create(/* ... */);
  const used = (res.usage?.prompt_tokens ?? 0) + (res.usage?.completion_tokens ?? 0);
  await vevee.commit(r.reservationId!);
  if (used < 4000) {
    await vevee.track(userId, "llm.tokens.refund", 4000 - used, {
      reservationId: r.reservationId!,
    });
  }
} catch (err) {
  await vevee.release(r.reservationId!);
  throw err;
}

4. Sync usage to Stripe at period close

Vevee tracks; Stripe invoices. At the end of each billing period (or in real time, depending on your Stripe plan setup), push the user's consumed quantity to a Stripe usage record on the relevant meter.

import Stripe from "stripe";
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);

export async function syncUsageToStripe(userId: string, stripeCustomerId: string) {
  const usage = await vevee.usage(userId);
  const tokens = usage.counters.find(c => c.label === "Billable tokens")!;

  await stripe.billing.meterEvents.create({
    event_name: "llm_tokens",
    payload: {
      stripe_customer_id: stripeCustomerId,
      value: String(tokens.count),
    },
    timestamp: Math.floor(Date.now() / 1000),
  });
}

5. Surface live spend to the user

Use a pk_live_ key in the browser to render "$3.42 used this month, $46.58 remaining on your $50 plan". Pulls from vevee.usage(userId) - safe to expose.

Margin math (the part nobody talks about)

If GPT-4o input costs $5/1M tokens to you, do NOT charge users $5/1M tokens. You need margin for: GPU latency overhead, rate-limit headroom, support cost, your gross margin target, and surprise model price changes. A 50-100% markup is normal for AI.

Pre-paid credits vs post-paid invoicing

Pre-paid: user buys $50 of credits, you decrement. Predictable revenue, no chargebacks, but blocks happy users mid-task. Post-paid: user uses, you bill. Better UX, but you carry credit risk and surprise bills are common. Most consumer AI is pre-paid; most B2B AI is post-paid; many products offer both.

Avoiding the "free tier was a mistake" trap

Free tiers for AI must have hard caps that break exactly when you intend them to. If your free tier covers 1k tokens but you allow up to 4k due to imprecise enforcement, your margin gets murdered. Vevee's atomic reserve closes this.

Communicating price to the user

Variable bills make users anxious. Always show "this prompt will cost approximately X" before the call runs (use a local tokenizer for the estimate). Show "$X used this month, $Y remaining" prominently. The biggest predictor of churn for usage-based AI products is users feeling out of control.

Frequently asked questions

Should I expose token counts to my users?

Probably not. "Tokens" is a confusing internal unit. Convert to dollars or "messages" or "renders" before showing. Keep tokens in the backend for cost accounting.

How do I price for users who use 100x the average?

Hybrid: include a generous fixed quota in the plan, then meter overage. The 100x users self-select into a higher plan or pay overage; everyone else stays on the base plan with predictable bills.

How do I handle a price change mid-month?

Existing counters keep their unit (tokens). Update your pricing conversion in code. New events use the new rate. Communicate changes to users 30 days in advance per most jurisdictions' consumer protection rules.

Can I use Vevee without Stripe meters?

Yes - Vevee tracks usage independent of Stripe. You can pull totals at month-end, generate invoices manually, or push to any other billing system (Lago, Lemon Squeezy, Paddle, custom).

Other guides

8 min · Guide