How to charge AI app users by token usage (with refunds and live balance)
Step-by-step: charge users for actual token consumption with pre-paid credits, post-paid invoicing, and a live balance display. Atomic reservations, accurate refunds, and Stripe meter sync.
Last updated: 2026-05-10
Charging by tokens means you actually need to count tokens correctly, hold the right amount of quota during a streaming call, and refund what was not used. Get any of those wrong and you either overcharge users (churn) or undercharge yourself (margin).
This is the implementation pattern that works.
Step-by-step
1. Pick a unit: tokens, cents, or both
For pure token billing, use unit "tokens". For dollar-cost billing (where different models cost different amounts), use unit "cents" and convert at track time. Both work; cents is more flexible if you support multiple models with different costs.
2. Define a limit group with the right unit
In the dashboard, create a limit group on the user's plan. Unit: tokens or cents. Quota: how much they get this period. Period: usually monthly. Anchor: subscription_start (cleaner) or calendar (simpler).
3. Reserve an upper bound before the call
You do not know the exact token count until the response arrives. Reserve a safe upper bound - for chat, the prompt token count plus max_tokens.
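import { encoding_for_model } from "tiktoken";
// Estimate prompt tokens. Encoding the JSON-serialized messages is an approximation
// of the chat format; it usually overestimates slightly, which is the safe direction here.
const enc = encoding_for_model("gpt-4o");
const promptTokens = enc.encode(JSON.stringify(messages)).length;
enc.free(); // the WASM-backed encoder must be freed explicitly
const maxOutTokens = 1500;
const upperBound = promptTokens + maxOutTokens;
// Hold the worst-case amount before calling the provider.
const r = await vevee.reserve(userId, "llm.tokens", upperBound, { model: "gpt-4o" });
if (!r.allowed) throw new LimitError();
4. Call the AI provider, then commit and refund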
After the response, you know the actual token count. Commit the reservation, then refund the difference between upper-bound and actual.
try {
  const res = await openai.chat.completions.create({
    model: "gpt-4o",
    messages,
    max_tokens: maxOutTokens,
  });
  // Actual usage as reported by the provider.
  const actual =
    (res.usage?.prompt_tokens ?? 0) +
    (res.usage?.completion_tokens ?? 0);
  // Commit the full reservation, then refund whatever was not used.
  await vevee.commit(r.reservationId!);
  if (actual < upperBound) {
    await vevee.track(userId, "llm.tokens.refund", upperBound - actual, {
      reservationId: r.reservationId!,
    });
  }
  return res;
} catch (err) {
  // On failure, release the hold so the user is not charged.
  await vevee.release(r.reservationId!);
  throw err;
}
5. Show the live balance to the user
Use a pk_live_ public key in the browser. vevee.usage(userId) returns the user's counters with remaining quota.
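// In a React component (in an async context, such as an effect or a server component)
const usage = await vevee.usage(userId);
const tokens = usage.counters.find((c) => c.label === "Tokens");
// Fall back to 0 if the counter is missing instead of crashing the render.
const remaining = tokens?.remaining ?? 0;
return <div>{remaining.toLocaleString()} tokens left this month</div>;
6. For pre-paid credits: bump the quota on purchase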
When the user buys 100k more tokens, bump their custom limit. AIPricingLab counters keep ticking.
await vevee.upsertSubscription({
  userId,
  planId: "plan_paygo",
  customLimits: {
    tokens: { quota: currentQuota + addedTokens },
  },
});
Streaming chat: same pattern
For streaming responses, reserve up front, stream the result, count tokens at the end (most providers send a final usage chunk), then commit + refund. The reservation holds quota for the entire stream so a parallel request cannot race past.
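The "count tokens at the end" step can be sketched as a small helper that drains the stream and remembers the last usage payload it saw. This is a minimal sketch, not the SDK's own code: the StreamChunk shape is modeled on OpenAI-style streaming chunks, and consumeStream and fakeStream are hypothetical names introduced here. With OpenAI's chat completions API, the final usage chunk is only sent when the request sets stream_options: { include_usage: true }.

```typescript
// Minimal chunk shape, modeled on OpenAI-style streaming chunks (assumption:
// text arrives in `delta.content` and the final chunk carries `usage`).
interface StreamChunk {
  choices: { delta: { content?: string | null } }[];
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

interface StreamResult {
  text: string;
  actualTokens: number | null; // null when the provider never sent usage
}

// Drain the stream, forwarding text as it arrives and keeping the last
// usage payload seen.
async function consumeStream(
  stream: AsyncIterable<StreamChunk>,
  onText: (piece: string) => void = () => {},
): Promise<StreamResult> {
  let text = "";
  let actualTokens: number | null = null;
  for await (const chunk of stream) {
    const piece = chunk.choices[0]?.delta?.content ?? "";
    if (piece) {
      text += piece;
      onText(piece);
    }
    if (chunk.usage) {
      actualTokens = chunk.usage.prompt_tokens + chunk.usage.completion_tokens;
    }
  }
  return { text, actualTokens };
}

// Hypothetical stand-in for a provider stream, used only to exercise the helper.
async function* fakeStream(): AsyncGenerator<StreamChunk> {
  yield { choices: [{ delta: { content: "Hel" } }] };
  yield { choices: [{ delta: { content: "lo" } }] };
  yield { choices: [], usage: { prompt_tokens: 12, completion_tokens: 2 } };
}
```

Once the stream ends, actualTokens feeds the same commit-and-refund step used for non-streaming calls. If it comes back null, one conservative option is to commit without a refund, or to re-count the output with a tokenizer.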
Multiple model tiers
GPT-4o-mini at $0.15/M input is much cheaper than GPT-4o at $2.50/M. If you charge users $/token, you should adjust the cost-per-token by model. Easiest pattern: track in cents (not tokens) and compute cents-per-call at track time using a model price table.
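A model price table can be as small as a lookup plus one multiplication. The sketch below is an assumption-laden example: the input prices come from the paragraph above, the output prices are illustrative list prices that should be checked against your provider's current price sheet, and PRICES and costInCents are hypothetical names.

```typescript
// Illustrative prices in USD per million tokens. Treat every number here as
// an assumption and verify it against the provider's current price sheet.
interface ModelPrice {
  inputPerM: number;
  outputPerM: number;
}

const PRICES: Record<string, ModelPrice> = {
  "gpt-4o": { inputPerM: 2.5, outputPerM: 10 },
  "gpt-4o-mini": { inputPerM: 0.15, outputPerM: 0.6 },
};

// Cost of one call in (fractional) cents.
// USD per million tokens -> cents per token is a division by 10,000.
function costInCents(
  model: string,
  promptTokens: number,
  completionTokens: number,
): number {
  const price = PRICES[model];
  if (!price) throw new Error(`No price configured for model: ${model}`);
  const weighted =
    promptTokens * price.inputPerM + completionTokens * price.outputPerM;
  return weighted / 10_000;
}
```

A typical call costs less than one cent (1,000 prompt tokens plus 500 completion tokens on gpt-4o comes to about 0.75 cents with these prices), so if your counters only accept integers, consider tracking a smaller unit such as hundredths of a cent rather than rounding each call.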
Stripe meter sync
For post-paid billing, push the user's monthly token total to a Stripe meter at period close. AIPricingLab tracks; Stripe bills. The /guides/usage-based-pricing-ai guide covers this in detail.
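As a rough sketch of the period-close push, the helper below builds the request body for one Stripe meter event. The field names (event_name, payload, identifier, timestamp) follow Stripe's billing meter events API; the event name "llm_tokens", the period key format, and the function name buildMeterEvent are assumptions, and the payload keys must match how your meter is configured.

```typescript
// Request body for one Stripe billing meter event.
interface MeterEventParams {
  event_name: string;
  payload: { stripe_customer_id: string; value: string };
  identifier: string; // lets Stripe deduplicate retried events
  timestamp: number; // Unix seconds
}

function buildMeterEvent(
  stripeCustomerId: string,
  periodKey: string, // hypothetical format, e.g. "2026-05"
  totalTokens: number,
  periodEnd: Date,
): MeterEventParams {
  if (!Number.isInteger(totalTokens) || totalTokens < 0) {
    throw new Error("totalTokens must be a non-negative integer");
  }
  return {
    event_name: "llm_tokens", // hypothetical; use your meter's event name
    payload: {
      stripe_customer_id: stripeCustomerId,
      value: String(totalTokens),
    },
    // One event per customer per period, so a retried job reuses the same id.
    identifier: `${stripeCustomerId}:${periodKey}`,
    timestamp: Math.floor(periodEnd.getTime() / 1000),
  };
}
```

With the official Node library, the resulting object would be passed to stripe.billing.meterEvents.create(...). Deriving the identifier from the customer and period means re-running the close job should not double-bill.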
Refund accuracy matters
Users notice when their balance does not match what they actually consumed. Always issue refund events for unused reservations - the absolute worst feedback is "I sent one short prompt and you charged me 4000 tokens." Auto-refunding the unused portion makes balances match user intuition.
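To see why the refund matters for the displayed balance, here is a toy in-memory ledger. It is a sketch for illustration only, not how the SDK is implemented: QuotaLedger is a hypothetical class, and its settle method collapses commit and refund into one step.

```typescript
// Toy single-process ledger: remaining = quota - used - held.
class QuotaLedger {
  private quota: number;
  private used = 0;
  private heldTotal = 0;
  private held = new Map<string, number>();
  private nextId = 1;

  constructor(quota: number) {
    this.quota = quota;
  }

  get remaining(): number {
    return this.quota - this.used - this.heldTotal;
  }

  // Hold `amount` if it fits; return a reservation id, or null if it does not.
  reserve(amount: number): string | null {
    if (amount > this.remaining) return null;
    const id = `res_${this.nextId++}`;
    this.held.set(id, amount);
    this.heldTotal += amount;
    return id;
  }

  // Charge only what was actually used and drop the rest of the hold.
  settle(id: string, actual: number): void {
    const amount = this.held.get(id);
    if (amount === undefined) throw new Error(`Unknown reservation: ${id}`);
    this.held.delete(id);
    this.heldTotal -= amount;
    this.used += actual;
  }
}

const ledger = new QuotaLedger(10_000);
const id = ledger.reserve(4_000); // short prompt, generous upper bound
const blocked = ledger.reserve(7_000); // null: only 6,000 is free while the hold is open
if (id !== null) ledger.settle(id, 120); // the call actually used 120 tokens
console.log(ledger.remaining); // 9880
```

Without the settle step the user would see 6,000 tokens left after a 120-token request, which is exactly the complaint quoted above.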
Frequently asked questions
Should I count input + output tokens or just output?
Count both. Input tokens are cheaper but they are still your cost. Most providers (OpenAI, Anthropic) bill you for both, so you should bill users for both.
How do I handle different prompt-token vs completion-token prices?
Track in cents (not tokens) and compute cost at track time. Or stack two limit groups - "input tokens" and "output tokens" - each with its own conversion. Either works; cents is simpler.
What if my token estimate (upper bound) is wrong?
If actual < upper bound: refund the difference. If actual > upper bound: this should not happen when max_tokens is set and the prompt estimate is conservative, but if it does, track an additional event for the overage.
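Both directions can be handled by one small pure function that decides what to do after the call. The names Settlement and reconcile are hypothetical; this is only a sketch of the decision, and the resulting amount would be passed to whatever refund or overage event your metering uses.

```typescript
// Outcome of comparing the reserved upper bound with actual usage.
type Settlement =
  | { kind: "exact" }
  | { kind: "refund"; amount: number }
  | { kind: "overage"; amount: number };

function reconcile(upperBound: number, actual: number): Settlement {
  if (actual < upperBound) {
    // Normal case: give back the unused part of the reservation.
    return { kind: "refund", amount: upperBound - actual };
  }
  if (actual > upperBound) {
    // Rare case: the estimate was too low, so charge the extra.
    return { kind: "overage", amount: actual - upperBound };
  }
  return { kind: "exact" };
}

console.log(reconcile(4000, 120)); // { kind: 'refund', amount: 3880 }
console.log(reconcile(1500, 1600)); // { kind: 'overage', amount: 100 }
```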
How do I let users buy credit packs?
Create a one-time Stripe PaymentIntent for the pack (for example, $10 for 200k tokens). When the payment succeeds, bump the user's custom limit by 200k. Done.
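One detail worth sketching is idempotency: Stripe can deliver the same webhook event more than once, so the quota bump should be applied at most once per payment. The example below uses an in-memory account for illustration; PACKS, Account, and applyCreditPack are hypothetical names, and a real system would persist the applied payment ids.

```typescript
interface CreditPack {
  priceCents: number;
  tokens: number;
}

// Hypothetical pack catalog: $10 buys 200k tokens.
const PACKS: Record<string, CreditPack> = {
  pack_200k: { priceCents: 1000, tokens: 200_000 },
};

interface Account {
  quota: number;
  appliedPayments: Set<string>; // payment ids already credited
}

// Credit a pack exactly once per payment id and return the new quota.
function applyCreditPack(
  account: Account,
  paymentId: string,
  packId: string,
): number {
  const pack = PACKS[packId];
  if (!pack) throw new Error(`Unknown pack: ${packId}`);
  // A redelivered webhook for the same payment must not add tokens again.
  if (account.appliedPayments.has(paymentId)) return account.quota;
  account.appliedPayments.add(paymentId);
  account.quota += pack.tokens;
  return account.quota;
}

const account: Account = { quota: 50_000, appliedPayments: new Set<string>() };
applyCreditPack(account, "pi_1", "pack_200k"); // quota becomes 250,000
applyCreditPack(account, "pi_1", "pack_200k"); // duplicate delivery, no change
```

The returned quota is the value you would then send as the new custom limit.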