Stripe metered billing for AI apps: what it solves and what it does not
You are sketching the pricing page for your AI app and every plan ends in "per generation" - because your costs arrive per generation. Stripe meters the charging half beautifully. The question is what happens at request time.
Last updated: 2026-06-09
Usage-based pricing is gravity for AI apps
You are sketching the pricing page for your AI image app, and every plan you write down ends the same way: per generation, per thousand tokens, per video second. That is not a trend you are following - it is your cost structure leaking into your pricing, because every request your users make arrives with a provider invoice attached. Usage-based pricing is the natural model for AI apps, and Stripe has first-class support for it. Billing Meters work exactly the way you would hope: you define a meter, you send a meter event every time a user consumes something, Stripe aggregates the events per billing period, and the invoice computes itself. The integration is one API call after each AI request.
// After the AI call succeeds, report usage to Stripe
await stripe.billing.meterEvents.create({
  event_name: 'image_generations',
  payload: {
    stripe_customer_id: customer.id,
    value: '4', // four images in this batch
  },
});

What Stripe metered billing genuinely solves
Be clear-eyed about how much this buys you, because it is a lot. Invoice math at period boundaries - which events fall into which invoice, across time zones and clock skew - is handled. Proration when a customer upgrades mid-cycle is handled. Tax across jurisdictions is handled. Failed payments, retries, and dunning emails are handled. Tiered and graduated per-unit pricing - first thousand generations at one price, everything after at another - is a configuration option, not a project. And every charge comes with an auditable record of the usage that produced it, which is what you will reach for when a customer disputes a bill. These are brutally hard problems, they are not your product, and Stripe solves them better than anything you would build in-house. If your model is purely "pay for what you used, billed monthly in arrears," you send meter events and you may genuinely be done.
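To make "first thousand generations at one price, everything after at another" concrete, here is a minimal sketch of the graduated arithmetic. Stripe computes this for you from the price configuration; the Tier shape, the function name, and the cent amounts below are invented for illustration and are not Stripe's API.

```typescript
// Hypothetical graduated price list. Amounts are in cents and made up for this example.
interface Tier {
  upTo: number | null; // inclusive upper bound of the tier; null means "no upper bound"
  unitAmountCents: number;
}

const tiers: Tier[] = [
  { upTo: 1000, unitAmountCents: 4 }, // first 1,000 generations
  { upTo: null, unitAmountCents: 2 }, // everything after
];

// Graduated pricing: each unit is charged at the rate of the tier it falls into.
function graduatedTotalCents(units: number, priceTiers: Tier[]): number {
  let total = 0;
  let previousBound = 0;
  for (const tier of priceTiers) {
    if (units <= previousBound) break; // no units left for this tier
    const tierEnd = tier.upTo === null ? units : Math.min(units, tier.upTo);
    total += (tierEnd - previousBound) * tier.unitAmountCents;
    previousBound = tierEnd;
  }
  return total;
}

console.log(graduatedTotalCents(1407, tiers)); // 1000 * 4 + 407 * 2 = 4814
```

The point of the sketch is how little of it you have to own: change the tier table and the invoice changes, with no code path of yours involved.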
Gap #1: metering is not enforcement
Stripe aggregates usage in order to charge for it. It has no answer to a different question: should THIS request be allowed to run right now? There is no canUse() in the Stripe SDK, no atomic check-and-increment, no way to make request five hundred and one fail because the plan says five hundred. For pure arrears billing that is by design - if every unit is billable, every request is allowed, and the only job is counting accurately. But most AI apps do not sell pure arrears. They sell capped plans: "100 generations a month" on Pro, 10 a day on Free, because consumers will not sign up for an open-ended bill and you will not survive an open-ended provider invoice. A cap needs a real-time, race-safe check at request time, before the provider spend happens - and once you build the counters table, the period rollover, and the per-user check that a cap requires, Stripe's meter has become a downstream copy of a system you own. Stripe's acquisition of Metronome in 2026 shows where they see usage-based billing going - enterprise-grade aggregation - which is still the charging half, not the request-time enforcement half.
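Why "race-safe" matters is easiest to see side by side. The sketch below is a toy in-memory model, not production code: it assumes a user sitting at nine of ten who fires six requests at once, and it compares a read-then-write check against a check-and-increment done in one step. The class names and the 5 ms delay are assumptions; in a real system the atomic step would typically be a single conditional update or a transaction in your datastore.

```typescript
const LIMIT = 10;
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

class NaiveCounter {
  used = 9;
  // Read, wait, then write: every concurrent caller decides on the same stale value.
  async tryConsume(): Promise<boolean> {
    const seen = this.used; // read
    await sleep(5); // stand-in for the round trip between the read and the write
    if (seen >= LIMIT) return false;
    this.used = seen + 1; // write based on the stale read
    return true;
  }
}

class AtomicCounter {
  used = 9;
  // Check and increment in one synchronous step, so no other caller can interleave.
  tryConsume(): boolean {
    if (this.used >= LIMIT) return false;
    this.used += 1;
    return true;
  }
}

async function sixConcurrentRequests(): Promise<{ naiveAllowed: number; atomicAllowed: number }> {
  const naive = new NaiveCounter();
  const atomic = new AtomicCounter();
  const naiveResults = await Promise.all(Array.from({ length: 6 }, () => naive.tryConsume()));
  const atomicResults = Array.from({ length: 6 }, () => atomic.tryConsume());
  return {
    naiveAllowed: naiveResults.filter(Boolean).length,
    atomicAllowed: atomicResults.filter(Boolean).length,
  };
}

sixConcurrentRequests().then((r) => console.log(r)); // { naiveAllowed: 6, atomicAllowed: 1 }
```

The naive version lets all six through because each request checks the limit before any of them has written. The atomic version admits exactly one, which is what the plan promised.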
Gap #2: Stripe knows customers, your limits know users
Look at the meter event payload again: it keys on stripe_customer_id. That is the correct key for billing, because the Customer is who gets charged. It is the wrong key for limits, because the entity you cap is your end user - identified by your user id, which Stripe has never heard of. The mismatch shows up immediately at the edges. Free-tier users have no card on file and usually no Customer object at all, so there is nothing to attach a meter event to. Anonymous trial users - the ones you most need to cap - have no Stripe identity by definition. And in B2B apps the Customer is the workspace, so "each seat gets 50 generations a month" has no natural expression in Stripe at all: the meter sees one customer, your plan sees twelve people. You can force it - create Customers eagerly, stuff your user id into metadata, aggregate per user on your own side - but you are bending a billing identity model into an entitlement identity model, and every bend is code you maintain.
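One way to picture the two identity models side by side is a sketch like the following. Every identifier in it (the maps, the ids, consumeForSeat, billableUnits) is hypothetical and stands in for whatever storage you actually use; the only thing taken from the text is the per-seat limit of 50.

```typescript
// Every identifier below is hypothetical; none of this is a Stripe or vendor API.
const SEAT_LIMIT = 50; // "each seat gets 50 generations a month"

// Billing identity: the workspace is the Stripe Customer, mapped in one place.
const customerByWorkspace = new Map<string, string>([['ws_acme', 'cus_example']]);

// Entitlement identity: your own user ids. Free and anonymous users have no workspace.
const workspaceByUser = new Map<string, string>([
  ['user_ana', 'ws_acme'],
  ['user_ben', 'ws_acme'],
]);

// Counters are keyed by user id and period, never by Stripe Customer id.
const usedBySeat = new Map<string, number>();

function consumeForSeat(userId: string, period: string, units: number): boolean {
  const key = `${userId}:${period}`;
  const used = usedBySeat.get(key) ?? 0;
  if (used + units > SEAT_LIMIT) return false; // this seat is out, whatever the others did
  usedBySeat.set(key, used + units);
  return true;
}

// What billing needs: one total per Stripe Customer, summed across that workspace's seats.
function billableUnits(customerId: string, period: string): number {
  let total = 0;
  workspaceByUser.forEach((workspaceId, userId) => {
    if (customerByWorkspace.get(workspaceId) !== customerId) return;
    total += usedBySeat.get(`${userId}:${period}`) ?? 0;
  });
  return total;
}

console.log(consumeForSeat('user_ana', '2026-06', 50)); // true: exactly at the seat limit
console.log(consumeForSeat('user_ana', '2026-06', 1)); // false: this seat is exhausted
console.log(consumeForSeat('user_ben', '2026-06', 1)); // true: a different seat, same Customer
console.log(consumeForSeat('anon_7f3', '2026-06', 1)); // true: capped without any Stripe identity
console.log(billableUnits('cus_example', '2026-06')); // 51
```

The limit check never looks at a Customer id, and the anonymous user is capped like anyone else. The Customer id only matters when the per-user counts are rolled up for billing.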
Gap #3: invoices aggregate, dashboards and free tiers do not
Meter events are aggregated for invoicing, and the aggregation pipeline is built for that job: it is allowed to lag, because the invoice is not computed until the period closes. That makes it the wrong source for anything live. "How many premium images did user X generate in the last hour, broken down by model?" is a query against an events table with metadata on every row - not against an invoice line that says image_generations: 1,407. The usage bar in your app, the admin view that catches the runaway user before the bill arrives, the per-model margin analysis - all of them want granular, queryable, near-real-time events, and the meter gives you period totals on billing's schedule. The free tier makes the point sharply: when nothing is charged, Stripe has nothing to meter - a free plan produces no invoice, so it produces no usage record. But the free tier is exactly where limits matter most, because it is the one plan where every marginal request is pure cost.
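The "last hour, broken down by model" question is the kind of query an events table answers directly. Here is a small sketch over an in-memory array standing in for that table; the UsageEvent shape, the model names, and the sample rows are all assumptions made up for the example.

```typescript
interface UsageEvent {
  userId: string;
  event: string;
  model: string; // hypothetical metadata recorded on every row
  units: number;
  at: number; // epoch milliseconds
}

// Sum units per model for one user and one event type since a given timestamp.
function unitsByModel(
  events: UsageEvent[],
  userId: string,
  event: string,
  sinceMs: number,
): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const e of events) {
    if (e.userId !== userId || e.event !== event || e.at < sinceMs) continue;
    totals[e.model] = (totals[e.model] ?? 0) + e.units;
  }
  return totals;
}

const now = Date.parse('2026-06-09T12:00:00Z');
const minutes = (n: number) => n * 60_000;

const events: UsageEvent[] = [
  { userId: 'user_x', event: 'image_generation', model: 'premium-v2', units: 4, at: now - minutes(10) },
  { userId: 'user_x', event: 'image_generation', model: 'standard-v1', units: 2, at: now - minutes(25) },
  { userId: 'user_x', event: 'image_generation', model: 'premium-v2', units: 1, at: now - minutes(50) },
  { userId: 'user_x', event: 'image_generation', model: 'premium-v2', units: 8, at: now - minutes(90) }, // outside the window
  { userId: 'user_y', event: 'image_generation', model: 'premium-v2', units: 3, at: now - minutes(5) }, // different user
];

console.log(unitsByModel(events, 'user_x', 'image_generation', now - minutes(60)));
// { 'premium-v2': 5, 'standard-v1': 2 }
```

None of the dimensions used here (user, model, a one-hour window) survive in a period total, which is why the usage bar and the admin view need the rows rather than the invoice line.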
The architecture that actually works
The clean answer is two layers with different jobs, not one layer stretched over both. At request time, a metering and limits layer owns the counters, the plan definitions, the race-safe reserve, and the per-user analytics - it answers "is this allowed?" before any provider spend and records what actually happened. At billing time, Stripe owns the money: either it charges whatever the metering layer says was used, or it bills flat plans whose caps the metering layer enforces. The seam between them is thin and runs in both directions. On the request path: reserve quota, run the AI call, commit on success or release on failure. On the billing path: Stripe fires a webhook when a subscription renews or changes, and your handler tells the metering layer so quotas reset on the real billing date. Neither layer pretends to do the other's job, and neither contains a half-built copy of the other. Notice what this does to the identity problem from earlier: the metering layer keys on your user id everywhere, and the only place a Stripe Customer id appears is inside the webhook handler, mapped once. Free users live entirely in the metering layer and never touch Stripe; paying users exist in both, joined by that single lookup.
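To pin down what reserve, commit, and release mean before the integration snippet, here is a toy in-memory meter. It is a sketch under stated assumptions, not Vevee's SDK or anyone else's: the ToyMeter class, its method signatures, and resetCycle are all hypothetical, and a real implementation would keep this state in a datastore with atomic updates.

```typescript
type Reservation = { userId: string; units: number };

class ToyMeter {
  private readonly limit: number;
  private committed = new Map<string, number>(); // usage that actually happened
  private held = new Map<string, number>(); // units reserved but not yet settled
  private reservations = new Map<string, Reservation>();
  private nextId = 1;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Holds count against the limit immediately, so concurrent requests cannot overshoot.
  reserve(userId: string, units: number): { allowed: boolean; reservationId?: string } {
    const inUse = (this.committed.get(userId) ?? 0) + (this.held.get(userId) ?? 0);
    if (inUse + units > this.limit) return { allowed: false };
    const reservationId = `res_${this.nextId++}`;
    this.held.set(userId, (this.held.get(userId) ?? 0) + units);
    this.reservations.set(reservationId, { userId, units });
    return { allowed: true, reservationId };
  }

  commit(reservationId: string): void {
    this.settle(reservationId, true); // the AI call succeeded: the hold becomes usage
  }

  release(reservationId: string): void {
    this.settle(reservationId, false); // the AI call failed: the hold is handed back
  }

  // Called from the billing webhook when the subscription period rolls over.
  resetCycle(userId: string): void {
    this.committed.set(userId, 0);
  }

  used(userId: string): number {
    return this.committed.get(userId) ?? 0;
  }

  private settle(reservationId: string, keep: boolean): void {
    const r = this.reservations.get(reservationId);
    if (!r) return; // unknown or already settled: a no-op, so retries are harmless
    this.reservations.delete(reservationId);
    this.held.set(r.userId, (this.held.get(r.userId) ?? 0) - r.units);
    if (keep) this.committed.set(r.userId, (this.committed.get(r.userId) ?? 0) + r.units);
  }
}
```

The design choice worth noticing is that a reservation counts against the limit the moment it is granted. A failed call costs the user nothing once released, and a burst of parallel requests still cannot spend more than the plan allows.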
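// Request time - the metering layer answers, race-safe
const r = await meter.reserve(userId, 'image_generation', 1);
if (!r.allowed) return jsonError(429, 'limit_reached');
let image;
try {
  image = await generateImage(prompt);
} catch (err) {
  await meter.release(r.reservationId); // failure: hand the unit back (release() assumed to mirror commit())
  throw err;
}
await meter.commit(r.reservationId); // success: the reservation becomes usage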
// Billing time - Stripe webhook keeps the quota cycle honest
if (event.type === 'customer.subscription.updated') {
  const sub = event.data.object;
  await meter.upsertSubscription({
    userId: userIdFromCustomer(sub.customer),
    planId: planIdFromPrice(sub.items.data[0].price.id),
    // Newer Stripe API versions expose the period on the subscription item
    // (sub.items.data[0].current_period_start) instead; check your pinned version.
    cycleStart: new Date(sub.current_period_start * 1000).toISOString(),
  });
}

When Stripe alone is enough
None of this means you always need the second layer. There is a real class of product where Stripe Billing Meters cover the whole problem, and adding anything else is over-engineering - typically a B2B API or infrastructure product where every customer has a contract, a card, and a tolerance for variable invoices. The test is not what kind of company you are; it is what shape your pricing takes. Run the checklist honestly:
- Your pricing is pure arrears - users pay for exactly what they used, billed at period end, with no hard caps to enforce mid-cycle
- You have no free tier, so there is no unbilled usage to limit
- Every user of your product maps one-to-one to a Stripe Customer with a payment method on file
- Your buyers are B2B and tolerate the occasional surprising invoice instead of a mid-month cutoff
- You do not need to show users a live "X remaining this period" number inside the product
Two layers, two tools
The honest summary: Stripe is the right tool for the charging half, and most AI apps also need an enforcement half that Stripe was never designed to provide - per-end-user caps, free-tier limits, race-safe checks at request time, granular usage you can query and show. Vevee is that second layer: per-end-user counters, plans with limit groups, and an atomic reserve/commit/release flow, built to sit next to Stripe (or Polar, or any biller) rather than replace it - the webhook pattern above is the whole integration, and the free tier is enough to wire it up before your next pricing meeting.