Freemium pricing for an AI image generator

A real-world recipe: a lifetime free tier capped at 2 images at a single resolution, plus a monthly paid plan with 40 generations split across two models (20 + 20). All of it expressed with limit groups, periods, and the reserve / commit pattern - no extra infrastructure.

The pricing we’re modelling

Plan	Reset	What the user gets
Free	Never (lifetime)	2 images, Model A, 1k resolution
Pro	Monthly, anchored to the purchase date	20 images on Model A (1k) + 20 images on Model B (2k)

Mental model: plans are metering rules, not feature flags

Before any code, internalise this: a limit group says “count these events and stop them at this quota”. It does not say “these are the only models the user can call”. Anything that doesn’t match a group sails through unmetered. We’ll come back to this when we wire up the free plan.
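To make that concrete, here is a minimal sketch of how a match rule might be evaluated. It illustrates the idea and is not Vevee's implementation: the MatchRule and LimitGroup types and the matchingGroups helper are hypothetical, modelled on the plan JSON used in this recipe, and the rule that every metadata key listed in a match must equal the event's value is an assumption.

```typescript
// Hypothetical types mirroring the plan JSON in this recipe (not the SDK's own types).
type MatchRule = { event: string; metadata?: Record<string, string> };
type LimitGroup = { id: string; quota: number; matches: MatchRule[] };

// Assumption: a rule matches when the event name is equal and every
// metadata key listed in the rule has the same value on the event.
function ruleMatches(rule: MatchRule, event: string, metadata: Record<string, string>): boolean {
  if (rule.event !== event) return false;
  return Object.entries(rule.metadata ?? {}).every(([key, value]) => metadata[key] === value);
}

// Returns the groups that would meter this event. An empty result means
// nothing counts the event, so nothing can stop it.
function matchingGroups(
  groups: LimitGroup[],
  event: string,
  metadata: Record<string, string> = {},
): LimitGroup[] {
  return groups.filter((g) => g.matches.some((r) => ruleMatches(r, event, metadata)));
}

// The free plan's single group, as defined in section 1.
const freeGroups: LimitGroup[] = [
  { id: 'lg_free_1k', quota: 2, matches: [{ event: 'image.model_a', metadata: { variant: '1k' } }] },
];

const meteredIds = matchingGroups(freeGroups, 'image.model_a', { variant: '1k' }).map((g) => g.id);
const unmeteredIds = matchingGroups(freeGroups, 'image.model_b', { variant: '2k' }).map((g) => g.id);
console.log(meteredIds, unmeteredIds);
```

The second lookup comes back empty: no group claims the Model B event, so under this model nothing would deny it.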

1. Define the free plan (lifetime, single bucket)

One limit group, quota 2, period lifetime. The match rule pins it to Model A at 1k via the reserved variant metadata key.

{
  "periodType": "lifetime",
  "periodAnchor": "subscription_start",
  "groups": [
    {
      "id": "lg_free_1k",
      "label": "Free 1k images",
      "unit": "count",
      "quota": 2,
      "matches": [
        { "event": "image.model_a", "metadata": { "variant": "1k" } }
      ]
    }
  ]
}

With lifetime, the period has end = null - the counter never resets. Two images, forever. See core concepts → periods for how this is computed.

Closing the “unmetered model” loophole

If your app ever calls image.model_b for a free user - accidentally, or because someone hits your API directly - the free plan above has no group matching it. The SDK’s canUse() returns allowed: true with no matched groups, and your AI provider gets called for free.

Two layers of defence, in order of importance:

App-side: don’t expose Model B in the UI for free users, and reject Model B requests on the server before you ever call the SDK. This is where capability gating belongs.
Metering-side (defence in depth): add a quota: 0 group on the free plan that matches Model B. Now even if a request slips through, the SDK denies it.

{
  "id": "lg_free_deny_b",
  "label": "Model B not available on free",
  "unit": "count",
  "quota": 0,
  "matches": [{ "event": "image.model_b" }]
}
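The app-side layer can be as small as a lookup table checked on the server before any metering call. The sketch below assumes your app already knows which plan a user is on; PlanId, ALLOWED_MODELS, and isModelAllowed are hypothetical names that live in your code, not in the SDK.

```typescript
type PlanId = 'plan_free' | 'plan_pro';
type ModelId = 'model_a' | 'model_b';

// Capability table owned by your app: which models each plan may call.
const ALLOWED_MODELS: Record<PlanId, readonly ModelId[]> = {
  plan_free: ['model_a'],
  plan_pro: ['model_a', 'model_b'],
};

// Check this before reserve(), so a disallowed request never reaches
// the metering layer or the AI provider.
function isModelAllowed(plan: PlanId, model: ModelId): boolean {
  return ALLOWED_MODELS[plan].includes(model);
}

console.log(isModelAllowed('plan_free', 'model_b')); // false
console.log(isModelAllowed('plan_pro', 'model_b'));  // true
```

Keeping the table in app code means the quota: 0 group only ever fires when this check has been bypassed, which is a useful signal in its own right.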

2. Define the paid plan (monthly, two buckets)

Two independent limit groups in the same plan. They don’t share a pool - they happen to add up to 40 because each is 20. Anchor the period to subscription_start so the cycle resets on the user’s purchase day, not on the calendar 1st.

{
  "periodType": "monthly",
  "periodAnchor": "subscription_start",
  "groups": [
    {
      "id": "lg_pro_model_a",
      "label": "Model A (1k) - 20/mo",
      "unit": "count",
      "quota": 20,
      "matches": [
        { "event": "image.model_a", "metadata": { "variant": "1k" } }
      ]
    },
    {
      "id": "lg_pro_model_b",
      "label": "Model B (2k) - 20/mo",
      "unit": "count",
      "quota": 20,
      "matches": [
        { "event": "image.model_b", "metadata": { "variant": "2k" } }
      ]
    }
  ]
}

What about “first 20 of A, then 20 of B”?

If you read the spec as “the user must finish all 20 of Model A before any Model B is allowed,” the metering layer can’t express that - match rules don’t depend on the state of other counters. You’d enforce ordering in your app code: check Model A’s remaining count via usage(), and let requests fall through to Model B only when it’s zero.

In practice almost everyone wants the simpler reading - two independent buckets, mix freely - so we recommend going with that and only adding ordering if a real product reason demands it.
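If you do need strict ordering, the check can live in a small pure function over the counters returned by usage(). This sketch assumes the counter fields used elsewhere in this recipe (groupId, remaining); the Counter type and modelBUnlocked helper are hypothetical.

```typescript
// Assumed shape of one entry in usage().counters, based on the fields used in this recipe.
type Counter = { groupId: string; remaining: number };

// Strict ordering: Model B unlocks only once the Model A bucket is empty.
// If the Model A group is missing from the plan, keep B locked rather than guess.
function modelBUnlocked(counters: Counter[], modelAGroupId = 'lg_pro_model_a'): boolean {
  const a = counters.find((c) => c.groupId === modelAGroupId);
  return a !== undefined && a.remaining <= 0;
}

const midCycle: Counter[] = [
  { groupId: 'lg_pro_model_a', remaining: 5 },
  { groupId: 'lg_pro_model_b', remaining: 20 },
];
const modelAExhausted: Counter[] = [
  { groupId: 'lg_pro_model_a', remaining: 0 },
  { groupId: 'lg_pro_model_b', remaining: 20 },
];

console.log(modelBUnlocked(midCycle));        // false
console.log(modelBUnlocked(modelAExhausted)); // true
```

This only decides whether your app is willing to route to Model B; the quota itself is still enforced by the Model B group when you reserve against it.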

3. Wire up the SDK

Install & create a singleton

// lib/vevee.ts
import { createClient } from '@vevee/sdk';

export const vevee = createClient({
  apiKey: process.env.VEVEE_SECRET_KEY!,
});

Assign the free plan on signup

// after creating the new end-user in your app
await vevee.upsertSubscription({
  userId: newUser.id,
  planId: 'plan_free',
});

Upgrade the user when they pay

// in your Stripe webhook (or wherever you confirm payment)
await vevee.upsertSubscription({
  userId: customer.id,
  planId: 'plan_pro',
  startedAt: new Date().toISOString(), // anchors the monthly cycle from now
});

Generate an image - race-safe

Use reserve() → commit() / release() so two concurrent requests can’t both squeak past a quota of 1.

import { vevee } from '@/lib/vevee';
import { VeveeError } from '@vevee/sdk';

type ModelId = 'model_a' | 'model_b';

export async function generateImage(userId: string, model: ModelId, prompt: string) {
  const event = model === 'model_a' ? 'image.model_a' : 'image.model_b';
  const variant = model === 'model_a' ? '1k' : '2k';

  const r = await vevee.reserve(userId, event, 1, { variant });
  if (!r.allowed) {
    throw new VeveeError({
      code: 'limit_reached',
      message: r.reasons.join(', '),
      status: 429,
    });
  }

  try {
    const image = await callTheActualProvider(model, prompt);
    await vevee.commit(r.reservationId!);
    return image;
  } catch (err) {
    await vevee.release(r.reservationId!).catch(() => {});
    throw err;
  }
}

Show remaining quota in the UI

'use client';
import useSWR from 'swr';

export function QuotaBadge({ userId }: { userId: string }) {
  const { data } = useSWR(`/api/quota?userId=${encodeURIComponent(userId)}`, (u) =>
    fetch(u).then((r) => r.json()),
  );
  if (!data) return null;

  // server-side handler calls vevee.usage(userId) and forwards the JSON
  return (
    <ul className="quota">
      {data.counters.map((c: any) => (
        <li key={c.groupId}>
          {c.label}: {c.count}/{c.quota} ({c.remaining} left)
          {data.period?.end && (
            <> · resets {new Date(data.period.end).toLocaleDateString()}</>
          )}
        </li>
      ))}
    </ul>
  );
}

4. Behaviour cheat-sheet

Scenario	Free user	Pro user
Calls Model A (1k), 1st time	Allowed · counter 1/2	Allowed · counter 1/20
Calls Model A (1k), 3rd time	Denied · `limit_reached`	Allowed · counter 3/20
Calls Model B (2k)	Allowed if no deny group · denied if you added one	Allowed · counter 1/20 in the Model B bucket
21 days after starting	Free counter unchanged (lifetime)	Counters unchanged - period rolls over only at 30 days from `startedAt`

5. Common variations

Add a third tier (e.g. Studio)

Same shape, larger quotas. If Studio gets unlimited Model A but capped Model B, just leave out the Model A group entirely on that plan - events that don’t match anything go through unmetered.

Charge in cents instead of count

Switch the group’s unit from count to cents and pass the marginal cost as quantity on each track() call. Useful if your underlying provider charges per-image and you want to enforce a dollar cap.
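A sketch of the cents variant: keep a per-model price table in your app and pass the price as the quantity. The prices below are invented for illustration, and COST_CENTS and quantityFor are hypothetical helpers. If you meter with reserve() as in section 3, the assumption is that its third argument (the 1 in the earlier example) is that same quantity.

```typescript
type ModelId = 'model_a' | 'model_b';

// Illustrative prices only: what one image costs you at the provider, in cents.
const COST_CENTS: Record<ModelId, number> = { model_a: 2, model_b: 8 };

// Quantity to meter for a request that generates `images` images.
// Rounds up so fractional cents never under-count against the cap.
function quantityFor(model: ModelId, images = 1): number {
  return Math.ceil(COST_CENTS[model] * images);
}

// With a group of unit "cents" and quota 500 (a $5.00 cap), this is how many
// single Model B images fit before the cap is reached.
const capCents = 500;
const modelBImagesUnderCap = Math.floor(capCents / quantityFor('model_b'));

console.log(quantityFor('model_a'), quantityFor('model_b', 3), modelBImagesUnderCap);
```

A single cents group can then match both events, so cheap and expensive models draw down the same budget at different rates.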

Single shared pool of 40, mix freely

Replace the two paid groups with one group whose match rule covers both events:

{
  "id": "lg_pro_any",
  "label": "Any image - 40/mo",
  "unit": "count",
  "quota": 40,
  "matches": [
    { "event": "image.model_a" },
    { "event": "image.model_b" }
  ]
}

6. Use case: fall back from Model A to Model B when A is exhausted

A common product behaviour: the app prefers Model A (cheaper, faster, default), and only switches to Model B once the user has used up their Model A quota for the period. This is not something the metering layer routes for you - canUse() answers “is this specific event allowed right now?”, not “which model should I call?”. The routing decision lives in your app; Vevee is the oracle that tells you whether each option is currently allowed.

The pattern

Try to reserve() against Model A.
If denied with limit_reached, try Model B.
If both deny, surface the limit error to the user.
On the successful attempt, call your provider, then commit() (or release() on failure).

The two paid limit groups from section 2 already give you everything you need - one bucket per model, each with its own quota. No schema changes.

Code

import { vevee } from '@/lib/vevee';
import { VeveeError } from '@vevee/sdk';

type ModelId = 'model_a' | 'model_b';

const PREFERENCE: { id: ModelId; event: string; variant: '1k' | '2k' }[] = [
  { id: 'model_a', event: 'image.model_a', variant: '1k' },
  { id: 'model_b', event: 'image.model_b', variant: '2k' },
];

export async function generateImageWithFallback(userId: string, prompt: string) {
  for (const m of PREFERENCE) {
    const r = await vevee.reserve(userId, m.event, 1, { variant: m.variant });
    if (!r.allowed) continue; // this model's quota is exhausted - try the next one

    try {
      const image = await callTheActualProvider(m.id, prompt);
      await vevee.commit(r.reservationId!);
      return { image, modelUsed: m.id };
    } catch (err) {
      await vevee.release(r.reservationId!).catch(() => {});
      throw err;
    }
  }

  throw new VeveeError({
    code: 'limit_reached',
    message: 'All models exhausted for this billing period',
    status: 429,
  });
}

Why `reserve()` and not `canUse()`

You could call canUse() or read usage() to see whether Model A has room, decide which model to call, and then track(). The problem is the gap between the check and the write: with concurrent requests, two callers can both read “1 left on Model A,” both pass the check, and both bill it. Using reserve() in the loop makes the “is there room?” check and the increment a single atomic step - the second caller’s reserve on Model A returns allowed: false and falls through to Model B cleanly.
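The race is easy to reproduce with an in-memory stand-in. The sketch below is a toy counter, not the SDK: ToyCounter, checkThenTrack, and atomicReserve are hypothetical, and the await inside checkThenTrack stands in for the network round trip between the read and the write.

```typescript
class ToyCounter {
  used = 0;
  constructor(readonly quota: number) {}

  remaining(): number {
    return this.quota - this.used;
  }

  // Unconditional increment, like recording usage after the fact.
  track(): void {
    this.used += 1;
  }

  // Check and increment in one synchronous step, so no other caller can interleave.
  reserve(): boolean {
    if (this.used >= this.quota) return false;
    this.used += 1;
    return true;
  }
}

// Yields to other pending work, standing in for network latency.
const tick = (): Promise<void> => Promise.resolve();

// Racy: read, wait, then write.
async function checkThenTrack(c: ToyCounter): Promise<boolean> {
  if (c.remaining() <= 0) return false;
  await tick();
  c.track();
  return true;
}

// Safe: the check and the increment cannot be separated.
async function atomicReserve(c: ToyCounter): Promise<boolean> {
  await tick();
  return c.reserve();
}

async function demo() {
  const racy = new ToyCounter(1);
  const racyResults = await Promise.all([checkThenTrack(racy), checkThenTrack(racy)]);

  const safe = new ToyCounter(1);
  const safeResults = await Promise.all([atomicReserve(safe), atomicReserve(safe)]);

  return { racyUsed: racy.used, racyResults, safeUsed: safe.used, safeResults };
}

demo().then((result) => console.log(result));
```

With a quota of 1, the racy pair both succeed and leave the counter at 2, while the atomic pair yields one success and one denial.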

Optional: show the user which model they’re about to use

If your UI needs to label the next generation (“Next image: Model B”), read usage() for display only - never to gate the call. Treat it as a hint that may be slightly stale; the authoritative decision still happens in reserve().

const u = await vevee.usage(userId);
const a = u.counters.find((c) => c.groupId === 'lg_pro_model_a');
const nextModel = a && a.remaining > 0 ? 'model_a' : 'model_b';

Where to go next

Core concepts - the full identity, period, and matching model
reserve() / commit() / release() - race-safe metering
Recipes - short copy-paste patterns for streaming LLMs, multi-provider fallback, Stripe webhooks