Q&A

How do I track LLM usage per user?

Call vevee.track(userId, "llm.tokens", tokenCount) after each LLM call. Vevee counts it against the user's plan limits in real time and exposes a per-user usage dashboard. For atomic enforcement under concurrency, use reserve / commit / release instead.
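A minimal sketch of the track-after-each-call approach, written against a hypothetical client interface. Only the track(userId, "llm.tokens", tokenCount) call shape comes from the text above; the UsageTracker interface, its return type, and the trackedCall and countTokens helpers are assumptions made for illustration.

```typescript
// Hypothetical minimal shape of the client. Only the track() arguments are taken
// from the text; the return type is an assumption.
interface UsageTracker {
  track(userId: string, metric: string, amount: number): Promise<unknown>;
}

// The two token fields reported in an OpenAI chat completion usage block.
interface TokenUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
}

// Sum prompt and completion tokens, treating a missing usage block as zero.
function countTokens(usage?: TokenUsage): number {
  return (usage?.prompt_tokens ?? 0) + (usage?.completion_tokens ?? 0);
}

// Run an LLM call, then record its token count against the user.
async function trackedCall<T extends { usage?: TokenUsage }>(
  vevee: UsageTracker,
  userId: string,
  call: () => Promise<T>,
): Promise<T> {
  const res = await call();
  await vevee.track(userId, "llm.tokens", countTokens(res.usage));
  return res;
}
```

Because usage is recorded only after the call returns, this variant measures consumption but cannot stop two concurrent calls from overshooting a limit.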

Last updated: 2026-05-10

The short answer

Wrap your OpenAI / Anthropic / Gemini call in three steps: vevee.reserve before the call, vevee.commit after success, vevee.release after failure. Vevee tracks every token (or request, or cent) per end-user per plan, so you keep no counter table on your side.

Minimal example

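Replace the bare OpenAI call with this pattern: reserve an upper bound of 4000 tokens, call OpenAI, commit the reservation, then refund the difference between the 4000 reserved and the tokens actually used. Because the quota is claimed before the call runs, two parallel calls cannot both pass the same quota.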

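// Reserve the worst case up front so concurrent calls cannot both pass the same quota.
const MAX_TOKENS = 4000;
const r = await vevee.reserve(userId, "llm.tokens", MAX_TOKENS, { model: "gpt-4o" });
if (!r.allowed) throw new Error("llm.tokens limit reached");
let res;
try {
  res = await openai.chat.completions.create(/*...*/);
} catch (e) {
  // The LLM call failed, so hand the whole reservation back.
  await vevee.release(r.reservationId!);
  throw e;
}
// Commit outside the try so a commit or refund error never releases a finished call.
await vevee.commit(r.reservationId!);
const used = (res.usage?.prompt_tokens ?? 0) + (res.usage?.completion_tokens ?? 0);
if (used < MAX_TOKENS) await vevee.track(userId, "llm.tokens.refund", MAX_TOKENS - used);
return res;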
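To see why reserving up front blocks concurrent overshoot, here is a small in-memory stand-in for the reserve / commit / release semantics. It is an illustration only, not Vevee's implementation: the QuotaLedger class, its available method, and the 6000-token limit are all hypothetical.

```typescript
// In-memory stand-in for illustration only; this is NOT how Vevee is implemented.
type Reservation = { allowed: boolean; reservationId?: string };

class QuotaLedger {
  private limit: number;
  private used = 0; // committed usage
  private held = 0; // reserved but not yet committed or released
  private holds = new Map<string, number>();
  private nextId = 1;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Check and hold in a single synchronous step, so no other caller can
  // slip in between the check and the write.
  reserve(amount: number): Reservation {
    if (this.used + this.held + amount > this.limit) return { allowed: false };
    const id = `res_${this.nextId++}`;
    this.holds.set(id, amount);
    this.held += amount;
    return { allowed: true, reservationId: id };
  }

  // Convert a hold into committed usage.
  commit(id: string): void {
    this.used += this.take(id);
  }

  // Drop a hold without charging it.
  release(id: string): void {
    this.take(id);
  }

  // Quota that a new reservation could still claim.
  available(): number {
    return this.limit - this.used - this.held;
  }

  private take(id: string): number {
    const amount = this.holds.get(id);
    if (amount === undefined) throw new Error(`unknown reservation ${id}`);
    this.holds.delete(id);
    this.held -= amount;
    return amount;
  }
}

const ledger = new QuotaLedger(6000);
const a = ledger.reserve(4000); // allowed: nothing is held yet
const b = ledger.reserve(4000); // denied: 4000 held + 4000 requested exceeds 6000
ledger.release(a.reservationId!); // the first call failed, so its hold is returned
const c = ledger.reserve(4000); // allowed again
ledger.commit(c.reservationId!); // the call succeeded, so the hold becomes usage
```

The second reservation is refused while the first is still in flight, which is the behavior a plain track-after-the-call counter cannot give you.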

Why not a Postgres counter?

You can build it. You will need: an atomic increment under concurrency, period rollover at the right anchor, refunds for failed calls, an admin UI, and webhooks at usage thresholds. Vevee is the same primitive in one SDK call.
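For a sense of the do-it-yourself work, here is a rough sketch of the first two items on that list. The usage_counters table, its columns, and the periodStart helper are hypothetical names, and the rollover logic assumes a monthly period in UTC with an anchor day between 1 and 28.

```typescript
// Atomic check-and-increment in one Postgres statement (hypothetical schema).
// Zero rows returned means the increment would exceed the quota.
const INCREMENT_SQL = `
  UPDATE usage_counters
     SET used = used + $1
   WHERE user_id = $2
     AND period_start = $3
     AND used + $1 <= quota
  RETURNING used`;

// Start of the monthly period containing `now`, for a plan anchored on
// `anchorDay` (1 to 28), computed in UTC.
function periodStart(now: Date, anchorDay: number): Date {
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  const thisMonth = new Date(Date.UTC(year, month, anchorDay));
  if (now.getTime() >= thisMonth.getTime()) return thisMonth;
  // Before this month's anchor: the period began on last month's anchor.
  // Date.UTC normalizes month -1 to December of the previous year.
  return new Date(Date.UTC(year, month - 1, anchorDay));
}
```

This covers only the increment and the rollover; refunds, the admin UI, and threshold webhooks would still have to be built on top.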

Related questions

How do I rate-limit OpenAI calls per user?

What is the reserve / commit / release pattern?

How do I charge AI app users by token usage?