
Who Pays for the Tokens? Metering & Billing AI Agents in Laravel

There are a hundred blog posts about wiring an LLM into your app. There are almost none about the question that shows up the week after you ship it to a paying customer: who pays for the tokens, and how do you know?

This is the unglamorous half of production AI, and it's where I spend a lot of my time as a fractional CTO. If you want the upstream part (how I actually wire Claude into a Laravel app with the Laravel AI SDK and MCP), read that piece first. This one is about what happens to the bill.

The problem: a token is a variable cost wearing a fixed-price costume

When you sell a SaaS feature, your instinct is a flat monthly price. But every agent call has a real, variable cost, and that cost scales with how chatty each customer is — how long their documents are, how many tool round-trips the model takes, how often they hit the feature. Two customers on the same plan can differ 20x in token spend.
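To make that spread concrete, here is a back-of-the-envelope sketch in plain PHP. The per-million-token prices and the usage figures are placeholders I made up for the arithmetic, not real provider pricing or customer data; `estimateCostUsd` is a hypothetical helper, not part of any SDK.

```php
<?php
declare(strict_types=1);

// Hypothetical per-million-token prices in USD. Placeholders for illustration,
// not real provider pricing.
const INPUT_USD_PER_MTOK = 3.00;
const OUTPUT_USD_PER_MTOK = 15.00;

/** Estimate the provider cost in USD for a given token volume. */
function estimateCostUsd(int $promptTokens, int $completionTokens): float
{
    return ($promptTokens / 1_000_000) * INPUT_USD_PER_MTOK
        + ($completionTokens / 1_000_000) * OUTPUT_USD_PER_MTOK;
}

// Two customers on the same flat plan, with invented monthly usage.
$light = estimateCostUsd(promptTokens: 2_000_000, completionTokens: 200_000);
$heavy = estimateCostUsd(promptTokens: 40_000_000, completionTokens: 4_000_000);

// prints "light: USD 9.00, heavy: USD 180.00, ratio: 20x"
printf("light: USD %.2f, heavy: USD %.2f, ratio: %.0fx\n", $light, $heavy, $heavy / $light);
```

Same plan, same price, and under these assumed numbers one customer costs twenty times what the other does.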

If you don't attribute that cost per customer, you find out at the end of the month, in aggregate, when your provider bill is bigger than you modelled and you have no idea which customer caused it. The fix is to make every agent invocation know who it's for — at the moment it runs.

Step 1: agents declare their own model and limits

In the Laravel AI SDK, an agent is a class. Its provider, model, and guardrails are attributes on the class, not parameters you thread through every call site:

#[Provider('anthropic')]
#[Model('claude-sonnet-4-6')]
#[MaxSteps(8)]
class CooAgent implements Agent, HasMiddleware, HasTools
{
    use Promptable;

    public function instructions(): Stringable|string
    {
        return <<<'PROMPT'
        You are the COO agent for a solo consulting business...
        PROMPT;
    }

    public function tools(): iterable
    {
        return [new ListProjects, new ListTrackedCustomers, new ResolveActionItem];
    }

    public function middleware(): array
    {
        return [new TrackAgentCalls(purpose: 'coo_agent.prompt')];
    }
}

The two pieces that matter for billing are the #[MaxSteps(8)] attribute and the middleware() method. MaxSteps is a cost ceiling: a tool-calling agent that loops can rack up calls fast, and the cap is your circuit breaker. The middleware is where attribution lives.
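If the circuit-breaker idea feels abstract, this is roughly what a step cap buys you, sketched in plain PHP. It is not how the SDK implements MaxSteps internally; `runWithStepCap` and the `$step` callable are hypothetical stand-ins for one model round-trip.

```php
<?php
declare(strict_types=1);

/**
 * Run a tool-calling loop with a hard step budget.
 *
 * $step is a hypothetical callable standing in for one model round-trip.
 * It receives the 1-based step number and returns a final answer string,
 * or null when the model asks for another tool call.
 */
function runWithStepCap(callable $step, int $maxSteps): ?string
{
    for ($i = 1; $i <= $maxSteps; $i++) {
        $answer = $step($i);
        if ($answer !== null) {
            return $answer; // model finished within budget
        }
    }

    return null; // budget exhausted: stop paying for further round-trips
}

// A "model" that never finishes: without the cap this would loop forever.
$calls = 0;
$runaway = function (int $i) use (&$calls): ?string {
    $calls++;
    return null;
};
$result = runWithStepCap($runaway, maxSteps: 8);

// A "model" that answers on its third round-trip.
$finishes = fn (int $i): ?string => $i === 3 ? 'done' : null;
$answer = runWithStepCap($finishes, maxSteps: 8);
```

The runaway case stops after eight paid round-trips instead of running until someone notices the invoice.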

Step 2: middleware captures cost for every call in one place

The mistake is sprinkling token counters through your controllers and jobs. Instead, attach one middleware to the agent and capture usage centrally:

public function middleware(): array
{
    return [new TrackAgentCalls(purpose: 'coo_agent.prompt')];
}

TrackAgentCalls records each invocation — purpose, model, prompt/completion tokens — into an agent_calls table. Now "what did AI cost us this week, broken down by feature?" is a SQL query, not a forensic exercise. This is the same move as putting billing in a Stripe webhook instead of in your checkout button: one chokepoint, not N.
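Here is a sketch of that weekly query, run against an in-memory SQLite database so it stands alone (it needs the pdo_sqlite extension). The column names, the second purpose string, and the sample rows are my assumptions for illustration; the real agent_calls schema may look different.

```php
<?php
declare(strict_types=1);

// In-memory SQLite stands in for the real database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Assumed columns; the real agent_calls table may differ.
$pdo->exec('CREATE TABLE agent_calls (
    purpose TEXT NOT NULL,
    model TEXT NOT NULL,
    prompt_tokens INTEGER NOT NULL,
    completion_tokens INTEGER NOT NULL,
    created_at TEXT NOT NULL
)');

// Invented sample rows; the last one is older than the reporting window.
$rows = [
    ['coo_agent.prompt', 'claude-sonnet-4-6', 1200, 300, '2025-01-06 09:00:00'],
    ['coo_agent.prompt', 'claude-sonnet-4-6', 800, 200, '2025-01-07 10:30:00'],
    ['conversation_triage.prompt', 'claude-sonnet-4-6', 5000, 400, '2025-01-07 11:00:00'],
    ['coo_agent.prompt', 'claude-sonnet-4-6', 9999, 999, '2024-12-20 08:00:00'],
];
$insert = $pdo->prepare('INSERT INTO agent_calls VALUES (?, ?, ?, ?, ?)');
foreach ($rows as $row) {
    $insert->execute($row);
}

// "What did AI cost us this week, broken down by feature?"
$stmt = $pdo->prepare(
    'SELECT purpose,
            COUNT(*) AS calls,
            SUM(prompt_tokens) AS prompt_tokens,
            SUM(completion_tokens) AS completion_tokens
     FROM agent_calls
     WHERE created_at >= ?
     GROUP BY purpose
     ORDER BY purpose'
);
$stmt->execute(['2025-01-06 00:00:00']);
$byPurpose = $stmt->fetchAll(PDO::FETCH_ASSOC);
```

Multiply the token sums by your provider's prices and you have cost per feature for the window.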

Step 3: read usage off the response and report it to a meter

When you invoke the agent, the response carries the token counts:

$response = (new CooAgent)->prompt($prompt);

$response->usage->promptTokens;      // int
$response->usage->completionTokens;  // int
$response->meta->model;              // resolved model
$response->meta->provider;           // resolved provider

For customer billing I route the call through a Stripe LLM gateway and report the usage against that customer's Stripe identity. In a queued job, that looks like overriding the provider/model at the call site and then metering:

$result = ConversationTriageAgent::make(projectId: $conversation->project_id)->prompt(
    json_encode($payload),
    provider: 'stripe',          // route through the metering gateway
    model: $gatewayModel,
);

app(StripeTokenMeter::class)->report(
    $conversation->trackedCustomer?->stripe_customer_id, // null if no tracked customer is attached
    $result->usage->promptTokens,
    $result->usage->completionTokens,
    $gatewayModel,                                        // the model the gateway actually billed
);

Note the shape: the same agent class can run against Anthropic directly (the attribute default) for internal use, or be routed through provider: 'stripe' for customer-billed use. The agent doesn't change; the call site decides who pays. That separation is the whole trick.
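The meter itself isn't shown above, so here is a hedged sketch of the shape one could take. Everything in it is an assumption for illustration: the `TokenMeterSketch` class, the event array keys, and the injected `$send` closure that stands in for the real network call to the billing provider. The one behaviour worth copying is the early return when there is no customer ID, since the nullsafe lookup at the call site can produce null.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of a token meter. Names and event shape are
 * assumptions, not the real StripeTokenMeter.
 */
final class TokenMeterSketch
{
    /** @param Closure(array): void $send Delivers one usage event to the billing provider. */
    public function __construct(private Closure $send)
    {
    }

    /** Report usage for one call; returns false when there is nobody to bill. */
    public function report(?string $customerId, int $promptTokens, int $completionTokens, string $model): bool
    {
        if ($customerId === null || $customerId === '') {
            return false; // internal call, or customer not linked to billing yet
        }

        ($this->send)([
            'customer' => $customerId,
            'model' => $model,
            'prompt_tokens' => $promptTokens,
            'completion_tokens' => $completionTokens,
        ]);

        return true;
    }
}

$sent = [];
$meter = new TokenMeterSketch(function (array $event) use (&$sent): void {
    $sent[] = $event; // stand-in for the real network call
});

$billed = $meter->report('cus_example', 1200, 300, 'example-model');
$skipped = $meter->report(null, 500, 100, 'example-model'); // no billing identity
```

Injecting the sender also makes the meter trivial to test without touching a billing API.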

Step 4: degrade gracefully — a failed agent must not 500 the page

A metered, networked, multi-step call is a thing that will fail sometimes. When the agent is enriching a response rather than being the response, swallow the failure and return without it:

try {
    $text = trim((string) ProjectSummaryAgent::make()->prompt(json_encode($slim)));
    return $text !== '' ? $text : null;
} catch (Throwable) {
    return null; // PDF still renders; it just skips the AI summary
}

The customer gets their document; they just don't get the optional AI paragraph. They never see a stack trace because a provider had a bad minute.
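If you have more than one optional enrichment, the pattern can be pulled into a small helper. This is a sketch under my own naming: `optionalEnrichment` and its `$onFailure` hook are hypothetical, and the hook exists only so that a swallowed failure can still be logged somewhere.

```php
<?php
declare(strict_types=1);

/**
 * Run an optional enrichment and return null on any failure or blank result.
 * $onFailure is a hypothetical hook (e.g. for logging) so failures stay visible.
 */
function optionalEnrichment(callable $produce, ?callable $onFailure = null): ?string
{
    try {
        $text = trim((string) $produce());
        return $text !== '' ? $text : null;
    } catch (Throwable $e) {
        if ($onFailure !== null) {
            $onFailure($e);
        }
        return null; // caller renders the page without the AI content
    }
}

$errors = [];

// Happy path: whitespace is trimmed and the text comes back.
$ok = optionalEnrichment(fn () => "  Summary text.  ");

// Failure path: the exception is recorded, the caller just gets null.
$failed = optionalEnrichment(
    fn () => throw new RuntimeException('provider timeout'),
    function (Throwable $e) use (&$errors): void {
        $errors[] = $e->getMessage();
    },
);
```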

Step 5: trust the schema, but normalize the edge

Structured-output agents return typed data via a schema() method — booleans, enums, nested objects. In practice a gateway will occasionally hand you the JSON as a string instead of a decoded array, so I guard the boundary:

$items = $result['action_items'] ?? [];

if (is_string($items)) {                 // gateway returned JSON text
    $decoded = json_decode($items, true);
    $items = is_array($decoded) ? $decoded : [];
}

One defensive check at the seam heads off a whole class of "why is this a string" bugs in production.
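When several fields need the same treatment, the guard is easy to lift into a function. A minimal sketch, with `normalizeList` as a hypothetical helper name:

```php
<?php
declare(strict_types=1);

/** Coerce a structured-output field into an array, whatever shape arrived. */
function normalizeList(mixed $value): array
{
    if (is_string($value)) {               // JSON text instead of a decoded array
        $value = json_decode($value, true);
    }

    return is_array($value) ? $value : []; // null, scalars, and bad JSON all become []
}

$fromString = normalizeList('[{"title":"Send invoice"}]');
$fromArray = normalizeList([['title' => 'Send invoice']]);
$fromGarbage = normalizeList('not json');
```

All three inputs come out as arrays, so the code downstream only ever has one shape to handle.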

Why this is a Laravel advantage, not a tax

Here's the part people miss when they reach for a separate Python service for "the AI stuff": the moment you split it out, billing, auth, and your customer records live on the other side of a network boundary. Token metering then means correlating two systems. Keeping the agents inside the Laravel app means the customer, their Stripe ID, the MaxSteps cap, the agent_calls ledger, and the meter report all live in one codebase, in one request or job, against one database. That's not a small convenience; it's the difference between knowing your per-customer AI margin and guessing it.

The honest summary

  • Meter internally per customer even if you bill a flat fee — it's how you find your margin and catch runaway agents.
  • Put attribution in middleware, once, not scattered through call sites.
  • Let the call site choose who pays (direct provider vs. metering gateway); keep the agent class stable.
  • Degrade gracefully and normalize the structured-output seam.

This is the kind of thing that's invisible in a demo and decisive in production — and it's roughly half of what an AI-native fractional CTO is actually for. If you're shipping AI features in Laravel and the billing/metering side is making you nervous, that's exactly the work I do.
