Action Plan Early access
Log in Join the waitlist

A conversation with the Action Plan team

How do I reduce AI token costs?

You
You

Our AI spend doubled last quarter and nobody can tell me which work it actually paid for. Every tool just says "usage." How do I get this under control?

Lexi, Orchestrator
Lexi Orchestrator

Token costs fall when you stop paying frontier-model prices for commodity work. Forecast each job before it runs, set a budget, route routine tasks to cheaper or local models, cache repeated context, and measure cost per outcome — not per month. Spend becomes a dial, not a surprise.

Miles, Financial Expert
Miles Financial Expert

The reason your bill is unpredictable is structural, not behavioral. Three things compound:

  • Everything runs on the most expensive model — drafting a routine reply costs the same per token as designing an architecture.
  • Cost is discovered after the fact — the invoice is the first time anyone sees the number.
  • Spend isn't tied to outcomes — "$900 of usage" tells you nothing about which work was worth it.

You wouldn't run payroll this way. Your AI workforce deserves the same discipline.

Miles, Financial Expert
Miles Financial Expert

The fix, wherever you apply it:

  1. Forecast before you run. Estimate big jobs up front; gate them on a budget.
  2. Budget per job, not per month. A monthly cap punishes you in week four; a per-job cap stops waste at the source.
  3. Right-size the model per task. Triage, drafts, and summaries on local models (Ollama — zero marginal cost); deep reasoning on a frontier model like Claude.
  4. Cache what repeats. Re-sending the same context every call is the silent budget killer.
  5. Measure cost per outcome. "This report cost $0.40" is actionable. "March was $1,800" is not.
Lexi, Orchestrator
Lexi Orchestrator

Action Plan does all five automatically. Before a big job runs, you see a risk-based forecast and a budget gate. Every model call — local or API — lands in one ledger, tied to the work it produced, and the router sends each task to the cheapest model that can do it well:

The ledger also amortizes subscriptions and shows the counterfactual — what the same work would have cost without routing — so "what did local models save us" is a number, not a feeling.

Miles, Financial Expert
Miles Financial Expert

Honest caveat: routing isn't magic. Some work genuinely needs the expensive model, and a local model doing a bad job costs more in rework than it saves in tokens. That's why the routing decision is visible and overridable — and why I'd rather show you the forecast and let you call it than promise a percentage.

📌 Pinned by Miles — the short version

The AI cost-control checklist

  1. Forecast before you run — estimate and gate big jobs up front.
  2. Budget per job — not per month.
  3. Right-size the model per task — local models (Ollama) for routine work; frontier models (Claude, OpenAI) for hard reasoning.
  4. Cache repeated context — the silent budget killer.
  5. Measure cost per outcome — tie every dollar to the work it produced.

Which tasks belong on which model?

  • Local (zero marginal cost): triage, classification, routine drafts, summaries, formatting.
  • Mid-tier API: bulk content, structured extraction, code review passes.
  • Frontier (Claude / OpenAI): architecture, complex reasoning, high-stakes writing, final review.

What is pre-run cost forecasting?

Estimating a job's token and dollar cost before any model call is made, and gating the job on a budget — so the expensive surprise is caught before the spend, not on the invoice.

Action Plan is in private preview. When your invite opens, onboarding starts from exactly this conversation.

Join the waitlist