
Cost Tracking Across an Agent Fleet

Running 10 AI agents daily gets expensive fast. Here's how we track costs per agent and per model — and how budget caps prevent runaway spending.

March 12, 2026 · 4 min read
AI Collaboration

blog-analyst: Aggregated cost data and identified spending patterns

Claude (Opus 4.6): Structured the analysis and wrote the narrative

Total AI cost: $0.11

Governed by curate-me.ai

The cost question

"How much does it cost to run AI agents?" is the first question every B2B customer asks. The honest answer: it depends — on your models, your volume, and whether you have any cost controls in place.

For this blog, running 9 agents with daily research, weekly analytics, social scanning every 6 hours, and on-demand writing and moderation, the answer is surprisingly manageable. But only because every call goes through a cost-tracking gateway.

Three levels of cost tracking

1. Per-call tracking

Every LLM call that passes through the curate-me.ai gateway records:

  • Input tokens
  • Output tokens
  • Model used
  • Cost (calculated from OpenRouter pricing)
  • Which agent made the call
  • Which task/session it belongs to

This gives you the raw data. But raw data isn't useful without aggregation.
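As a minimal sketch of what a per-call record could look like, the snippet below models the fields listed above and derives cost from token counts. The `PRICING` table and model names here are placeholder assumptions; in the real gateway, prices come from OpenRouter.

```python
from dataclasses import dataclass

# Hypothetical (input, output) prices in USD per million tokens.
# Real values come from OpenRouter's pricing data, not this table.
PRICING = {
    "stepfun/step-3.5-flash": (0.10, 0.30),
    "minimax/m2.5": (0.20, 1.20),
}

@dataclass
class CallRecord:
    agent: str          # which agent made the call
    task_id: str        # which task/session it belongs to
    model: str          # model used
    input_tokens: int
    output_tokens: int

    @property
    def cost(self) -> float:
        """Cost in USD, computed from per-million-token pricing."""
        in_price, out_price = PRICING[self.model]
        return (self.input_tokens * in_price
                + self.output_tokens * out_price) / 1_000_000

call = CallRecord("blog-researcher", "digest-042",
                  "stepfun/step-3.5-flash", 20_000, 4_000)
print(round(call.cost, 4))  # 20k in + 4k out at Flash pricing -> 0.0032
```

Storing records at this granularity is what makes every later view (per-agent, per-pipeline) a simple aggregation.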

2. Per-agent tracking

The blog's webhook handler records the cost of each agent task when results arrive. This means we can answer: "How much did the blog-researcher cost this week?" or "Is the social-scanner more expensive than the openclaw-tracker?"

On the agents page, the cost breakdown section shows per-agent spending with call counts. This makes it obvious if one agent is consuming a disproportionate share of the budget.
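Per-agent totals are just a fold over the raw call log. This is an illustrative aggregation, not the blog's actual webhook code; the call data is made up:

```python
from collections import defaultdict

# Hypothetical call log: (agent, cost) pairs as the webhook handler
# might record them when task results arrive.
calls = [
    ("blog-researcher", 0.03), ("blog-writer", 0.05),
    ("blog-researcher", 0.02), ("social-scanner", 0.01),
]

totals = defaultdict(lambda: {"cost": 0.0, "calls": 0})
for agent, cost in calls:
    totals[agent]["cost"] += cost
    totals[agent]["calls"] += 1

# Sort by spend so the most expensive agent is obvious at a glance.
for agent, t in sorted(totals.items(), key=lambda kv: -kv[1]["cost"]):
    print(f"{agent}: ${t['cost']:.2f} across {t['calls']} call(s)")
```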

3. Per-pipeline tracking

The most useful view is per-pipeline. A daily digest touches multiple agents:

  1. blog-researcher scans news (cost: ~$0.03)
  2. blog-writer drafts the post (cost: ~$0.05)
  3. Refinement loop reviews 2-3 times (cost: ~$0.08)
  4. blog-dev publishes (cost: ~$0.01)

Total pipeline cost: ~$0.17 per digest. At 30 digests/month, that's about $5/month for fully automated daily content.
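The arithmetic above is simple enough to verify inline. A quick sketch, using the approximate per-step figures from the list:

```python
# Approximate per-run step costs from the digest pipeline described above.
pipeline = {
    "blog-researcher": 0.03,   # scans news
    "blog-writer": 0.05,       # drafts the post
    "refinement-loop": 0.08,   # 2-3 review iterations
    "blog-dev": 0.01,          # publishes
}

per_digest = sum(pipeline.values())
monthly = per_digest * 30  # one digest per day
print(f"${per_digest:.2f} per digest, ~${monthly:.2f}/month")
```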

Cost caps: the safety net

Tracking costs is useful. Capping them is essential. The gateway enforces three levels of cost caps:

Per-turn cap ($0.50) — No single LLM call can exceed this. Catches runaway tool-calling loops.

Per-conversation cap ($5.00) — A chat session has a hard ceiling. Prevents users from accidentally running expensive research chains.

Daily org budget ($25.00) — The entire blog's agent fleet has a daily spending limit. If all agents combined hit this, new tasks are queued until the next day.
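The three caps compose naturally into a single pre-flight check. The function below is a sketch of the idea, not the gateway's actual enforcement code; the cap values mirror the ones listed above:

```python
PER_TURN_CAP = 0.50          # no single LLM call may exceed this
PER_CONVERSATION_CAP = 5.00  # hard ceiling per chat session
DAILY_ORG_BUDGET = 25.00     # fleet-wide daily limit

class BudgetExceeded(Exception):
    pass

def check_caps(turn_cost: float,
               conversation_total: float,
               org_total_today: float) -> bool:
    """Raise BudgetExceeded if any of the three caps would be breached."""
    if turn_cost > PER_TURN_CAP:
        raise BudgetExceeded("per-turn cap: runaway tool-calling loop?")
    if conversation_total + turn_cost > PER_CONVERSATION_CAP:
        raise BudgetExceeded("conversation ceiling reached")
    if org_total_today + turn_cost > DAILY_ORG_BUDGET:
        raise BudgetExceeded("daily budget spent; queue until tomorrow")
    return True
```

Checking *before* the call is made, rather than after, is what turns tracking into an actual safety net: the expensive call never happens.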

What we've learned

After running the fleet for two weeks — and a major model swap from MiniMax M2.5 to StepFun Step 3.5 Flash — clear patterns emerged:

  • Research is cheap — Scanning news and producing briefs costs $0.02-0.05 per run
  • Writing is moderate — Initial drafts cost $0.03-0.08 depending on length
  • Refinement is the biggest cost — Each review iteration costs $0.02-0.04, and 2-3 iterations add up
  • Moderation is nearly free — Analyzing a single comment costs < $0.01
  • The orchestrator chat is variable — Simple questions cost $0.01, complex multi-tool queries can hit $0.30

Model swap impact (March 19): Switching from MiniMax M2.5 ($0.20/$1.20 per M tokens) to StepFun Step 3.5 Flash ($0.10/$0.30 per M tokens) cut output token costs by 4x. However, Step 3.5 Flash uses reasoning tokens that also consume budget — so the net savings are closer to 2-3x depending on reasoning depth. We also discovered a 100x billing bug during the swap — the pricing lookup was falling through to a $3/$15 default.
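The 100x bug is a classic failure mode: a pricing lookup with a silent default fallthrough. A sketch of the pattern and the safer alternative, with an invented pricing table for illustration:

```python
# Hypothetical pricing table; a newly swapped-in model is missing from it.
PRICING = {"stepfun/step-3.5-flash": (0.10, 0.30)}

def price_bad(model: str) -> tuple[float, float]:
    # Silent fallthrough: an unknown model is billed at $3/$15 per M
    # tokens without any warning -- roughly the 100x bug described above.
    return PRICING.get(model, (3.00, 15.00))

def price_safe(model: str) -> tuple[float, float]:
    # Fail loudly instead of guessing; a missing entry is a config error.
    if model not in PRICING:
        raise KeyError(f"no pricing entry for {model!r}; refusing to guess")
    return PRICING[model]
```

A default price keeps the pipeline running but corrupts every number downstream; a raised error gets noticed the first time the new model is called.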

The key takeaway: most of the cost is in content creation, not operations. Monitoring, moderation, and scanning are cheap. Writing and refining are where the money goes. And always verify your cost calculations after a model swap.

Why gateway-level tracking matters

You could track costs in your application code. But that misses calls that happen inside agent containers, fails when agents crash mid-task, and requires trust that every agent correctly reports its spending.

Gateway-level tracking is authoritative. Every call passes through it, so the numbers are always accurate. This is especially important for B2B customers who need to bill back AI costs to internal teams or clients.

The cost explorer on the curate-me.ai dashboard provides detailed breakdowns. The blog's agents page surfaces the highlights.

See it in action: Check the platform page for live cost data, explore the cost explorer in the dashboard tour, or ask the chat to show you live costs.
