If your OpenClaw bill has jumped in the last few months, you're not imagining it. Reddit threads with hundreds of upvotes describe the same thing: "insane token burn," 13,000+ input tokens per simple query, $150/month becoming $600/month without changing anything about how you work.
This guide breaks down exactly why it's happening, where every token goes, and what you can do about it -- with and without touching your config.
Why OpenClaw Costs Are Rising in 2026
The obvious explanation -- that per-token prices went up -- is wrong. Per-token prices have actually fallen from ~$20/M tokens in 2022 to ~$0.40/M for commodity models by mid-2025. But your bill is higher. Here's why.
1. Reasoning Models Are Token-Hungry
OpenAI o1, Claude 3.7 Sonnet Extended Thinking, and Gemini 2.5 Flash Thinking all produce far more output tokens than their predecessors. The meter runs 3-10x faster for the same task. If you're using any reasoning or "extended thinking" model -- even selectively -- your token counts are not comparable to what you measured six months ago.
2. DRAM Costs Are Rising
Micron's CEO said it publicly: DRAM supply is constrained, costs are up 20% year-over-year, and providers are passing this through. AI inference is memory-bound. More expensive RAM means more expensive tokens, regardless of what the published rate cards show on paper.
3. GPT-5.2 Raised Prices 40% With No Warning
OpenAI raised input token prices from $1.25 to $1.75 per million tokens with GPT-5.2 -- a 40% increase. If your OpenClaw setup routes to GPT-5.2 or later OpenAI models, your costs went up the day the new pricing took effect, automatically. Content licensing deals will likely trigger similar hikes from other providers later in 2026.
4. Agentic Loops Multiply Everything
A single user action in an agentic OpenClaw workflow might trigger 3, 5, or 10 LLM calls. Each one carries your system prompt, your conversation history, your tool definitions. The cost isn't "one query" -- it's "one query times however many hops your agent takes to complete it." The more capable the workflow, the more it costs per user interaction.
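The multiplication is easy to see with a quick sketch. All of the token counts and the per-million price below are illustrative assumptions, not measured OpenClaw values:

```python
# Illustrative arithmetic only: token counts and the price are assumptions,
# not measured OpenClaw values.
SYSTEM_PROMPT_TOKENS = 4_000   # system prompt + context files, resent every hop
TOOL_SCHEMA_TOKENS = 2_000     # tool definitions, resent every hop
USER_QUERY_TOKENS = 200
PRICE_PER_M_INPUT = 15.0       # USD per million input tokens (Opus-class)

def cost_of_interaction(hops: int) -> float:
    """Input cost of one user action that fans out into `hops` LLM calls."""
    tokens_per_hop = SYSTEM_PROMPT_TOKENS + TOOL_SCHEMA_TOKENS + USER_QUERY_TOKENS
    return hops * tokens_per_hop * PRICE_PER_M_INPUT / 1_000_000

for hops in (1, 3, 10):
    print(f"{hops} hops: ${cost_of_interaction(hops):.3f}")
```

Note that the fixed overhead (system prompt plus tool schemas) dwarfs the query itself, so every extra hop repays the full overhead, not just the new message.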
What Typical Users Actually Spend
Based on real user data from the OpenClaw community and public reporting:
| User Type | Monthly Spend | Typical Setup |
|---|---|---|
| Light personal use | $50-150 | Single agent, low query volume |
| Active developer | $150-500 | Multi-agent, daily use, tool calls |
| Power user / heavy automation | $500-1,500 | Autonomous agents, high volume |
| Team / production use | $1,500-3,000+ | Multiple workspaces, agentic pipelines |
These aren't estimates pulled from thin air. Users in r/LocalLLM and r/ClaudeAI report these ranges consistently. The "$150/month going to $600/month" pattern after switching to reasoning models shows up in thread after thread.
The benchmark that matters: 100 daily active agents running 50K tokens each per day costs around $4,500/month at GPT-4 pricing. That's before you account for system prompts, tool definitions, or context growth over time.
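The benchmark arithmetic is worth checking yourself. The $30/M blended rate below is an assumption standing in for "GPT-4 pricing":

```python
# Back-of-envelope check of the benchmark above. The $30/M blended rate
# for GPT-4-class pricing is an assumption.
agents = 100
tokens_per_agent_per_day = 50_000
days = 30
blended_price_per_m = 30.0  # USD per million tokens, assumed blended rate

monthly_tokens = agents * tokens_per_agent_per_day * days  # 150M tokens
monthly_cost = monthly_tokens / 1_000_000 * blended_price_per_m
print(f"{monthly_tokens / 1e6:.0f}M tokens/month -> ${monthly_cost:,.0f}/month")
```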
Where Your Tokens Actually Go
Most people assume their queries are the cost driver. They're usually wrong. Here's a realistic token breakdown for a typical OpenClaw workspace:
System Prompt: 2,000-8,000 tokens per call
Your AGENTS.md, SOUL.md, TOOLS.md, and any injected context files get loaded on every single call. If you're using shared-context files, workspace files, and role-based instructions -- add them up. Users have measured 13,000+ input tokens on simple queries that produce 50 tokens of actual output.
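A rough audit is a few lines of code. The sketch below uses the common ~4-characters-per-token heuristic rather than a real tokenizer, and the file names are simply the ones mentioned above:

```python
# Rough workspace audit: estimates tokens per context file using the
# ~4 characters/token heuristic (an approximation, not a real tokenizer).
from pathlib import Path

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def audit_workspace(files):
    total = 0
    for name in files:
        path = Path(name)
        if not path.exists():
            continue
        tokens = estimate_tokens(path.read_text(encoding="utf-8"))
        total += tokens
        print(f"{name}: ~{tokens} tokens per call")
    print(f"total: ~{total} tokens loaded on every LLM call")
    return total

audit_workspace(["AGENTS.md", "SOUL.md", "TOOLS.md"])
```

For exact counts, run your files through your provider's actual tokenizer instead of the heuristic.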
Conversation Context: Grows Without Limit
OpenClaw maintains conversation history. Every message adds to the context window carried forward. A conversation that's been running for 20 exchanges carries all 20 exchanges into every subsequent LLM call. This compounds fast.
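"Compounds fast" here means roughly quadratic growth in total input tokens, since every call resends all prior exchanges. The 300 tokens per exchange below is an illustrative assumption:

```python
# Carried-forward history compounds: call n resends exchanges 1..n, so
# cumulative input tokens grow roughly quadratically with conversation
# length. 300 tokens/exchange is an illustrative assumption.
TOKENS_PER_EXCHANGE = 300

def cumulative_input_tokens(exchanges: int) -> int:
    # The n-th call carries n exchanges of context.
    return sum(n * TOKENS_PER_EXCHANGE for n in range(1, exchanges + 1))

for n in (5, 20, 40):
    print(f"{n} exchanges: {cumulative_input_tokens(n):,} input tokens total")
```

Doubling a conversation's length roughly quadruples its total input cost, which is why starting fresh threads matters so much.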
Tool Definitions: 1,000-4,000 tokens per call
Every tool you have available gets described to the model in each call -- even tools you don't use. Browser, exec, file operations, message, canvas: each one has a JSON schema that costs tokens before you type a single character.
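To make the overhead concrete, here is what a single tool definition looks like on the wire. This schema is a generic illustration in the common function-calling style, not OpenClaw's actual format, and the char/4 token estimate is a rough heuristic:

```python
# A generic tool definition in function-calling style (illustrative, not
# OpenClaw's actual schema). The char/4 token estimate is a heuristic.
import json

browser_tool = {
    "name": "browser",
    "description": "Open a URL and return the rendered page text.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Absolute URL to open"},
            "timeout_s": {"type": "integer", "description": "Load timeout"},
        },
        "required": ["url"],
    },
}

wire_format = json.dumps(browser_tool)
print(f"~{len(wire_format) // 4} tokens for one tool, on every call")
```

Multiply that by five or six tools and the fixed overhead lands in the 1,000-4,000 token range cited above.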
The Model Itself: 800x Price Spread
The spread between the cheapest and most expensive capable model is roughly 800x. Mistral 7B costs about $0.00006 per 300 tokens. GPT-4 costs about $0.05-0.06 per 300 tokens. Claude Opus 4 is $15/M input, $75/M output. If your routing defaults to a top-tier model for every query -- including simple ones where a cheaper model would do the same job -- you're paying the 800x tax constantly.
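The spread follows directly from the per-300-token figures quoted above; treat the prices as snapshots, not current rate cards:

```python
# Per-query cost spread, using the per-300-token figures quoted above
# (snapshots, not current rate cards).
prices_per_300_tokens = {
    "Mistral 7B": 0.00006,
    "GPT-4": 0.05,
}

cheapest = min(prices_per_300_tokens.values())
for model, price in prices_per_300_tokens.items():
    print(f"{model}: ${price:.5f} per 300 tokens ({price / cheapest:,.0f}x)")
```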
Manual Optimization: What You Can Do Right Now
These techniques work. They require effort and ongoing maintenance.
Trim Your System Prompt
Audit every file loaded into your workspace context. Remove examples, verbose explanations, and sections that don't change model behavior. Replace prose instructions with structured bullet points. A 6,000-token system prompt that could be 1,500 tokens is costing you 4x more than it needs to.
Context Pruning
Start new conversations for new topics. Archive long threads. Don't let a single conversation grow across dozens of exchanges when the context from message 1 is irrelevant to what you're doing in message 40. This alone can cut costs 40-60% for interactive users.
Model Selection by Task
This is where the biggest savings come from, but also where it's most tedious to manage. A rough decision tree:
- Gemini Flash / Mistral 7B / Haiku: answering factual questions, simple transformations, formatting, summarization of clear content
- GPT-4o / Claude Sonnet: writing, reasoning tasks, code generation, analysis with ambiguity
- Claude Opus / o1-level: genuinely complex multi-step reasoning, research synthesis, critical decisions
The problem: manually deciding which model to use for every task, and staying disciplined about it, is friction that adds up. Most users eventually revert to the default (which is usually expensive).
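The decision tree above can be sketched as a routing function. The keyword checks, length threshold, and model identifiers below are placeholder assumptions; a real router would classify queries with a small model rather than string matching:

```python
# Minimal sketch of the decision tree above. Classification heuristics and
# model names are placeholder assumptions, not a production router.
def pick_model(query: str, needs_multistep_reasoning: bool = False) -> str:
    if needs_multistep_reasoning:
        return "claude-opus"      # frontier tier: complex multi-step reasoning
    simple_markers = ("summarize", "format", "translate", "list")
    if len(query) < 200 and any(m in query.lower() for m in simple_markers):
        return "gemini-flash"     # cheap tier: simple transformations
    return "claude-sonnet"        # mid tier: writing, code, analysis

print(pick_model("Summarize this changelog"))
print(pick_model("Refactor this module to async I/O"))
print(pick_model("Plan a migration strategy", needs_multistep_reasoning=True))
```

Even this toy version captures the friction: someone has to maintain the rules, and every new task type forces a new decision.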
Limit Tool Availability
If a workflow doesn't need browser access, remove it. Tools you load but don't use still cost tokens in every call. OpenClaw's tool availability is configurable -- use it.
Track and Set Limits
You can't optimize what you don't measure. Use cost tracking at the API key level. Set monthly caps. Knowing you're at $400 of a $500 budget changes how you work.
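The cap logic is simple to build. In practice you'd pull spend from your provider's billing API; the sketch below records it manually just to show the mechanism:

```python
# Minimal spend tracker with a monthly cap. In practice, read usage from
# your provider's billing API; spend is recorded manually here.
class BudgetTracker:
    def __init__(self, monthly_cap_usd: float):
        self.cap = monthly_cap_usd
        self.spent = 0.0

    def record(self, cost_usd: float) -> None:
        self.spent += cost_usd
        if self.spent >= 0.8 * self.cap:
            print(f"WARNING: ${self.spent:.2f} of ${self.cap:.2f} used")

    def allow_call(self, estimated_cost_usd: float) -> bool:
        return self.spent + estimated_cost_usd <= self.cap

tracker = BudgetTracker(monthly_cap_usd=500.0)
tracker.record(400.0)              # crosses the 80% threshold, prints a warning
print(tracker.allow_call(150.0))   # would exceed the cap
```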
The Problem With Manual Optimization
Manual optimization works, but it has a ceiling and a cost of its own.
It requires engineering time every time your workflow changes. New tools, new context files, new agents -- all of them reset the calculus. The model that was right for a task last month might not be right today (or might be cheaper on a different provider this month).
Developers who've gone through the full manual optimization cycle report going from $500/month to around $150-200/month through context discipline and model switching. That's real. But it took hours of work, and it degrades as the workspace evolves.
The other 70%+ of savings -- the part that gets you from $150 to $35 -- requires automated compression and routing.
The Automated Fix: claw.zip
claw.zip is a zero-config token optimizer built specifically for OpenClaw. It does two things automatically:
1. Lossless semantic prompt compression. Before each LLM call, claw.zip compresses your prompts semantically -- removing redundancy and verbose phrasing while preserving meaning. A typical 800-token customer service prompt compresses to around 40 tokens. That's a 95% reduction without any change in what the model receives semantically. Microsoft's LLMLingua research demonstrates this kind of compression is achievable while maintaining or even improving output quality.
2. Intelligent model routing. claw.zip analyzes each query and routes it to the cheapest capable model for that specific task. Simple queries hit cheap models (Gemini Flash, Mistral, Haiku). Complex queries get the frontier model they actually need. You don't configure anything -- the routing happens automatically.
The combined effect: 80-93% cost reduction. That's consistent with academic benchmarks: prompt compression alone cuts input costs by 50-95%, and model routing alone can cut overall costs by up to 87%. FrugalGPT (TMLR) shows the combined techniques can reach 98% cost reduction while improving accuracy by filtering noise.
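To give a feel for the compression idea, here is a deliberately toy sketch: it drops filler words and collapses whitespace. This is not claw.zip's method; real systems like LLMLingua use a small language model to score and drop low-information tokens, and a word-list heuristic will not approach the 95% reductions described above:

```python
# Toy illustration of prompt compression: drop filler words. Real systems
# (e.g. LLMLingua) score tokens with a small language model; this heuristic
# only gestures at the idea.
import re

FILLER = {"please", "kindly", "very", "really", "just",
          "basically", "actually", "simply"}

def compress(prompt: str) -> str:
    words = re.split(r"\s+", prompt.strip())
    kept = [w for w in words if w.lower().strip(".,") not in FILLER]
    return " ".join(kept)

prompt = "Please kindly summarize the ticket, basically just the key points."
short = compress(prompt)
print(short, f"({len(short)}/{len(prompt)} chars)")
```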
Real Before/After
- Before: $500/month (Claude Opus 4 default, full system prompts, growing context)
- After: $35/month (semantic compression + intelligent routing to appropriate model per query)
This isn't a cherry-picked case. It's the arithmetic of sending 80% of your queries to models that cost 800x less than the default, with prompts that are 70-90% shorter.
How It Works in Practice
npx claw-zip
That's the setup. claw.zip intercepts calls from OpenClaw before they hit the API, applies compression, selects the route, and forwards. Your workflows don't change. Your outputs don't degrade. The bill shrinks from day one.
Pricing: Free tier covers up to $50/month in savings. Pro is $29/month with no savings cap and an analytics dashboard showing exactly what was compressed, what was routed where, and what was saved.
What to Do Right Now
If you're actively overspending, here's the priority order:
1. Measure first. Pull your API usage breakdown by model and call type. You need to know where the spend is concentrated before you can cut it.
2. Trim your system prompt. Low effort, immediate impact. Audit workspace files loaded on every call. Cut anything that doesn't change behavior.
3. Add claw.zip. One command, works immediately, handles the 80%+ of savings that manual work can't sustain.
4. Review model defaults. After claw.zip handles routing, make sure your fallback model for uncovered edge cases is something reasonable -- not Claude Opus by default.
5. Start fresh conversations. Don't let context compound indefinitely. For ongoing work, archive and restart.
The math is not subtle. At $0.075/M for Gemini Flash versus $15/M for Claude Opus 4, the same semantically-preserved query costs 200x less on the cheaper model. You just need something to make that routing decision automatically, every time, without adding friction to how you work.
That's what claw.zip does.
Sources: ZDNet, Koombea, LangChain State of Agents 2025, Stack Overflow Developer Survey 2025, Microsoft LLMLingua, FrugalGPT (TMLR), MarketsandMarkets, OpenClaw community data