If your OpenClaw bill has jumped in the last few months, you're not imagining it. Reddit threads with hundreds of upvotes describe the same thing: "insane token burn," 13,000+ input tokens per simple query, $150/month becoming $600/month without changing anything about how you work.
This guide breaks down exactly why it's happening, where every token goes, and what you can do about it -- with and without touching your config.
Why OpenClaw Costs Are Rising in 2026
The obvious explanation -- that per-token prices went up -- is wrong. Per-token prices have actually fallen from ~$20/M tokens in 2022 to ~$0.40/M for commodity models by mid-2025. But your bill is higher. Here's why.
1. Reasoning Models Are Token-Hungry
OpenAI o1, Claude 3.7 Sonnet Extended Thinking, and Gemini 2.5 Flash Thinking all produce far more output tokens than their predecessors. The meter runs 3-10x faster for the same task. If you're using any reasoning or "extended thinking" model -- even selectively -- your token counts are not comparable to what you measured six months ago.
2. DRAM Costs Are Rising
Micron's CEO said it publicly: DRAM supply is constrained, costs are up 20% year-over-year, and providers are passing this through. AI inference is memory-bound. More expensive RAM means more expensive tokens, regardless of what the published rate cards show on paper.
3. GPT-5.2 Raised Prices 40% With No Warning
OpenAI raised input token prices from $1.25 to $1.75 per million tokens with GPT-5.2 -- a 40% increase. If your OpenClaw setup routes to GPT-5.2 or later OpenAI models, your costs went up the day the new pricing took effect, automatically. Content licensing deals will likely trigger similar hikes from other providers later in 2026.
4. Agentic Loops Multiply Everything
A single user action in an agentic OpenClaw workflow might trigger 3, 5, or 10 LLM calls. Each one carries your system prompt, your conversation history, your tool definitions. The cost isn't "one query" -- it's "one query times however many hops your agent takes to complete it." The more capable the workflow, the more it costs per user interaction.
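The multiplication is easy to see with a quick sketch. All of the token counts and the per-million price below are illustrative assumptions, not measured OpenClaw values:

```python
# Illustrative arithmetic only: token counts and the price are assumptions,
# not measured OpenClaw values.
SYSTEM_PROMPT_TOKENS = 4_000   # system prompt + context files, resent every hop
TOOL_SCHEMA_TOKENS = 2_000     # tool definitions, resent every hop
USER_QUERY_TOKENS = 200
PRICE_PER_M_INPUT = 15.0       # USD per million input tokens (Opus-class)

def cost_of_interaction(hops: int) -> float:
    """Input cost of one user action that fans out into `hops` LLM calls."""
    tokens_per_hop = SYSTEM_PROMPT_TOKENS + TOOL_SCHEMA_TOKENS + USER_QUERY_TOKENS
    return hops * tokens_per_hop * PRICE_PER_M_INPUT / 1_000_000

for hops in (1, 3, 10):
    print(f"{hops} hops: ${cost_of_interaction(hops):.3f}")
```

Note that the fixed overhead (system prompt plus tool schemas) dwarfs the query itself, so every extra hop repays the full overhead, not just the new message.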
What Typical Users Actually Spend
Based on real user data from the OpenClaw community and public reporting:
| User Type | Monthly Spend | Typical Setup |
|---|---|---|
| Light personal use | $50-150 | Single agent, low query volume |
| Active developer | $150-500 | Multi-agent, daily use, tool calls |
| Power user / heavy automation | $500-1,500 | Autonomous agents, high volume |
| Team / production use | $1,500-3,000+ | Multiple workspaces, agentic pipelines |
These aren't estimates pulled from thin air. Users in r/LocalLLM and r/ClaudeAI report these ranges consistently. The "$150/month going to $600/month" pattern after switching to reasoning models shows up in thread after thread.
The benchmark that matters: 100 daily active agents running 50K tokens each per day costs around $4,500/month at GPT-4 pricing. That's before you account for system prompts, tool definitions, or context growth over time.
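The benchmark arithmetic is worth checking yourself. The $30/M blended rate below is an assumption standing in for "GPT-4 pricing":

```python
# Back-of-envelope check of the benchmark above. The $30/M blended rate
# for GPT-4-class pricing is an assumption.
agents = 100
tokens_per_agent_per_day = 50_000
days = 30
blended_price_per_m = 30.0  # USD per million tokens, assumed blended rate

monthly_tokens = agents * tokens_per_agent_per_day * days  # 150M tokens
monthly_cost = monthly_tokens / 1_000_000 * blended_price_per_m
print(f"{monthly_tokens / 1e6:.0f}M tokens/month -> ${monthly_cost:,.0f}/month")
```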
Where Your Tokens Actually Go
Most people assume their queries are the cost driver. They're usually wrong. Here's a realistic token breakdown for a typical OpenClaw workspace:
System Prompt: 2,000-8,000 tokens per call
Your AGENTS.md, SOUL.md, TOOLS.md, and any injected context files get loaded on every single call. If you're using shared-context files, workspace files, and role-based instructions -- add them up. Users have measured 13,000+ input tokens on simple queries that produce 50 tokens of actual output.
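A rough audit is a few lines of code. The sketch below uses the common ~4-characters-per-token heuristic rather than a real tokenizer, and the file names are simply the ones mentioned above:

```python
# Rough workspace audit: estimates tokens per context file using the
# ~4 characters/token heuristic (an approximation, not a real tokenizer).
from pathlib import Path

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def audit_workspace(files):
    total = 0
    for name in files:
        path = Path(name)
        if not path.exists():
            continue
        tokens = estimate_tokens(path.read_text(encoding="utf-8"))
        total += tokens
        print(f"{name}: ~{tokens} tokens per call")
    print(f"total: ~{total} tokens loaded on every LLM call")
    return total

audit_workspace(["AGENTS.md", "SOUL.md", "TOOLS.md"])
```

For exact counts, run your files through your provider's actual tokenizer instead of the heuristic.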
Conversation Context: Grows Without Limit
OpenClaw maintains conversation history. Every message adds to the context window carried forward. A conversation that's been running for 20 exchanges carries all 20 exchanges into every subsequent LLM call. This compounds fast.
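"Compounds fast" here means roughly quadratic growth in total input tokens, since every call resends all prior exchanges. The 300 tokens per exchange below is an illustrative assumption:

```python
# Carried-forward history compounds: call n resends exchanges 1..n, so
# cumulative input tokens grow roughly quadratically with conversation
# length. 300 tokens/exchange is an illustrative assumption.
TOKENS_PER_EXCHANGE = 300

def cumulative_input_tokens(exchanges: int) -> int:
    # The n-th call carries n exchanges of context.
    return sum(n * TOKENS_PER_EXCHANGE for n in range(1, exchanges + 1))

for n in (5, 20, 40):
    print(f"{n} exchanges: {cumulative_input_tokens(n):,} input tokens total")
```

Doubling a conversation's length roughly quadruples its total input cost, which is why starting fresh threads matters so much.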
Tool Definitions: 1,000-4,000 tokens per call
Every tool you have available gets described to the model in each call -- even tools you don't use. Browser, exec, file operations, message, canvas: each one has a JSON schema that costs tokens before you type a single character.
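To make the overhead concrete, here is what a single tool definition looks like on the wire. This schema is a generic illustration in the common function-calling style, not OpenClaw's actual format, and the char/4 token estimate is a rough heuristic:

```python
# A generic tool definition in function-calling style (illustrative, not
# OpenClaw's actual schema). The char/4 token estimate is a heuristic.
import json

browser_tool = {
    "name": "browser",
    "description": "Open a URL and return the rendered page text.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Absolute URL to open"},
            "timeout_s": {"type": "integer", "description": "Load timeout"},
        },
        "required": ["url"],
    },
}

wire_format = json.dumps(browser_tool)
print(f"~{len(wire_format) // 4} tokens for one tool, on every call")
```

Multiply that by five or six tools and the fixed overhead lands in the 1,000-4,000 token range cited above.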
The Model Itself: 800x Price Spread
The spread between the cheapest and most expensive capable model is roughly 800x. Mistral 7B costs about $0.00006 per 300 tokens. GPT-4 costs about $0.05-0.06 per 300 tokens. Claude Opus 4 is $15/M input, $75/M output. If your routing defaults to a top-tier model for every query -- including simple ones where a cheaper model would do the same job -- you're paying the 800x tax constantly.
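The spread follows directly from the per-300-token figures quoted above; treat the prices as snapshots, not current rate cards:

```python
# Per-query cost spread, using the per-300-token figures quoted above
# (snapshots, not current rate cards).
prices_per_300_tokens = {
    "Mistral 7B": 0.00006,
    "GPT-4": 0.05,
}

cheapest = min(prices_per_300_tokens.values())
for model, price in prices_per_300_tokens.items():
    print(f"{model}: ${price:.5f} per 300 tokens ({price / cheapest:,.0f}x)")
```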
Manual Optimization: What You Can Do Right Now
These techniques work. They require effort and ongoing maintenance.
Trim Your System Prompt
Audit every file loaded into your workspace context. Remove examples, verbose explanations, and sections that don't change model behavior. Replace prose instructions with structured bullet points. A 6,000-token system prompt that could be 1,500 tokens is costing you 4x more than it needs to.
Context Pruning
Start new conversations for new topics. Archive long threads. Don't let a single conversation grow across dozens of exchanges when the context from message 1 is irrelevant to what you're doing in message 40. This alone can cut costs 40-60% for interactive users.
Model Selection by Task
This is where the biggest savings come from, but also where it's most tedious to manage. A rough decision tree:
- Gemini Flash / Mistral 7B / Haiku: answering factual questions, simple transformations, formatting, summarization of clear content
- GPT-4o / Claude Sonnet: writing, reasoning tasks, code generation, analysis with ambiguity
- Claude Opus / o1-level: genuinely complex multi-step reasoning, research synthesis, critical decisions
The problem: manually deciding which model to use for every task, and staying disciplined about it, is friction that adds up. Most users eventually revert to the default (which is usually expensive).
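The decision tree above can be sketched as a routing function. The keyword checks, length threshold, and model identifiers below are placeholder assumptions; a real router would classify queries with a small model rather than string matching:

```python
# Minimal sketch of the decision tree above. Classification heuristics and
# model names are placeholder assumptions, not a production router.
def pick_model(query: str, needs_multistep_reasoning: bool = False) -> str:
    if needs_multistep_reasoning:
        return "claude-opus"      # frontier tier: complex multi-step reasoning
    simple_markers = ("summarize", "format", "translate", "list")
    if len(query) < 200 and any(m in query.lower() for m in simple_markers):
        return "gemini-flash"     # cheap tier: simple transformations
    return "claude-sonnet"        # mid tier: writing, code, analysis

print(pick_model("Summarize this changelog"))
print(pick_model("Refactor this module to async I/O"))
print(pick_model("Plan a migration strategy", needs_multistep_reasoning=True))
```

Even this toy version captures the friction: someone has to maintain the rules, and every new task type forces a new decision.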
Limit Tool Availability
If a workflow doesn't need browser access, remove it. Tools you load but don't use still cost tokens in every call. OpenClaw's tool availability is configurable -- use it.
Track and Set Limits
You can't optimize what you don't measure. Use cost tracking at the API key level. Set monthly caps. Knowing you're at $400 of a $500 budget changes how you work.
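The cap logic is simple to build. In practice you'd pull spend from your provider's billing API; the sketch below records it manually just to show the mechanism:

```python
# Minimal spend tracker with a monthly cap. In practice, read usage from
# your provider's billing API; spend is recorded manually here.
class BudgetTracker:
    def __init__(self, monthly_cap_usd: float):
        self.cap = monthly_cap_usd
        self.spent = 0.0

    def record(self, cost_usd: float) -> None:
        self.spent += cost_usd
        if self.spent >= 0.8 * self.cap:
            print(f"WARNING: ${self.spent:.2f} of ${self.cap:.2f} used")

    def allow_call(self, estimated_cost_usd: float) -> bool:
        return self.spent + estimated_cost_usd <= self.cap

tracker = BudgetTracker(monthly_cap_usd=500.0)
tracker.record(400.0)              # crosses the 80% threshold, prints a warning
print(tracker.allow_call(150.0))   # would exceed the cap
```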
The Problem With Manual Optimization
Manual optimization works, but it has a ceiling and a cost of its own.
It requires engineering time every time your workflow changes. New tools, new context files, new agents -- all of them reset the calculus. The model that was right for a task last month might not be right today (or might be cheaper on a different provider this month).
Developers who've gone through the full manual optimization cycle report going from $500/month to around $150-200/month through context discipline and model switching. That's real. But it took hours of work, and it degrades as the workspace evolves.
The other 70%+ of savings -- the part that gets you from $150 to $35 -- requires automated compression and routing.
The Automated Fix: claw.zip
claw.zip is a zero-config token optimizer built specifically for OpenClaw. It does two things automatically:
1. Lossless semantic prompt compression. Before each LLM call, claw.zip compresses your prompts semantically -- removing redundancy and verbose phrasing while preserving meaning. A typical 800-token customer service prompt compresses to around 40 tokens. That's a 95% reduction without any change in what the model receives semantically. Microsoft's LLMLingua research demonstrates this kind of compression is achievable while maintaining or even improving output quality.
2. Intelligent model routing. claw.zip analyzes each query and routes it to the cheapest capable model for that specific task. Simple queries hit cheap models (Gemini Flash, Mistral, Haiku). Complex queries get the frontier model they actually need. You don't configure anything -- the routing happens automatically.
The combined effect: 80-93% cost reduction. That's consistent with academic benchmarks: prompt compression alone cuts input costs by 50-95%, and model routing alone can cut overall costs by up to 87%. FrugalGPT (TMLR) shows the combined techniques can reach 98% cost reduction while improving accuracy by filtering noise.
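To give a feel for the compression idea, here is a deliberately toy sketch: it drops filler words and collapses whitespace. This is not claw.zip's method; real systems like LLMLingua use a small language model to score and drop low-information tokens, and a word-list heuristic will not approach the 95% reductions described above:

```python
# Toy illustration of prompt compression: drop filler words. Real systems
# (e.g. LLMLingua) score tokens with a small language model; this heuristic
# only gestures at the idea.
import re

FILLER = {"please", "kindly", "very", "really", "just",
          "basically", "actually", "simply"}

def compress(prompt: str) -> str:
    words = re.split(r"\s+", prompt.strip())
    kept = [w for w in words if w.lower().strip(".,") not in FILLER]
    return " ".join(kept)

prompt = "Please kindly summarize the ticket, basically just the key points."
short = compress(prompt)
print(short, f"({len(short)}/{len(prompt)} chars)")
```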
Real Before/After
- Before: $500/month (Claude Opus 4 default, full system prompts, growing context)
- After: $35/month (semantic compression + intelligent routing to appropriate model per query)
This isn't a cherry-picked case. It's the arithmetic of sending 80% of your queries to models that cost 800x less than the default, with prompts that are 70-90% shorter.
How It Works in Practice
npx claw-zip
That's the setup. claw.zip intercepts calls from OpenClaw before they hit the API, applies compression, selects the route, and forwards. Your workflows don't change. Your outputs don't degrade. The bill shrinks from day one.
Pricing: Free tier covers up to $50/month in savings. Pro is $29/month with no savings cap and an analytics dashboard showing exactly what was compressed, what was routed where, and what was saved.
What to Do Right Now
If you're actively overspending, here's the priority order:
1. Measure first. Pull your API usage breakdown by model and call type. You need to know where the spend is concentrated before you can cut it.
2. Trim your system prompt. Low effort, immediate impact. Audit workspace files loaded on every call. Cut anything that doesn't change behavior.
3. Add claw.zip. One command, works immediately, handles the 80%+ of savings that manual work can't sustain.
4. Review model defaults. After claw.zip handles routing, make sure your fallback model for uncovered edge cases is something reasonable -- not Claude Opus by default.
5. Start fresh conversations. Don't let context compound indefinitely. For ongoing work, archive and restart.
The math is not subtle. At $0.075/M for Gemini Flash versus $15/M for Claude Opus 4, the same semantically-preserved query costs 200x less on the cheaper model. You just need something to make that routing decision automatically, every time, without adding friction to how you work.
That's what claw.zip does.
Sources: ZDNet, Koombea, LangChain State of Agents 2025, Stack Overflow Developer Survey 2025, Microsoft LLMLingua, FrugalGPT (TMLR), MarketsandMarkets, OpenClaw community data