Context Window Burn Rate
How fast does your 200K / 1M window fill up?
Context Window Burn Rate Tracker
Long-context models (Claude 200K, Gemini 1M) sound infinite — until you watch context fill up at ~1500 tokens per chat turn. This tool predicts when you hit the wall.
How to use this tool
1. Pick model context: Claude 3 = 200K, Gemini 1.5 = 1M, GPT-4 Turbo = 128K.
2. Estimate per-turn growth: how many tokens each user turn adds to the context.
3. See the burn rate: turns until you hit the limit, plus the cost trajectory.
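The steps above reduce to simple arithmetic. A minimal sketch in Python, assuming a fixed system prompt size and constant per-turn growth (both numbers are illustrative, not measured):

```python
def turns_until_limit(window_tokens: int, system_prompt: int, per_turn_growth: int) -> int:
    """How many chat turns fit before the context window is full."""
    usable = window_tokens - system_prompt
    return usable // per_turn_growth

# Claude 3 (200K window), 2K-token system prompt, ~1,500 tokens added per turn
print(turns_until_limit(200_000, 2_000, 1_500))    # → 132
# Same assumptions with a 1M window (Gemini 1.5)
print(turns_until_limit(1_000_000, 2_000, 1_500))  # → 665
```

A 5× larger per-turn growth (long pastes, verbose tool output) cuts those numbers by 5×, which is why the per-turn estimate in step 2 dominates the result.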
Frequently Asked Questions
Why does context grow?
Most chat apps replay the full history on every turn, so after 50 turns your "input" is roughly 50× the length of a single user message. Per-turn input cost and latency grow linearly with turn count; cumulative spend across the conversation grows quadratically.
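The linear-per-turn, quadratic-cumulative growth is easy to see numerically. A back-of-the-envelope sketch (the 1,500 tokens-per-turn figure is an assumption, not a measurement):

```python
def cumulative_input_tokens(turns: int, per_turn: int) -> int:
    """Total input tokens billed when the full history is replayed every turn.
    Turn k re-sends all k turns of history, so the total is quadratic in turns."""
    return sum(k * per_turn for k in range(1, turns + 1))

# 50 turns at 1,500 tokens each: the final turn's input alone is 75K tokens,
# but the total billed across all 50 turns is ~1.9M tokens.
print(cumulative_input_tokens(50, 1_500))  # → 1912500
```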
How do I slow burn?
Summarize old turns into a "memory" block. Drop function-call output that's stale. Use prompt caching to make replay cheap.
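One way to implement the "memory block" idea: collapse everything older than the last few turns into a single summary message. A sketch, assuming you supply a `summarize` function (in practice one LLM call that condenses the old turns; here it is just a parameter):

```python
def compact_history(history: list[dict], summarize, keep_recent: int = 6) -> list[dict]:
    """Replace old turns with one summary message; keep recent turns verbatim."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarize(old)  # condense old turns into a short text block
    memory = {"role": "system", "content": f"Summary of earlier conversation: {summary}"}
    return [memory] + recent
```

The same hook is a natural place to drop stale function-call output: filter it out of `old` before summarizing, since tool results rarely matter once the answer derived from them is in the transcript.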
What about RAG?
RAG retrieves only relevant chunks per turn — keeps context bounded. The opposite pattern of "stuff everything in".
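A minimal sketch of that bounded-context pattern. Word overlap stands in for a real embedding search here; the point is only that context size is capped by `k`, not by corpus size:

```python
def retrieve_top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Score chunks by word overlap with the query and return the best k.
    Whatever the corpus size, at most k chunks enter the prompt each turn."""
    query_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

chunks = [
    "context window limits per model",
    "billing and cost breakdown",
    "retrieval augmented generation overview",
]
print(retrieve_top_k("how big is the context window", chunks, k=1))
```

With history replay, per-turn input grows without bound; with retrieval, it stays at roughly the system prompt plus `k` chunks plus the current question.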
🔒 100% Privacy. This tool runs entirely in your browser; your data is never uploaded to any server.