🔥

Context Window Burn Rate

How fast does your 200K / 1M window fill up?

Context Window Burn Rate Tracker

Long-context models (Claude 200K, Gemini 1M) sound infinite — until you watch context fill up at ~1500 tokens per chat turn. This tool predicts when you hit the wall.

How to use this tool

  1. Pick model context

     Claude 3 = 200K, Gemini 1.5 = 1M, GPT-4 Turbo = 128K.

  2. Estimate per-turn growth

     How many tokens does each user turn add to context?

  3. See burn rate

     Turns until you hit the limit, plus the cost trajectory.
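The steps above boil down to simple arithmetic. A minimal sketch of the turns-until-limit calculation (the window sizes and the ~1500-tokens-per-turn figure come from this guide; the dictionary keys are illustrative names, not official model IDs):

```python
# Context window sizes listed in the guide above.
WINDOWS = {"claude-3": 200_000, "gemini-1.5": 1_000_000, "gpt-4-turbo": 128_000}

def turns_until_limit(window: int, tokens_per_turn: int) -> int:
    """How many chat turns fit before cumulative context exceeds the window."""
    return window // tokens_per_turn

# At ~1500 tokens added per turn, a 200K window lasts about 133 turns.
print(turns_until_limit(WINDOWS["claude-3"], 1500))  # → 133
```

The same call with the 1M window shows why "sounds infinite" is relative: roughly 666 turns at the same growth rate, not unlimited.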

Frequently Asked Questions

Why does context grow?
Most chat apps replay the full history on every turn. After 50 turns, your "input" is roughly 50× the length of a single user message. Per-turn input cost and latency grow linearly with turn count, so the cumulative cost of the conversation grows quadratically.
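The quadratic growth is easy to verify. A sketch of the total-tokens-billed arithmetic, assuming full-history replay every turn:

```python
def replayed_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across a conversation when each turn
    replays the full history: turn k sends k * tokens_per_turn tokens,
    so the sum is tokens_per_turn * turns * (turns + 1) / 2 — quadratic
    in the number of turns."""
    return tokens_per_turn * turns * (turns + 1) // 2

# Doubling the conversation length roughly quadruples total input tokens.
print(replayed_tokens(50, 1500))   # → 1912500
print(replayed_tokens(100, 1500))  # → 7575000
```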
How do I slow burn?
Summarize old turns into a "memory" block. Drop function-call output that's stale. Use prompt caching to make replay cheap.
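The first tactic can be sketched as a small helper. Everything here is hypothetical: `compact_history`, `keep_last`, and `budget` are illustrative names, and `summarize` is a stand-in for a real (e.g. cheap-model) summarization call:

```python
def summarize(turns: list[dict]) -> str:
    # Placeholder: a real implementation would call a model here.
    return "summary of %d earlier turns" % len(turns)

def compact_history(history: list[dict], keep_last: int = 4,
                    budget: int = 8) -> list[dict]:
    """If history exceeds `budget` turns, fold everything except the most
    recent `keep_last` turns into a single "memory" block, so context
    stops growing with conversation length."""
    if len(history) <= budget:
        return history
    memory = {"role": "system", "content": summarize(history[:-keep_last])}
    return [memory] + history[-keep_last:]
```

With defaults, a 50-turn history collapses to 5 entries: one memory block plus the last four turns.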
What about RAG?
RAG retrieves only the chunks relevant to the current turn, which keeps context bounded. It is the opposite of the "stuff everything in" pattern.
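A toy sketch of why RAG keeps context bounded. The naive keyword-overlap `score` below is a stand-in for real embedding similarity, and all names are illustrative:

```python
def score(query: str, chunk: str) -> int:
    """Naive relevance: count of shared words (stand-in for embeddings)."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring chunks. Context added per turn is
    O(k), independent of corpus size or conversation length."""
    return sorted(corpus, key=lambda c: score(query, c), reverse=True)[:k]
```

Whether the corpus holds ten documents or ten million, each turn adds only `k` chunks to the prompt.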

🔒
100% Privacy. This tool runs entirely in your browser. Your data is never uploaded to any server.