Context Window Burn Rate
How fast does your 200K / 1M window fill up?
Context Window Burn Rate Tracker
Long-context models (Claude 200K, Gemini 1M) sound infinite — until you watch context fill up at ~1500 tokens per chat turn. This tool predicts when you hit the wall.
How to use this tool
1. Pick model context: Claude 3 = 200K, Gemini 1.5 = 1M, GPT-4 Turbo = 128K.
2. Estimate per-turn growth: how many tokens each user turn adds to the context.
3. See the burn rate: turns until you hit the limit, plus the cost trajectory.
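The steps above reduce to simple arithmetic. A minimal sketch in Python, assuming a fixed system prompt size and constant per-turn growth (both numbers are illustrative, not measured):

```python
def turns_until_limit(window_tokens: int, system_prompt: int, per_turn_growth: int) -> int:
    """How many chat turns fit before the context window is full."""
    usable = window_tokens - system_prompt
    return usable // per_turn_growth

# Claude 3 (200K window), 2K-token system prompt, ~1,500 tokens added per turn
print(turns_until_limit(200_000, 2_000, 1_500))    # → 132
# Same assumptions with a 1M window (Gemini 1.5)
print(turns_until_limit(1_000_000, 2_000, 1_500))  # → 665
```

A 5× larger per-turn growth (long pastes, verbose tool output) cuts those numbers by 5×, which is why the per-turn estimate in step 2 dominates the result.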
Frequently Asked Questions
Why does context grow?
Most chat apps replay the full history on every turn, so after 50 turns your "input" is roughly 50× the length of a single user message. Per-turn input cost and latency grow linearly with turn count; cumulative spend across the conversation grows quadratically.
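The linear-per-turn, quadratic-cumulative growth is easy to see numerically. A back-of-the-envelope sketch (the 1,500 tokens-per-turn figure is an assumption, not a measurement):

```python
def cumulative_input_tokens(turns: int, per_turn: int) -> int:
    """Total input tokens billed when the full history is replayed every turn.
    Turn k re-sends all k turns of history, so the total is quadratic in turns."""
    return sum(k * per_turn for k in range(1, turns + 1))

# 50 turns at 1,500 tokens each: the final turn's input alone is 75K tokens,
# but the total billed across all 50 turns is ~1.9M tokens.
print(cumulative_input_tokens(50, 1_500))  # → 1912500
```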
How do I slow burn?
Summarize old turns into a "memory" block. Drop function-call output that's stale. Use prompt caching to make replay cheap.
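One way to implement the "memory block" idea: collapse everything older than the last few turns into a single summary message. A sketch, assuming you supply a `summarize` function (in practice one LLM call that condenses the old turns; here it is just a parameter):

```python
def compact_history(history: list[dict], summarize, keep_recent: int = 6) -> list[dict]:
    """Replace old turns with one summary message; keep recent turns verbatim."""
    if len(history) <= keep_recent:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarize(old)  # condense old turns into a short text block
    memory = {"role": "system", "content": f"Summary of earlier conversation: {summary}"}
    return [memory] + recent
```

The same hook is a natural place to drop stale function-call output: filter it out of `old` before summarizing, since tool results rarely matter once the answer derived from them is in the transcript.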
What about RAG?
RAG retrieves only relevant chunks per turn — keeps context bounded. The opposite pattern of "stuff everything in".
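A minimal sketch of that bounded-context pattern. Word overlap stands in for a real embedding search here; the point is only that context size is capped by `k`, not by corpus size:

```python
def retrieve_top_k(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Score chunks by word overlap with the query and return the best k.
    Whatever the corpus size, at most k chunks enter the prompt each turn."""
    query_words = set(query.lower().split())
    scored = sorted(
        chunks,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )
    return scored[:k]

chunks = [
    "context window limits per model",
    "billing and cost breakdown",
    "retrieval augmented generation overview",
]
print(retrieve_top_k("how big is the context window", chunks, k=1))
```

With history replay, per-turn input grows without bound; with retrieval, it stays at roughly the system prompt plus `k` chunks plus the current question.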
🔒 100% Privacy. This tool runs entirely in your browser; your data is never uploaded to any server.