🔥 ContextFlame

Context profiling report — {{call_count}} API calls — {{session_date}}

Metric Documentation

Header Metrics
API Calls
Total number of API requests in this session, including sub-agent calls (e.g. Haiku for WebSearch).
Total Input
Sum of input_tokens across all calls. With prompt caching, this counts only non-cached tokens; the full context size also includes cached tokens.
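The relationship between Total Input and the full context size can be sketched as follows. This is a minimal illustration, assuming each call's usage is available as a dict with the Anthropic Messages API fields input_tokens, cache_read_input_tokens, and cache_creation_input_tokens. The helper name full_context_size and the sample numbers are hypothetical, not taken from ContextFlame.

```python
def full_context_size(usage: dict) -> int:
    """Non-cached plus cached input tokens for a single call (assumed field names)."""
    return (
        (usage.get("input_tokens") or 0)
        + (usage.get("cache_read_input_tokens") or 0)
        + (usage.get("cache_creation_input_tokens") or 0)
    )


# Hypothetical usage records for two calls in one session.
calls = [
    {"input_tokens": 1200, "cache_read_input_tokens": 0, "cache_creation_input_tokens": 8000},
    {"input_tokens": 300, "cache_read_input_tokens": 9200, "cache_creation_input_tokens": 0},
]

# "Total Input" as described above: non-cached tokens only.
total_input = sum(c["input_tokens"] for c in calls)

# Full context size per call: cached tokens are added back in.
full_sizes = [full_context_size(c) for c in calls]
```

In this example Total Input is small even though each call carries more than 9k tokens of context, which is why the two figures should not be compared directly.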
Total Output
Sum of output_tokens (model responses) across all calls.
Tool Tokens
Tokens from tool results (Read, Grep, Bash output, etc.) injected into the conversation. Estimated via tiktoken.
Tool Ratio
tool_tokens / total_input — fraction of input consumed by tool results. >30% yellow, >50% red.
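A minimal sketch of the ratio and its color thresholds, assuming both token counts are already known. The function names and the "ok" label for values below the yellow threshold are illustrative assumptions, not ContextFlame's actual code.

```python
def tool_ratio(tool_tokens: int, total_input: int) -> float:
    """Fraction of input consumed by tool results; 0.0 when there is no input."""
    if total_input <= 0:
        return 0.0
    return tool_tokens / total_input


def severity(value: float, yellow: float, red: float) -> str:
    """Map a ratio to a color band using strict '>' thresholds, as in the text."""
    if value > red:
        return "red"
    if value > yellow:
        return "yellow"
    return "ok"  # assumed label for values below the yellow threshold


ratio = tool_ratio(42_000, 100_000)
band = severity(ratio, yellow=0.30, red=0.50)
```

The same severity helper would apply to the other thresholded metrics by passing their own yellow and red limits.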
Duplicate Ratio
duplicate_tool_tokens / total_tool_tokens — tool results that appeared in a previous call (wasted context). >10% yellow, >20% red.
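One way to picture the duplicate check is an exact-match content hash over tool results, compared against everything seen in earlier calls. This is only a sketch under that assumption: the report does not say how duplicates are detected, and the input shape (per call, a list of newly added tool results as text and token count pairs) is hypothetical.

```python
import hashlib


def duplicate_ratio(calls: list) -> float:
    """calls: one list per API call of (tool_result_text, token_count) pairs."""
    seen = set()      # hashes of tool results from earlier calls
    total = 0         # total_tool_tokens
    duplicate = 0     # duplicate_tool_tokens
    for results in calls:
        current = set()
        for text, tokens in results:
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            total += tokens
            if digest in seen:
                duplicate += tokens
            current.add(digest)
        # Merge after the call so only *previous* calls count as seen.
        seen |= current
    return duplicate / total if total else 0.0


# Hypothetical session: a.py is read again in the second call.
session = [
    [("contents of a.py", 500)],
    [("contents of b.py", 300), ("contents of a.py", 500)],
]
dup = duplicate_ratio(session)
```

Here 500 of 1300 tool tokens are duplicates, so the ratio is roughly 38%, well above the red threshold.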
Peak Util
Highest single-call input as a percentage of the model's context limit (200k tokens). >70% yellow, >90% red.
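As a rough sketch, peak utilization is the largest per-call input divided by the context limit. The function name and sample values are hypothetical, and whether the per-call input includes cached tokens is an assumption left to the caller.

```python
CONTEXT_LIMIT = 200_000  # model context limit in tokens, as stated above


def peak_utilization(call_inputs: list, limit: int = CONTEXT_LIMIT) -> float:
    """Highest single-call input as a percentage of the context limit."""
    if not call_inputs:
        return 0.0
    return 100.0 * max(call_inputs) / limit


# Hypothetical per-call input sizes for one session.
peak = peak_utilization([48_000, 151_000, 96_000])
```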
Resets
Number of context window resets (conversation exceeded limit and was truncated).
Wasted
Total duplicate tool tokens — content the model re-read unnecessarily.
Carry Cost
Tool results carried over from previous calls (still in conversation history). High values mean old tool outputs are eating context.
Tool Schemas
Tokens for tool definitions (the tools[] array sent every request). These are fixed overhead per call.
tiktoken Error
Average error of the tiktoken estimate relative to API-reported token counts. Only includes calls where the estimate is within 50% of the API-reported count (this excludes sub-agent calls with server-side overhead).
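The filtering step can be sketched as below. This assumes "error" means absolute relative error against the API-reported count and "within 50%" means that error is at most 0.5. Both readings, the function name, and the sample pairs are assumptions rather than the report's confirmed definition.

```python
from typing import Optional


def mean_estimation_error(pairs: list, max_rel_error: float = 0.5) -> Optional[float]:
    """pairs: (tiktoken_estimate, api_reported_tokens) per call.

    Returns the mean absolute relative error over calls within the cutoff,
    or None when no call qualifies.
    """
    errors = []
    for estimate, actual in pairs:
        if actual <= 0:
            continue  # skip calls without a usable API count
        rel = abs(estimate - actual) / actual
        if rel <= max_rel_error:
            errors.append(rel)
    return sum(errors) / len(errors) if errors else None


# Hypothetical calls: the third is far off (e.g. a sub-agent call) and is excluded.
err = mean_estimation_error([(880, 1000), (4500, 5000), (600, 3000)])
```

Excluding the outlier keeps a single sub-agent call from dominating the average.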
Flamegraph
Width = Tokens
Each block's width is proportional to its token count. Wider blocks use more context.
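A minimal sketch of proportional widths for the children of one flamegraph node. The function name, the pixel unit, and the component names are illustrative assumptions.

```python
def block_widths(children: dict, parent_width_px: float) -> dict:
    """Split a parent's width among child blocks in proportion to token count."""
    total = sum(children.values())
    if total == 0:
        return {name: 0.0 for name in children}
    return {
        name: parent_width_px * tokens / total
        for name, tokens in children.items()
    }


# Hypothetical node with three children, rendered 1000 px wide.
widths = block_widths({"system": 2_000, "tools": 6_000, "messages": 12_000}, 1000.0)
```

Applying the same split recursively to each child yields the nested layout that zooming navigates.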
Click to Zoom
Click any block with children to zoom in. Use the breadcrumb to navigate back.
Inspect Content
Click leaf nodes (highlighted on hover) to view the actual text in the bottom panel.
Unattributed
Gap between the tiktoken estimate and the API-reported total. Usually about 12% for main calls (API overhead); much larger for sub-agent calls, where Anthropic injects internal system prompts server-side.
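The gap can be sketched as the API-reported total minus the sum of the estimated components, clamped at zero. The function name, the clamping, and the sample figures are assumptions made for illustration.

```python
def unattributed_tokens(component_estimates: dict, api_total: int) -> int:
    """Tokens the API reports that the tiktoken estimate does not account for."""
    estimated = sum(component_estimates.values())
    return max(api_total - estimated, 0)  # assumed: never show a negative block


# Hypothetical main call: the estimate covers 8,800 of 10,000 reported tokens.
estimates = {"system": 2_000, "tools": 3_000, "messages": 3_800}
gap = unattributed_tokens(estimates, api_total=10_000)
gap_share = gap / 10_000
```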
Carried (dimmed)
Tool results carried over from a previous call, shown at reduced opacity.
Striped blocks
Duplicate tool results — content already seen in an earlier call.
Timeline
Stacked Bars
Each bar is one API call, stacked by component (system, tools, user text, etc.). Heights are normalized to the API-reported total, including cached tokens.
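One possible reading of the normalization is that each component's estimate is scaled so the stack sums to the API-reported total. The sketch below assumes that proportional-scaling interpretation; the function name and values are hypothetical.

```python
def normalize_components(estimates: dict, api_total: int) -> dict:
    """Scale component estimates so their sum equals the API-reported total."""
    estimated_total = sum(estimates.values())
    if estimated_total == 0:
        return {name: 0.0 for name in estimates}
    scale = api_total / estimated_total
    return {name: tokens * scale for name, tokens in estimates.items()}


# Hypothetical call: estimates sum to 8,000 but the API reports 10,000 tokens
# (non-cached plus cached).
heights = normalize_components(
    {"system": 2_000, "tools": 3_000, "user_text": 3_000}, api_total=10_000
)
```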
Red dashed line
Model context limit (200k tokens).
Red triangles
Context resets — conversation was truncated at that call.
Flamegraph — click to zoom, click leaves to inspect content, width = tokens
Token timeline per API call

Top tools by token usage

Tool | Tokens

Top files by token usage

File | Tokens