Context Overflow
Detector key: context_overflow · Source: context · Severity: medium · Suspicion: 76
When a prompt is larger than the model’s context window, the overflow doesn’t error — it’s silently truncated. The model answers using only the part that fit, so decisions get made on partial context: stale retrieval, dropped instructions, missing history. Nothing throws.
What trips it
For each step, Lumni looks up the model’s context window from its id and compares it against the step’s input tokens. It fires when `inputTokens > contextWindow`.
- Model is read from step metadata (`model`) or the run metadata.
- Input tokens are read from `inputTokens`, `promptTokens`, `input_tokens`, or `prompt_tokens` (including under a `usage` object).
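
The check described above can be sketched roughly as follows. This is a minimal illustration, not Lumni's actual implementation: the helper names (`read_input_tokens`, `read_model`, `step_overflows`) are hypothetical, and the lookup order (step metadata before run metadata, top-level fields before `usage`) is an assumption.

```python
from typing import Any, Mapping, Optional

# Field names listed above; the order they are tried in is an assumption.
TOKEN_KEYS = ("inputTokens", "promptTokens", "input_tokens", "prompt_tokens")


def read_input_tokens(metadata: Mapping[str, Any]) -> Optional[int]:
    """Return the first token count found at the top level or under `usage`."""
    sources = [metadata]
    usage = metadata.get("usage")
    if isinstance(usage, Mapping):
        sources.append(usage)
    for source in sources:
        for key in TOKEN_KEYS:
            value = source.get(key)
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, int) and not isinstance(value, bool):
                return value
    return None


def read_model(step_meta: Mapping[str, Any], run_meta: Mapping[str, Any]) -> Optional[str]:
    """Prefer the step's own model id, falling back to the run's (assumed order)."""
    model = step_meta.get("model") or run_meta.get("model")
    return model if isinstance(model, str) else None


def step_overflows(step_meta: Mapping[str, Any], context_window: Optional[int]) -> bool:
    """True only when both numbers are known and the input exceeds the window."""
    tokens = read_input_tokens(step_meta)
    if tokens is None or context_window is None:
        return False  # missing data: stay quiet rather than guess
    return tokens > context_window
```

For example, `step_overflows({"usage": {"prompt_tokens": 175000}}, 128000)` is true, while a prompt of exactly 128000 tokens does not trip the check because the comparison is strict.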
Known context windows
| Model family | Window (tokens) |
|---|---|
| gpt-4.1, gemini-1.5, gemini-2 | 1,000,000 |
| gpt-5 | 400,000 |
| claude-3 / claude-sonnet / claude-opus / claude-haiku / claude-fable | 200,000 |
| gpt-4o, gpt-4-turbo | 128,000 |
| gpt-4-32k | 32,768 |
| gpt-3.5-turbo-16k | 16,384 |
| gpt-4 | 8,192 |
| gpt-3.5 | 4,096 |
If the model id isn’t recognized or token counts are missing, the detector stays quiet.
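
One way to resolve a model id against the table above is a longest-prefix match, sketched below. How Lumni actually matches ids is not stated here, so the prefix strategy, the `CONTEXT_WINDOWS` mapping, and the `context_window_for` helper are all assumptions made for illustration. Longest-prefix matching matters because several families share a stem: `gpt-4o` and `gpt-4-32k` both start with `gpt-4`.

```python
from typing import Optional

# Windows copied from the table above, keyed by model-family prefix.
CONTEXT_WINDOWS = {
    "gpt-4.1": 1_000_000,
    "gemini-1.5": 1_000_000,
    "gemini-2": 1_000_000,
    "gpt-5": 400_000,
    "claude-3": 200_000,
    "claude-sonnet": 200_000,
    "claude-opus": 200_000,
    "claude-haiku": 200_000,
    "claude-fable": 200_000,
    "gpt-4o": 128_000,
    "gpt-4-turbo": 128_000,
    "gpt-4-32k": 32_768,
    "gpt-3.5-turbo-16k": 16_384,
    "gpt-4": 8_192,
    "gpt-3.5": 4_096,
}


def context_window_for(model_id: str) -> Optional[int]:
    """Return the window for the longest matching family prefix, else None."""
    normalized = model_id.lower()
    matches = [prefix for prefix in CONTEXT_WINDOWS if normalized.startswith(prefix)]
    if not matches:
        return None  # unrecognized model: the detector stays quiet
    # Pick the most specific family, e.g. "gpt-4o" over "gpt-4".
    return CONTEXT_WINDOWS[max(matches, key=len)]
```

Returning `None` for an unknown id mirrors the "stays quiet" behavior: the caller can skip the comparison instead of assuming a default window.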
Suspicion is fixed at 76.
Example
    {
      "metadata": { "model": "gpt-4o" },
      "steps": [
        { "stepKind": "model", "metadata": { "inputTokens": 175000 } }
      ]
    }

Lumni reports:
Context overflow: 175000 input tokens exceed gpt-4o’s 128000-token window — The prompt sent to gpt-4o was 175000 tokens — 36% over its 128000-token context window. The model silently dropped the overflow, so it answered using only partial context.
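
The figures in that report can be reproduced with a little arithmetic. The sketch below uses hypothetical helper names; the rounding rule is an assumption inferred from the example, where the exact overage is about 36.7% and the report shows 36%, which is consistent with truncation rather than rounding to nearest.

```python
def overflow_tokens(input_tokens: int, context_window: int) -> int:
    """Number of tokens that did not fit in the window (0 if everything fit)."""
    return max(0, input_tokens - context_window)


def percent_over(input_tokens: int, context_window: int) -> int:
    """Whole-number percentage by which the prompt exceeds the window.

    Integer division truncates toward zero for non-negative values,
    which is the assumed rounding rule here.
    """
    return overflow_tokens(input_tokens, context_window) * 100 // context_window


# The example above: 175000 input tokens against gpt-4o's 128000-token window.
dropped = overflow_tokens(175_000, 128_000)  # 47000 tokens past the window
over = percent_over(175_000, 128_000)        # 36 (exact value is about 36.7)
```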