
Context Overflow

Detector key: context_overflow · Source: context · Severity: medium · Suspicion: 76

When a prompt is larger than the model’s context window, the overflow doesn’t error — it’s silently truncated. The model answers using only the part that fit, so decisions get made on partial context: stale retrieval, dropped instructions, missing history. Nothing throws.

For each step, Lumni looks up the model’s context window from its id and compares it against the step’s input tokens. It fires when inputTokens > contextWindow.

  • Model is read from step metadata (model) or the run metadata.
  • Input tokens are read from inputTokens, promptTokens, input_tokens, or prompt_tokens (including under a usage object).

Model family                                                            Window (tokens)
gpt-4.1, gemini-1.5, gemini-2                                           1,000,000
gpt-5                                                                   400,000
claude-3 / claude-sonnet / claude-opus / claude-haiku / claude-fable    200,000
gpt-4o, gpt-4-turbo                                                     128,000
gpt-4-32k                                                               32,768
gpt-3.5-turbo-16k                                                       16,384
gpt-4                                                                   8,192
gpt-3.5                                                                 4,096

If the model id isn’t recognized or token counts are missing, the detector stays quiet.
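The check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Lumni's actual implementation: the function names are hypothetical, the longest-prefix matching of model ids to families is a guess at how ids such as gpt-4o are kept apart from gpt-4, and token counts are assumed to sit in the step's metadata (optionally nested under usage), as in the example run on this page.

```python
from typing import Optional

# Window sizes copied from the table above, keyed by model-family prefix.
CONTEXT_WINDOWS = {
    "gpt-4.1": 1_000_000, "gemini-1.5": 1_000_000, "gemini-2": 1_000_000,
    "gpt-5": 400_000,
    "claude-3": 200_000, "claude-sonnet": 200_000, "claude-opus": 200_000,
    "claude-haiku": 200_000, "claude-fable": 200_000,
    "gpt-4o": 128_000, "gpt-4-turbo": 128_000,
    "gpt-4-32k": 32_768,
    "gpt-3.5-turbo-16k": 16_384,
    "gpt-4": 8_192,
    "gpt-3.5": 4_096,
}

TOKEN_KEYS = ("inputTokens", "promptTokens", "input_tokens", "prompt_tokens")


def context_window(model: Optional[str]) -> Optional[int]:
    """Longest-prefix match (an assumption) so gpt-4o is not read as gpt-4."""
    if not model:
        return None
    matches = [prefix for prefix in CONTEXT_WINDOWS if model.startswith(prefix)]
    return CONTEXT_WINDOWS[max(matches, key=len)] if matches else None


def input_tokens(meta: dict) -> Optional[int]:
    """Look for a token count at the top level, then under a usage object."""
    for source in (meta, meta.get("usage") or {}):
        for key in TOKEN_KEYS:
            value = source.get(key)
            if isinstance(value, int) and not isinstance(value, bool):
                return value
    return None


def overflowing_steps(run: dict) -> list:
    """Return indexes of steps whose input tokens exceed the model's window."""
    run_model = (run.get("metadata") or {}).get("model")
    hits = []
    for index, step in enumerate(run.get("steps", [])):
        meta = step.get("metadata") or {}
        window = context_window(meta.get("model") or run_model)
        tokens = input_tokens(meta)
        # Unknown model or missing token count: stay quiet.
        if window is not None and tokens is not None and tokens > window:
            hits.append(index)
    return hits
```

Step-level model metadata takes precedence over run-level metadata in this sketch; the page only says either may be used, so that ordering is also an assumption.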

The suspicion score is fixed at 76.

Example run that trips context_overflow:
{
  "metadata": { "model": "gpt-4o" },
  "steps": [
    { "stepKind": "model", "metadata": { "inputTokens": 175000 } }
  ]
}

Lumni reports:

Context overflow: 175000 input tokens exceed gpt-4o’s 128000-token window — The prompt sent to gpt-4o was 175000 tokens — 36% over its 128000-token context window. The model silently dropped the overflow, so it answered using only partial context.
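The "36% over" figure in this report follows from the two token counts. A minimal sketch of the arithmetic, assuming the percentage is truncated to a whole number rather than rounded (the exact value is about 36.7%, so truncation is consistent with the 36% shown); overflow_percent is a hypothetical name:

```python
def overflow_percent(input_tokens: int, window: int) -> int:
    """Whole-number percentage by which input_tokens exceeds window (truncated)."""
    return (input_tokens - window) * 100 // window


print(overflow_percent(175_000, 128_000))  # 36
```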