Context Overflow

Detector key: context_overflow · Source: context · Severity: medium · Suspicion: 76

When a prompt is larger than the model’s context window, no error is raised: the overflow is silently truncated. The model answers using only the part that fit, so decisions get made on partial context, such as stale retrieval, dropped instructions, or missing history.

What trips it

For each step, Lumni looks up the model’s context window from its id and compares it against the step’s input tokens. It fires when `inputTokens > contextWindow`.

The model id is read from the step metadata (`model`) or the run metadata.
Input tokens are read from `inputTokens`, `promptTokens`, `input_tokens`, or `prompt_tokens` (including under a `usage` object).
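
The field lookup described above can be sketched roughly as follows. This is an illustration, not Lumni’s actual code: the function names `read_input_tokens` and `read_model` are hypothetical, and both the order in which the field names are tried and the step-before-run precedence for the model id are assumptions.

```python
from typing import Any, Mapping, Optional

# Field names listed above. The order they are tried in is an assumption.
TOKEN_KEYS = ("inputTokens", "promptTokens", "input_tokens", "prompt_tokens")


def read_input_tokens(metadata: Mapping[str, Any]) -> Optional[int]:
    """Return the input token count from step metadata, or None if absent."""
    # Look at the metadata itself first, then at a nested "usage" object.
    for source in (metadata, metadata.get("usage")):
        if not isinstance(source, Mapping):
            continue
        for key in TOKEN_KEYS:
            value = source.get(key)
            # Accept plain numbers only; bool is excluded because it is an int subclass.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return int(value)
    return None


def read_model(step_metadata: Mapping[str, Any],
               run_metadata: Mapping[str, Any]) -> Optional[str]:
    """Return the step's model id, falling back to the run's (assumed precedence)."""
    return step_metadata.get("model") or run_metadata.get("model")
```

With this sketch, `read_input_tokens({"usage": {"prompt_tokens": 900}})` yields `900`, and a step with no recognizable token field yields `None`.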

Known context windows

Model family	Window (tokens)
`gpt-4.1`, `gemini-1.5`, `gemini-2`	1,000,000
`gpt-5`	400,000
`claude-3` / `claude-sonnet` / `claude-opus` / `claude-haiku` / `claude-fable`	200,000
`gpt-4o`, `gpt-4-turbo`	128,000
`gpt-4-32k`	32,768
`gpt-3.5-turbo-16k`	16,384
`gpt-4`	8,192
`gpt-3.5`	4,096

If the model id isn’t recognized or token counts are missing, the detector stays quiet.
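
A minimal sketch of how such a lookup might work, using the windows from the table above. Matching model ids by prefix, and trying longer prefixes first so that `gpt-4o` is not mistaken for plain `gpt-4`, are assumptions about the matching rule; `context_window` and `overflows` are hypothetical names.

```python
from typing import Optional

# Family prefix -> context window in tokens, copied from the table above.
CONTEXT_WINDOWS = {
    "gpt-4.1": 1_000_000,
    "gemini-1.5": 1_000_000,
    "gemini-2": 1_000_000,
    "gpt-5": 400_000,
    "claude-3": 200_000,
    "claude-sonnet": 200_000,
    "claude-opus": 200_000,
    "claude-haiku": 200_000,
    "claude-fable": 200_000,
    "gpt-4o": 128_000,
    "gpt-4-turbo": 128_000,
    "gpt-4-32k": 32_768,
    "gpt-3.5-turbo-16k": 16_384,
    "gpt-4": 8_192,
    "gpt-3.5": 4_096,
}

# Longest prefixes first, so the most specific family wins (assumed rule).
_PREFIXES = sorted(CONTEXT_WINDOWS, key=len, reverse=True)


def context_window(model_id: str) -> Optional[int]:
    """Return the known window for a model id, or None if unrecognized."""
    model = model_id.lower()
    for prefix in _PREFIXES:
        if model.startswith(prefix):
            return CONTEXT_WINDOWS[prefix]
    return None


def overflows(model_id: Optional[str], input_tokens: Optional[int]) -> bool:
    """True only when both values are known and input_tokens > window."""
    if not model_id or input_tokens is None:
        return False  # missing data: stay quiet
    window = context_window(model_id)
    if window is None:
        return False  # unrecognized model: stay quiet
    return input_tokens > window
```

Note that the comparison is strict: a prompt of exactly 128,000 tokens sent to `gpt-4o` does not trip the check in this sketch.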

Score

Fixed at 76.

Example

{
  "metadata": { "model": "gpt-4o" },
  "steps": [
    { "stepKind": "model", "metadata": { "inputTokens": 175000 } }
  ]
}

Lumni reports:

Context overflow: 175000 input tokens exceed gpt-4o’s 128000-token window — The prompt sent to gpt-4o was 175000 tokens — 36% over its 128000-token context window. The model silently dropped the overflow, so it answered using only partial context.
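
The percentage in the report can be reproduced with a short calculation. The exact overshoot here is (175000 − 128000) / 128000 ≈ 36.7%, so the reported 36% suggests the value is truncated rather than rounded; that is an inference from this one example, and `overflow_percent` is a hypothetical helper.

```python
def overflow_percent(input_tokens: int, window: int) -> int:
    """Percentage by which input_tokens exceeds window, truncated to a whole number."""
    if window <= 0:
        raise ValueError("window must be positive")
    # Integer arithmetic: multiply before dividing so no precision is lost.
    return (input_tokens - window) * 100 // window


# Values from the example above.
percent = overflow_percent(175_000, 128_000)
print(f"{percent}% over")  # prints "36% over"
```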