Failure diagnosis

A detector tells you a run is suspicious. Diagnosis turns that into something a human can act on: what broke, where, why, and what to look at next.

The diagnosed failure

Every failure Lumni surfaces carries:

Title — a one-line headline, e.g. “False success: agent claimed completion after issue_refund failed”.
Summary — a plain-English explanation with the actual offending values quoted from the trace, so you don’t have to go digging.
Primary step — the specific step that broke (primaryStepId), so the UI can jump straight to it.
Initial failure source — the root-cause category Lumni attributes the failure to.
Severity and suspicion score.
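
As a rough sketch, the fields above could be modeled as a record like the one below. Only primaryStepId and initialFailureSource are names taken from this page. The class name, the other field names, the types, and the sample values are assumptions for illustration, not Lumni's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosedFailure:
    """Hypothetical shape of a diagnosed failure; not Lumni's real schema."""

    title: str                 # one-line headline
    summary: str               # plain-English explanation quoting trace values
    primaryStepId: str         # the step that broke (name from this page)
    initialFailureSource: str  # root-cause category (name from this page)
    severity: str              # assumed to be a label such as "high"
    suspicionScore: float      # assumed to be numeric


# Sample values are invented for illustration only.
example = DiagnosedFailure(
    title="False success: agent claimed completion after issue_refund failed",
    summary="issue_refund returned an error, but the agent reported completion.",
    primaryStepId="step-7",
    initialFailureSource="tool",
    severity="high",
    suspicionScore=0.92,
)

print(f"[{example.severity}] {example.title} -> jump to {example.primaryStepId}")
```

Freezing the record is a design choice in this sketch: a diagnosis is treated as a finished finding that later tooling reads but does not mutate.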

Root-cause categories

The initialFailureSource field places each failure into one of the buckets below. That bucket is the first fork in any root-cause investigation:

Source	Meaning	Where to look
`tool`	An external tool/API failed or misbehaved	The tool’s response, schema, and status
`runtime`	Loops, timeouts, resource exhaustion	Retry logic, backoff, orchestration
`context`	Wrong or missing information reached the model	Retrieval, context window, prompt assembly
`coordination`	A required step never happened or a hand-off dropped	Agent graph, sub-agent routing, tool wiring
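
The table above can be used as a simple lookup when triaging. The sketch below transcribes its "Where to look" column; the dictionary and function names are hypothetical helpers, not part of Lumni.

```python
# Mapping transcribed from the table above: source -> where to look.
WHERE_TO_LOOK = {
    "tool": ["the tool's response", "schema", "status"],
    "runtime": ["retry logic", "backoff", "orchestration"],
    "context": ["retrieval", "context window", "prompt assembly"],
    "coordination": ["agent graph", "sub-agent routing", "tool wiring"],
}


def next_places_to_look(initial_failure_source: str) -> list[str]:
    """Return the investigation starting points for a root-cause category."""
    try:
        return WHERE_TO_LOOK[initial_failure_source]
    except KeyError:
        # Surface unknown categories instead of silently ignoring them.
        raise ValueError(
            f"unknown initialFailureSource: {initial_failure_source!r}"
        ) from None


print(next_places_to_look("runtime"))
```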

From detection to narrative

Detectors are deliberately terse and deterministic: they decide whether a failure is present. The human-readable narrative and any suggested fix are produced downstream, optionally by an LLM judge that reads the trace and the detector’s finding. This split keeps detection cheap and safe to execute on every run, while still giving you a rich explanation when you open a failure.
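
To make the split concrete, here is a minimal sketch of a deterministic detector followed by an optional narration stage. The trace shape (a list of dicts with id, kind, status, and claims_success keys), the function names, and the template narrator standing in for an LLM judge are all assumptions for illustration; Lumni's real detectors and trace format may differ.

```python
from typing import Callable, Optional

Trace = list[dict]  # hypothetical trace shape: one dict per step


def detect_false_success(trace: Trace) -> Optional[dict]:
    """Deterministic check: a tool step errored, yet a later step claims success."""
    failed_step = None
    for step in trace:
        if step.get("kind") == "tool" and step.get("status") == "error":
            if failed_step is None:
                failed_step = step  # remember the first failing tool step
        elif step.get("kind") == "final" and step.get("claims_success"):
            if failed_step is not None:
                return {"detector": "false_success", "primaryStepId": failed_step["id"]}
    return None


def template_narrator(trace: Trace, finding: dict) -> str:
    """Stand-in for the downstream narrative stage (an LLM judge could go here)."""
    return f"Step {finding['primaryStepId']} failed, but the run reported success."


def diagnose(
    trace: Trace,
    narrate: Optional[Callable[[Trace, dict], str]] = None,
) -> Optional[dict]:
    """Run the cheap detector always; add a narrative only when one is requested."""
    finding = detect_false_success(trace)
    if finding is None:
        return None
    if narrate is not None:
        finding = {**finding, "summary": narrate(trace, finding)}
    return finding


sample_trace = [
    {"id": "s1", "kind": "tool", "name": "issue_refund", "status": "error"},
    {"id": "s2", "kind": "final", "claims_success": True},
]
print(diagnose(sample_trace, template_narrator))
```

Because the narrator is an optional argument, the detector can run on every trace without it, and the more expensive explanation step is only invoked for runs that were actually flagged.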

What to do with a diagnosis

Group related diagnoses into an investigation and compare against a baseline.
Capture the evidence in the evidence ledger.
When you have a candidate fix, replay the failing trace to prove it.