Failure diagnosis
A detector tells you a run is suspicious. Diagnosis turns that into something a human can act on: what broke, where, why, and what to look at next.
The diagnosed failure
Every failure Lumni surfaces carries:
- Title — a one-line headline, e.g. “False success: agent claimed completion after issue_refund failed”.
- Summary — a plain-English explanation with the actual offending values quoted from the trace, so you don’t have to go digging.
- Primary step — the specific step that broke (primaryStepId), so the UI can jump straight to it.
- Initial failure source — the root-cause category Lumni attributes the failure to.
- Severity and suspicion score.
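As a rough illustration, the fields listed above could be modelled as a single record. This is a sketch under assumptions, not Lumni's actual schema: only primaryStepId and initialFailureSource are names taken from this page, while the other field names, the severity scale, the score range, and the example values are hypothetical.

```typescript
// Hypothetical shape of a diagnosed failure. Only `primaryStepId` and
// `initialFailureSource` are names taken from the documentation; every other
// field name and value range below is an assumption made for illustration.
type FailureSource = "tool" | "runtime" | "context" | "coordination";

interface DiagnosedFailure {
  title: string;                        // one-line headline
  summary: string;                      // plain-English explanation quoting trace values
  primaryStepId: string;                // the step that broke
  initialFailureSource: FailureSource;  // root-cause category
  severity: "low" | "medium" | "high";  // assumed scale
  suspicionScore: number;               // assumed to lie in [0, 1]
}

// Illustrative values only; the step id and score are made up.
const example: DiagnosedFailure = {
  title: "False success: agent claimed completion after issue_refund failed",
  summary: "issue_refund returned an error, but the final message reported success.",
  primaryStepId: "step-7",
  initialFailureSource: "tool",
  severity: "high",
  suspicionScore: 0.9,
};

// Compact label that a list view might show for a failure.
function headline(f: DiagnosedFailure): string {
  return `[${f.initialFailureSource}] ${f.title} (step ${f.primaryStepId})`;
}
```

Keeping the primary step as an identifier rather than a copy of the step lets a consumer look the step up in the trace and jump to it.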
Root-cause categories
The initialFailureSource places a failure into one of a few buckets, which is
the first fork in any root-cause investigation:
| Source | Meaning | Where to look |
|---|---|---|
| tool | An external tool/API failed or misbehaved | The tool’s response, schema, and status |
| runtime | Loops, timeouts, resource exhaustion | Retry logic, backoff, orchestration |
| context | Wrong or missing information reached the model | Retrieval, context window, prompt assembly |
| coordination | A required step never happened or a hand-off dropped | Agent graph, sub-agent routing, tool wiring |
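The table above can be read as a lookup from category to first places to check. The sketch below encodes it that way. The type and function names are hypothetical, and it assumes the four listed sources are the only values.

```typescript
// The four categories from the table above; assumed here to be exhaustive.
type FailureSource = "tool" | "runtime" | "context" | "coordination";

// "Where to look" column, keyed by category. Record<FailureSource, ...> makes
// the compiler reject the map if a category is missing.
const whereToLook: Record<FailureSource, string[]> = {
  tool: ["tool response", "schema", "status"],
  runtime: ["retry logic", "backoff", "orchestration"],
  context: ["retrieval", "context window", "prompt assembly"],
  coordination: ["agent graph", "sub-agent routing", "tool wiring"],
};

// Hypothetical helper: the first things to inspect for a given category.
function nextChecks(source: FailureSource): string[] {
  return whereToLook[source];
}
```

For example, `nextChecks("runtime")` returns the retry, backoff, and orchestration entries from the second row.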
From detection to narrative
Detectors are deliberately terse and deterministic: they decide only whether a failure is present. The human-readable narrative and any suggested fix are produced downstream (optionally by an LLM judge that reads the trace and the detector’s finding). This split keeps detection cheap and safe to run on every run, while still giving you a rich explanation when you open a failure.
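One way to picture this split is a pure detector function plus an optional narration step that runs only when a finding exists. Everything in the sketch below is hypothetical: the trace shape, the detector's keyword rule, and the Narrator type are stand-ins, and the narrator is a plain function rather than a real LLM call.

```typescript
// Hypothetical, simplified trace shape for illustration only.
interface Step { id: string; tool?: string; status: "ok" | "error"; }
interface Trace { steps: Step[]; finalMessage: string; }
interface Finding { detector: string; stepId: string; }

// Deterministic detector: same trace in, same finding out, no model calls.
// The keyword check is a deliberately naive stand-in for a real rule.
function detectFalseSuccess(trace: Trace): Finding | null {
  const failed = trace.steps.find((s) => s.status === "error");
  if (!failed) return null;
  const claimsDone = /\b(done|completed|success)\b/i.test(trace.finalMessage);
  if (!claimsDone) return null;
  return { detector: "false_success", stepId: failed.id };
}

// Downstream narration is optional and pluggable (an LLM judge could sit here).
type Narrator = (trace: Trace, finding: Finding) => string;

interface Diagnosis { finding: Finding; narrative?: string; }

function diagnose(trace: Trace, narrate?: Narrator): Diagnosis | null {
  const finding = detectFalseSuccess(trace);
  if (!finding) return null; // nothing to narrate, so no downstream cost
  return narrate ? { finding, narrative: narrate(trace, finding) } : { finding };
}
```

Because the narrator is only invoked after a finding, the cheap check can run on every trace while the expensive step runs only for flagged ones.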
What to do with a diagnosis
- Group related diagnoses into an investigation and compare against a baseline.
- Capture the evidence in the evidence ledger.
- When you have a candidate fix, replay the failing trace to prove it.