Failure diagnosis

A detector tells you a run is suspicious. Diagnosis turns that into something a human can act on: what broke, where, why, and what to look at next.

Every failure Lumni surfaces carries:

  • Title — a one-line headline, e.g. “False success: agent claimed completion after issue_refund failed”.
  • Summary — a plain-English explanation with the actual offending values quoted from the trace, so you don’t have to go digging.
  • Primary step — the specific step that broke (primaryStepId), so the UI can jump straight to it.
  • Initial failure source — the root-cause category Lumni attributes the failure to.
  • Severity and suspicion score.
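
As a rough illustration, the fields above could be modeled as a single record. This is a sketch under assumptions, not Lumni's actual schema: only primaryStepId and initialFailureSource are names taken from this page. The other field names, the severity scale, the 0 to 1 range for the suspicion score, the formatHeadline helper, and all example values are hypothetical.

```typescript
// Hypothetical shape of a diagnosed failure. Only `primaryStepId` and
// `initialFailureSource` appear in the docs; everything else is assumed.
type InitialFailureSource = "tool" | "runtime" | "context" | "coordination";

interface DiagnosedFailure {
  title: string; // one-line headline
  summary: string; // plain-English explanation quoting values from the trace
  primaryStepId: string; // the specific step that broke
  initialFailureSource: InitialFailureSource; // root-cause category
  severity: "low" | "medium" | "high"; // assumed scale
  suspicionScore: number; // assumed to lie in [0, 1]
}

// Made-up example values, for illustration only.
const exampleFailure: DiagnosedFailure = {
  title: "False success: agent claimed completion after issue_refund failed",
  summary: "issue_refund returned an error, but the final message said the refund was issued.",
  primaryStepId: "step_7",
  initialFailureSource: "tool",
  severity: "high",
  suspicionScore: 0.92,
};

// Hypothetical helper: a compact line for a list view that links to the step.
function formatHeadline(f: DiagnosedFailure): string {
  return `[${f.severity}] ${f.title} (step ${f.primaryStepId})`;
}

const exampleHeadline = formatHeadline(exampleFailure);
```

Keeping the step identifier on the record is what would let a UI jump straight to the offending step without re-scanning the trace.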

The initialFailureSource field assigns each failure to one of a few buckets, and that assignment is the first fork in any root-cause investigation:

| Source | Meaning | Where to look |
|---|---|---|
| tool | An external tool/API failed or misbehaved | The tool’s response, schema, and status |
| runtime | Loops, timeouts, resource exhaustion | Retry logic, backoff, orchestration |
| context | Wrong or missing information reached the model | Retrieval, context window, prompt assembly |
| coordination | A required step never happened or a hand-off dropped | Agent graph, sub-agent routing, tool wiring |
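
The table can be read as a lookup from bucket to first places to check. The following is a minimal sketch of that lookup. The four source values and the checklist entries come from the table, while the WHERE_TO_LOOK constant and the nextChecks function are hypothetical names introduced here for illustration.

```typescript
// The four buckets listed in the table; the real set may be larger.
type InitialFailureSource = "tool" | "runtime" | "context" | "coordination";

// Hypothetical lookup mirroring the "Where to look" column.
const WHERE_TO_LOOK: Record<InitialFailureSource, string[]> = {
  tool: ["tool response", "schema", "status"],
  runtime: ["retry logic", "backoff", "orchestration"],
  context: ["retrieval", "context window", "prompt assembly"],
  coordination: ["agent graph", "sub-agent routing", "tool wiring"],
};

// Returns the first places to inspect for a given failure source.
function nextChecks(source: InitialFailureSource): string[] {
  return WHERE_TO_LOOK[source];
}

const runtimeChecks = nextChecks("runtime");
// runtimeChecks is ["retry logic", "backoff", "orchestration"]
```

Typing the map as a Record over the union means the compiler flags a missing entry if a new bucket is added later.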

Detectors are deliberately terse and deterministic: they decide only whether a failure is present. The human-readable narrative and any suggested fix are produced downstream, optionally by an LLM judge that reads the trace and the detector’s finding. This split keeps detection cheap and safe to execute on every run, while still giving you a rich explanation when you open a failure.
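
The two-stage split can be sketched as a pure detector followed by an optional narrator. This is an illustration under assumptions rather than Lumni's implementation: the Step, Finding, and Judge types, the detectFalseSuccess and narrate functions, and the keyword-based success check are all hypothetical. A real judge would most likely be an asynchronous model call; it is kept synchronous here for brevity.

```typescript
// Hypothetical trace and finding shapes, for illustration only.
interface Step {
  id: string;
  kind: "tool_call" | "final_answer";
  toolName?: string;
  ok?: boolean; // whether a tool call succeeded
  text?: string; // final answer text
}

interface Finding {
  detector: string;
  primaryStepId: string;
  initialFailureSource: "tool" | "runtime" | "context" | "coordination";
}

// Stage 1: deterministic and cheap, no model calls. Returns a terse finding
// or null. The keyword check is deliberately naive.
function detectFalseSuccess(steps: Step[]): Finding | null {
  const failed = steps.filter((s) => s.kind === "tool_call" && s.ok === false)[0];
  const final = steps.filter((s) => s.kind === "final_answer")[0];
  if (!failed || !final) return null;
  const claimsSuccess = /\b(done|completed|success)\b/i.test(final.text || "");
  if (!claimsSuccess) return null;
  return { detector: "false_success", primaryStepId: failed.id, initialFailureSource: "tool" };
}

// Stage 2: builds the narrative downstream. If a judge is supplied it writes
// the narrative; otherwise a template is used.
type Judge = (finding: Finding, steps: Step[]) => string;

function narrate(finding: Finding, steps: Step[], judge?: Judge): string {
  if (judge) return judge(finding, steps);
  const step = steps.filter((s) => s.id === finding.primaryStepId)[0];
  const tool = step && step.toolName ? step.toolName : "a tool call";
  return `False success: agent claimed completion after ${tool} failed`;
}

// Usage with a made-up two-step trace.
const trace: Step[] = [
  { id: "s1", kind: "tool_call", toolName: "issue_refund", ok: false },
  { id: "s2", kind: "final_answer", text: "Your refund is completed." },
];
const finding = detectFalseSuccess(trace);
const headline = finding ? narrate(finding, trace) : "no failure detected";
// headline is "False success: agent claimed completion after issue_refund failed"
```

Because the detector never calls a model, it can run on every trace, while the narrator is only invoked for traces that produced a finding.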