Your agent said it completed the task.

Lumni is agent forensics for production AI. Observability shows you what happened in a trace. Lumni answers the questions that actually matter when an agent is live: why did it fail, can we reproduce it, what fixes it, and can we prove the fix before the same bug reaches a customer again?

The sharpest version of the problem is the silent failure — the agent reports success, no exception is thrown, nothing pages, and yet the real-world outcome never happened. The refund was never issued. The booking was never made. The email was never sent. Your dashboards are green and your customer is angry.

Start here

Quickstart: Paste a failing trace and get an instant root-cause teardown, with no signup and no SDK.

Core concepts: The vocabulary Lumni uses (runs, steps, detectors, suspicion scores, and investigations).

Send your traces: Ingest runs from your SDK, OpenTelemetry, or support tools so detectors run on every run.

Detectors reference: The five trace-only silent-failure detectors and how to read what they surface.

Why Lumni exists

Over half of production agents fail in ways their owners can’t explain. Teams have traces, but a trace tells you the sequence of steps — not the cause, not whether a proposed fix works, and not how to stop the same regression from shipping tomorrow. Money-touching and customer-facing agents make this expensive: a “confident liar” that claims a refund processed is a support ticket, a chargeback, and a compliance question all at once.

Lumni vs. observability

Capability	Observability (LangSmith, Langfuse, Helicone)	Lumni
Shows the trace	✅	✅
Flags silent failures (no exception thrown)	⚠️ Manual	✅ Automatic detectors
Explains why it failed	❌	✅ Root-cause analysis
Replays the failure against a candidate fix	❌	✅ Replay & repair
Blocks the regression in CI	❌	✅ Release gates
Evidence trail for finance / audit	❌	✅ Evidence ledger

What Lumni does, end to end

Detect

Five trace-only detectors (false success, runaway loops, hallucinated data, missing actions, and context overflow) flag silent failures on every ingested run and on anonymous pasted traces.
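To make the idea of a trace-only check concrete, here is a minimal sketch of a false-success test in Python. It is not Lumni's detector: the `Step` schema, the `issue_refund` tool name, and the phrase list are all hypothetical, chosen only to show how a claim of success can be compared against the actions recorded in a trace.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step in a trace (hypothetical schema, for illustration only)."""
    kind: str        # "tool_call" or "message"
    name: str = ""   # tool name, when kind == "tool_call"
    ok: bool = True  # whether the tool call returned successfully
    text: str = ""   # agent message text, when kind == "message"


# Hypothetical phrases that count as the agent claiming the refund happened.
SUCCESS_PHRASES = ("refund has been processed", "refund issued", "refund processed")


def claims_success(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in SUCCESS_PHRASES)


def false_success(steps: list[Step], required_tool: str = "issue_refund") -> bool:
    """True when the agent claims success but the trace contains no
    successful call to the tool that would have performed the action."""
    claimed = any(s.kind == "message" and claims_success(s.text) for s in steps)
    acted = any(
        s.kind == "tool_call" and s.name == required_tool and s.ok for s in steps
    )
    return claimed and not acted


# The agent looked up the order, then told the customer the refund went through.
silent_failure = [
    Step(kind="tool_call", name="lookup_order"),
    Step(kind="message", text="Your refund has been processed."),
]

# Same conversation, but the refund tool was actually called and succeeded.
healthy_run = [
    Step(kind="tool_call", name="lookup_order"),
    Step(kind="tool_call", name="issue_refund"),
    Step(kind="message", text="Your refund has been processed."),
]
```

In this sketch `false_success(silent_failure)` is true and `false_success(healthy_run)` is false, even though neither run raised an exception. A real detector would need far more robust claim and action matching than a fixed phrase list.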

Diagnose

Each failure gets a plain-English diagnosis, a suspicion score, the primary step that broke, and a likely root-cause category.
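As a rough mental model, the four parts of a diagnosis could be held in a record like the one below. This is an illustrative sketch, not Lumni's schema: the field names, the 0.0 to 1.0 score range, and the example category strings are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    """Hypothetical shape of a single diagnosis (illustration only)."""
    summary: str       # plain-English diagnosis
    suspicion: float   # suspicion score; the 0.0-1.0 range is an assumption
    primary_step: int  # index of the step that broke
    root_cause: str    # likely root-cause category

    def __post_init__(self) -> None:
        if not 0.0 <= self.suspicion <= 1.0:
            raise ValueError("suspicion must be between 0.0 and 1.0")


def most_suspicious(diagnoses: list[Diagnosis]) -> Diagnosis:
    """Return the diagnosis with the highest suspicion score."""
    return max(diagnoses, key=lambda d: d.suspicion)


findings = [
    Diagnosis("Agent retried the same search repeatedly.", 0.35, 4, "runaway_loop"),
    Diagnosis("Agent claimed a refund with no refund call.", 0.92, 7, "missing_action"),
]
top = most_suspicious(findings)
```

Sorting or filtering by the score is one plausible way to decide which run to investigate first.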

Replay

Re-run a failing trace against a candidate fix in a sandbox and compare the outcome, so you know the fix works before you ship it.
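The replay idea can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Lumni's replay engine: the agent functions, the tool names, and the canned tool responses are all hypothetical. The sandbox answers tool calls from canned data, so nothing real is touched, and records which tools were called so the two outcomes can be compared.

```python
from typing import Callable

ToolFn = Callable[[str, dict], dict]


def make_sandbox(canned: dict[str, dict], calls: list[str]) -> ToolFn:
    """Build a tool caller that returns canned responses and logs tool names.
    No real system is contacted, so replays have no side effects."""
    def call_tool(name: str, args: dict) -> dict:
        calls.append(name)
        return canned.get(name, {"error": "no canned response"})
    return call_tool


def buggy_agent(request: dict, call_tool: ToolFn) -> str:
    # Hypothetical failing agent: looks up the order, never issues the refund.
    call_tool("lookup_order", {"order_id": request["order_id"]})
    return "Your refund has been processed."


def fixed_agent(request: dict, call_tool: ToolFn) -> str:
    # Hypothetical candidate fix: issues the refund and checks the result.
    order = call_tool("lookup_order", {"order_id": request["order_id"]})
    result = call_tool(
        "issue_refund",
        {"order_id": request["order_id"], "amount": order["amount"]},
    )
    if result.get("status") == "ok":
        return "Your refund has been processed."
    return "I could not issue the refund."


def replay(agent, request: dict, canned: dict[str, dict], required_tool: str) -> dict:
    """Run an agent in the sandbox and report whether the required action ran."""
    calls: list[str] = []
    reply = agent(request, make_sandbox(canned, calls))
    return {"reply": reply, "calls": calls, "action_taken": required_tool in calls}


request = {"order_id": "A-100"}
canned = {"lookup_order": {"amount": 42}, "issue_refund": {"status": "ok"}}

before = replay(buggy_agent, request, canned, "issue_refund")
after = replay(fixed_agent, request, canned, "issue_refund")
```

Here `before["action_taken"]` is false and `after["action_taken"]` is true, while both agents return the same reply. The comparison is on what the agent did, not on what it said.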

Gate

Wire Lumni’s verdict into CI to block a release that reproduces a known failure — fail-open by default, so Lumni can never break your deploy.
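The fail-open behavior can be sketched as a small CI helper in Python. Everything here is an assumption made for illustration: `fetch_verdict` stands in for however your pipeline obtains a verdict, and the `"pass"` and `"block"` strings are hypothetical values, not Lumni's API.

```python
import sys
from typing import Callable


def gate_exit_code(fetch_verdict: Callable[[], str], fail_open: bool = True) -> int:
    """Map a verdict to a CI exit code (0 = continue, 1 = block the release).

    fetch_verdict is a hypothetical callable returning "pass" or "block".
    With fail_open=True, any error or unrecognized verdict lets the deploy
    continue, so an outage of the gate itself cannot break the pipeline.
    """
    fallback = 0 if fail_open else 1
    try:
        verdict = fetch_verdict()
    except Exception as exc:  # network error, timeout, bad credentials, ...
        print(f"gate: verdict unavailable ({exc})", file=sys.stderr)
        return fallback
    if verdict == "pass":
        return 0
    if verdict == "block":
        return 1
    print(f"gate: unrecognized verdict {verdict!r}", file=sys.stderr)
    return fallback


def unreachable() -> str:
    # Simulates the verdict service being down.
    raise TimeoutError("verdict service timed out")
```

A CI step would end with `sys.exit(gate_exit_code(your_fetch_function))`. Only an explicit block verdict stops the release by default. Passing `fail_open=False` turns the same helper into a fail-closed gate for teams that prefer it.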

A note on safety

Lumni is read-only and advisory. It reads traces and (optionally, with your permission) systems of record to produce evidence and verdicts. It never moves money, executes transactions, or mutates your agents. Every gate decision is yours, and gates fail open by default. See Security & Trust.