Testing a fix

You have a diagnosed failure and an idea for a fix — a new prompt, a tool-schema correction, a policy update, a code change. Replay & repair lets you prove the fix against the exact trace that broke, without shipping anything.

  1. Reproduce. Lumni replays the bundle unmodified and confirms the original failure reproduces.
  2. Apply the candidate fix. Point replay at your changed prompt / schema / policy / agent version.
  3. Re-run in a sandbox. The failing trace runs again with the change. Tool calls are answered from the bundle’s captured responses rather than live tools, so the run has no side effects and is deterministic.
  4. Compare. Lumni diffs the new outcome against the failure — did the detector stop firing? Did the run reach the intended outcome?
  5. Check for collateral damage. The same fix is run against other traces in the failure’s cluster, so a fix that resolves one case but breaks a sibling is caught here, not in production.
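
The five steps above can be sketched in a few lines of Python. This is an illustrative model only, not Lumni's API: `Bundle`, `replay`, `detector_fires`, `check_fix`, and the toy refund agents are all hypothetical names invented for this example. It assumes an agent version can be treated as a function that receives the input and a tool lookup, and it treats a sibling as "broken" whenever the detector fires under the candidate fix.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Bundle:
    """Hypothetical captured trace: the input plus recorded tool responses."""
    trace_id: str
    user_input: str
    tool_responses: dict


ToolLookup = Callable[[str], str]
Agent = Callable[[str, ToolLookup], str]


def replay(bundle: Bundle, agent: Agent) -> str:
    """Run an agent version against a bundle using only captured tool responses."""
    def captured_tool(name: str) -> str:
        # A tool with no recorded response raises KeyError instead of going live.
        return bundle.tool_responses[name]
    return agent(bundle.user_input, captured_tool)


def detector_fires(bundle: Bundle, output: str) -> bool:
    """Toy detector: fires when the decision contradicts the captured policy."""
    refundable = bundle.tool_responses["refund_policy"] == "refundable"
    expected = "approved" if refundable else "denied"
    return expected not in output


def check_fix(failing: Bundle, siblings: list, original: Agent, candidate: Agent) -> dict:
    # Step 1: the unmodified agent must still fail on the captured trace.
    if not detector_fires(failing, replay(failing, original)):
        return {"reproduced": False, "target_fixed": False, "siblings_broken": []}
    # Steps 2-4: re-run with the candidate and compare against the failure.
    target_fixed = not detector_fires(failing, replay(failing, candidate))
    # Step 5: run the same candidate over the other traces in the cluster.
    siblings_broken = [
        s.trace_id for s in siblings if detector_fires(s, replay(s, candidate))
    ]
    return {
        "reproduced": True,
        "target_fixed": target_fixed,
        "siblings_broken": siblings_broken,
    }


# Toy agent versions (assumptions for illustration).
def original_agent(user_input: str, tool: ToolLookup) -> str:
    return "refund approved"


def always_deny(user_input: str, tool: ToolLookup) -> str:
    return "refund denied"


def policy_aware(user_input: str, tool: ToolLookup) -> str:
    return "refund approved" if tool("refund_policy") == "refundable" else "refund denied"


failing = Bundle("t-1", "Refund my order", {"refund_policy": "non-refundable"})
sibling = Bundle("t-2", "Refund my order", {"refund_policy": "refundable"})

naive = check_fix(failing, [sibling], original_agent, always_deny)
careful = check_fix(failing, [sibling], original_agent, policy_aware)
```

In this toy setup, `always_deny` fixes the failing trace but trips the detector on the sibling, so `naive["siblings_broken"]` is `["t-2"]`. `policy_aware` fixes the target and leaves the sibling intact.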

Replay returns one of three verdicts (Fix Verified, Fix Regresses, or Uncertain) along with the side-by-side comparison and the evidence behind the call. Both the verdict and the decision to ship are recorded in the evidence ledger.
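
As a rough illustration of how a comparison might map onto the three verdicts, here is a minimal sketch. The decision rules are an assumption made for this example, not Lumni's documented logic, and `Verdict`, `Comparison`, and `decide` are hypothetical names. In particular, treating "the failure did not reproduce" and "the fix changed nothing" as Uncertain is just one plausible reading.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    FIX_VERIFIED = "Fix Verified"
    FIX_REGRESSES = "Fix Regresses"
    UNCERTAIN = "Uncertain"


@dataclass
class Comparison:
    """Hypothetical summary of a replay run."""
    reproduced: bool        # did the original failure reproduce?
    target_fixed: bool      # did the detector stop firing on the failing trace?
    siblings_broken: list = field(default_factory=list)  # trace ids that now fail


def decide(c: Comparison) -> Verdict:
    # Assumed rule: without a reproduced failure there is nothing to compare against.
    if not c.reproduced:
        return Verdict.UNCERTAIN
    # Assumed rule: any sibling failing under the fix counts as a regression.
    if c.siblings_broken:
        return Verdict.FIX_REGRESSES
    # Assumed rule: target fixed and no collateral damage means verified.
    if c.target_fixed:
        return Verdict.FIX_VERIFIED
    # Assumed rule: a fix that changes nothing is neither verified nor a regression.
    return Verdict.UNCERTAIN


verdict = decide(Comparison(reproduced=True, target_fixed=True, siblings_broken=["t-2"]))
```

Here `verdict` is `Verdict.FIX_REGRESSES`, because the sibling check takes priority over the target result under these assumed rules.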