Regression library

The regression library is your accumulated institutional memory of what has gone wrong — turned into tests. Every entry is a real failure, captured as a replay bundle, that new agent versions must survive.

An entry is born from an incident:

  1. A detector surfaces a silent failure.
  2. You diagnose it and prove a fix with replay.
  3. You promote the reproduced failure into the library.
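The three steps can be sketched as a small promotion check. This is a minimal sketch under stated assumptions, not the product's actual API. A replay bundle is modeled as a plain dict. An agent version is any callable that maps a bundle to an outcome label. The names `Incident` and `promote` are hypothetical. An incident is promoted only if replay shows two things: the current version still reproduces the failure, and the fixed version produces the expected outcome.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical shapes, not the product's API: a replay bundle is a plain dict,
# and an agent version is any callable mapping a bundle to an outcome label.
Bundle = dict
Agent = Callable[[Bundle], str]


@dataclass(frozen=True)
class Incident:
    detector_key: str      # failure class the detector reported, e.g. "false_success"
    bundle: Bundle         # captured inputs, tool responses, context, environment
    expected_outcome: str  # what "fixed" looks like


def promote(library: list, incident: Incident, current: Agent, fixed: Agent) -> bool:
    """Add the incident to the library only if replay reproduces it and proves the fix."""
    reproduces = current(incident.bundle) != incident.expected_outcome
    fix_proven = fixed(incident.bundle) == incident.expected_outcome
    if reproduces and fix_proven:
        library.append(incident)
        return True
    return False  # not reproduced, or fix not proven: keep investigating


# Illustrative usage with made-up data.
library = []
incident = Incident("false_success", {"task": "issue refund"}, "refund_issued")
current = lambda bundle: "reported_done_without_refund"
fixed = lambda bundle: "refund_issued"
print(promote(library, incident, current, fixed))  # True
```

Requiring the failure to reproduce first keeps entries that were never real failures out of the library.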

From then on, the failure is a regression test: every candidate version is replayed against it in a CI gate.

Each entry records:

  • The failing replay bundle (inputs, tool responses, context, environment).
  • The expected correct outcome — what “fixed” looks like.
  • The detector key it maps to, so coverage is legible per failure class.
  • A link back to the investigation and evidence ledger it came from.
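An entry and the CI gate that replays candidates against it might look like the sketch below. All names here are hypothetical: `RegressionEntry`, `ci_gate`, the dict-shaped bundle, and the string `investigation` link. The sketch also assumes that a candidate version can be called directly on a bundle to produce an outcome label. The gate passes only when the list of failed entries is empty.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RegressionEntry:
    bundle: dict            # failing replay bundle: inputs, tool responses, context, environment
    expected_outcome: str   # the expected correct outcome
    detector_key: str       # failure class, e.g. "expensive_loop"
    investigation: str      # link back to the investigation and evidence ledger


def ci_gate(candidate: Callable[[dict], str], library: List[RegressionEntry]) -> List[RegressionEntry]:
    """Replay the candidate against every entry and return the entries it fails.

    An empty result means the candidate survives the gate.
    """
    return [e for e in library if candidate(e.bundle) != e.expected_outcome]


# Illustrative usage with made-up entries and a toy candidate version.
library = [
    RegressionEntry({"task": "issue refund"}, "refund_issued", "false_success", "inv-001"),
    RegressionEntry({"task": "lookup order"}, "order_found", "missing_tool_call", "inv-002"),
]
candidate = lambda bundle: "refund_issued" if bundle["task"] == "issue refund" else "guessed_answer"
failures = ci_gate(candidate, library)
print([e.detector_key for e in failures])  # ['missing_tool_call']
```

Returning the failed entries rather than a bare boolean lets the gate report which failure classes a candidate regressed on.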

Because entries map to detector keys, you can see your coverage by failure class — how many false_success, expensive_loop, or missing_tool_call cases you’re guarding against. The library only grows, so your protection against past mistakes compounds over time while your team’s attention stays on new ones.
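Coverage by failure class is just a count of entries per detector key, and "only grows" can be enforced by exposing an add operation and no removal. This is a minimal sketch. `Entry`, `RegressionLibrary`, and `coverage` are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import List, NamedTuple


class Entry(NamedTuple):
    entry_id: str      # hypothetical identifier for a library entry
    detector_key: str  # failure class the entry guards against


class RegressionLibrary:
    """Append-only: there is deliberately no method to remove an entry."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def add(self, entry: Entry) -> None:
        self._entries.append(entry)

    def __len__(self) -> int:
        return len(self._entries)

    def coverage(self) -> Counter:
        """Number of entries per detector key, i.e. coverage by failure class."""
        return Counter(e.detector_key for e in self._entries)


# Illustrative usage with made-up entries.
lib = RegressionLibrary()
for entry_id, key in [("r1", "false_success"), ("r2", "expensive_loop"), ("r3", "false_success")]:
    lib.add(Entry(entry_id, key))
print(lib.coverage()["false_success"])      # 2
print(lib.coverage()["missing_tool_call"])  # 0 (no coverage yet)
```

A zero count is itself useful: it shows a failure class that detectors know about but the library does not yet guard against.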