Regression library
The regression library is your accumulated institutional memory of what has gone wrong, turned into tests. Every entry is a real failure, captured as a replay bundle, that new agent versions must survive.
How entries get in
An entry is born from an incident:
- A detector surfaces a silent failure.
- You diagnose it and prove a fix with replay.
- You promote the reproduced failure into the library.
From then on, the failure is a regression test: every candidate version is replayed against it in a CI gate.
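As a rough illustration of the gate, the sketch below replays each library entry against a candidate and collects the entries it fails. Everything here is an assumption made for illustration: the `Entry` shape, the `failing_entries` helper, and the idea that an agent can be modeled as a function from a replay bundle to an outcome string are hypothetical, not the product's actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical stand-in for an agent version: takes a replay bundle,
# returns an outcome label.
Agent = Callable[[dict[str, Any]], str]


@dataclass(frozen=True)
class Entry:
    """Hypothetical minimal library entry: a bundle plus its expected outcome."""
    entry_id: str
    bundle: dict[str, Any]
    expected_outcome: str


def failing_entries(candidate: Agent, library: list[Entry]) -> list[str]:
    """Replay every entry against the candidate and return the ids it fails."""
    return [
        entry.entry_id
        for entry in library
        if candidate(entry.bundle) != entry.expected_outcome
    ]


def gate_passes(candidate: Agent, library: list[Entry]) -> bool:
    """The gate passes only if the candidate survives every entry."""
    return not failing_entries(candidate, library)
```

In a CI job, a wrapper script could exit non-zero whenever `gate_passes` returns `False`, so a candidate that reintroduces a known failure never ships.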
What each entry carries
- The failing replay bundle (inputs, tool responses, context, environment).
- The expected correct outcome — what “fixed” looks like.
- The detector key it maps to, so coverage is legible per failure class.
- A link back to the investigation and evidence ledger it came from.
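One way to picture an entry is as a small record with one field per item in the list above. The following is a sketch only: the class names, field names, and types are assumptions chosen for illustration and do not describe the library's real storage format.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ReplayBundle:
    """Hypothetical shape of a failing replay bundle."""
    inputs: dict[str, Any]                 # what the agent was asked
    tool_responses: list[dict[str, Any]]   # recorded tool outputs, in order
    context: dict[str, Any]                # conversation or task context
    environment: dict[str, str]            # settings in effect at the time


@dataclass(frozen=True)
class RegressionEntry:
    """Hypothetical shape of one regression library entry."""
    entry_id: str
    bundle: ReplayBundle       # the failing replay bundle
    expected_outcome: str      # what "fixed" looks like
    detector_key: str          # failure class, e.g. "false_success"
    investigation_ref: str     # link back to the investigation and evidence ledger
```

Keeping the detector key on the entry itself is what makes per-class coverage cheap to compute later.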
Coverage that compounds
Because entries map to detector keys, you can see your coverage by failure
class — how many false_success, expensive_loop, or missing_tool_call cases
you’re guarding against. The library only grows, so your protection against
past mistakes compounds over time while your team’s attention stays on new ones.
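Per-class coverage amounts to counting entries by detector key. A minimal sketch, assuming each entry is available as a dictionary with a `detector_key` field (a hypothetical representation; the helper name `coverage_by_detector` is also illustrative):

```python
from collections import Counter
from typing import Iterable, Mapping


def coverage_by_detector(entries: Iterable[Mapping[str, str]]) -> Counter:
    """Count library entries per detector key (failure class)."""
    return Counter(entry["detector_key"] for entry in entries)


# Illustrative data only, not real library contents.
library = [
    {"entry_id": "e1", "detector_key": "false_success"},
    {"entry_id": "e2", "detector_key": "expensive_loop"},
    {"entry_id": "e3", "detector_key": "false_success"},
]

coverage = coverage_by_detector(library)
```

Here `coverage["false_success"]` is 2, and a key with no entries yet, such as `coverage["missing_tool_call"]`, reads as 0 because `Counter` returns zero for missing keys. That makes uncovered failure classes easy to spot.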