Regression library

The regression library is your accumulated institutional memory of what has gone wrong, turned into tests. Every entry is a real failure, captured as a replay bundle, that each new agent version must pass before it ships.

How entries get in

An entry is born from an incident:

A detector surfaces a silent failure.
You diagnose it and prove a fix with replay.
You promote the reproduced failure into the library.

From then on, the failure is a regression test: every candidate version is replayed against it in a CI gate.
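As a rough illustration of that gate, the sketch below replays every library entry against a candidate and collects the ones that no longer produce the expected outcome. The entry shape (`id`, `bundle`, `expected`), the `run_gate` function, and the stand-in `candidate` agent are all hypothetical names chosen for this example, not the product's actual API.

```python
from typing import Any, Callable

# Hypothetical shapes: an entry is a dict with "id", "bundle", and "expected".
Entry = dict[str, Any]
ReplayFn = Callable[[dict[str, Any]], Any]


def run_gate(entries: list[Entry], replay: ReplayFn) -> list[str]:
    """Replay every entry against the candidate; return the ids that fail."""
    failed = []
    for entry in entries:
        try:
            outcome = replay(entry["bundle"])
        except Exception:
            # A crash during replay counts as a regression too.
            failed.append(entry["id"])
            continue
        if outcome != entry["expected"]:
            failed.append(entry["id"])
    return failed


def candidate(bundle: dict[str, Any]) -> str:
    """Stand-in for a candidate agent version (illustrative only)."""
    return "refunded" if bundle["order_found"] else "escalated"


library = [
    {"id": "inc-1", "bundle": {"order_found": False}, "expected": "escalated"},
    {"id": "inc-2", "bundle": {"order_found": True}, "expected": "refunded"},
]

failures = run_gate(library, candidate)
gate_passed = not failures  # the gate passes only when nothing regressed
```

In a real pipeline the CI job would typically exit non-zero when `failures` is non-empty, so a candidate that reintroduces a past failure cannot be promoted.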

What each entry carries

The failing replay bundle (inputs, tool responses, context, environment).
The expected correct outcome — what “fixed” looks like.
The detector key it maps to, so coverage is legible per failure class.
A link back to the investigation and evidence ledger it came from.
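One way to picture an entry is as a small record with those four parts. The sketch below is a minimal model under assumptions: the class names, field names, and sample values (including the placeholder URL) are hypothetical and only mirror the list above.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ReplayBundle:
    """Everything needed to reproduce the failing run (field names assumed)."""
    inputs: dict[str, Any]
    tool_responses: list[dict[str, Any]]
    context: dict[str, Any]
    environment: dict[str, str]


@dataclass(frozen=True)
class RegressionEntry:
    """A single library entry (hypothetical shape)."""
    entry_id: str
    bundle: ReplayBundle               # the failing replay bundle
    expected_outcome: dict[str, Any]   # what "fixed" looks like
    detector_key: str                  # failure class this entry guards
    investigation_url: str             # link back to the investigation


entry = RegressionEntry(
    entry_id="example-001",
    bundle=ReplayBundle(
        inputs={"task": "refund order"},
        tool_responses=[{"tool": "lookup_order", "result": {"status": "not_found"}}],
        context={},
        environment={"agent_version": "candidate"},
    ),
    expected_outcome={"status": "escalated"},
    detector_key="false_success",
    investigation_url="https://example.invalid/investigations/001",
)
```

Keeping the expected outcome next to the bundle means a replay can be judged without going back to the original investigation.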

Coverage that compounds

Because entries map to detector keys, you can see your coverage by failure class — how many false_success, expensive_loop, or missing_tool_call cases you’re guarding against. The library only grows, so your protection against past mistakes compounds over time while your team’s attention stays on new ones.
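A per-class coverage view can be as simple as counting entries by detector key. The sketch below assumes each entry exposes a `detector_key` field; the `coverage_by_class` helper and the sample entries are hypothetical, while the three keys are the ones named above.

```python
from collections import Counter


def coverage_by_class(entries, tracked_keys):
    """Count library entries per detector key, keeping zero-count classes."""
    counts = Counter(entry["detector_key"] for entry in entries)
    return {key: counts.get(key, 0) for key in tracked_keys}


library = [
    {"id": "inc-1", "detector_key": "false_success"},
    {"id": "inc-2", "detector_key": "false_success"},
    {"id": "inc-3", "detector_key": "expensive_loop"},
]
tracked = ["false_success", "expensive_loop", "missing_tool_call"]

coverage = coverage_by_class(library, tracked)
for key, count in coverage.items():
    print(f"{key}: {count}")
# false_success: 2
# expensive_loop: 1
# missing_tool_call: 0
```

Reporting tracked classes with a count of zero makes gaps visible: here nothing yet guards against `missing_tool_call`.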