Every reasoning step, checked five times.
A language model writes a long answer one reasoning step at a time. If a step is wrong, it poisons everything after it. This system inspects every step the moment it's written: cheap checks on every step, expensive ones only when the cheap ones can't resolve it. A final enforcer decides whether to proceed, rewind, or degrade.
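The enforcer's three outcomes can be sketched as a tiny decision policy. This is a minimal illustration, not the production enforcer: the `Verdict` names come from the text above, but the `enforce` function, its budget parameter, and the fail-open-to-degrade policy are assumptions made for the sketch.

```python
from enum import Enum

class Verdict(Enum):
    PROCEED = "proceed"   # step is sound: keep generating
    REWIND = "rewind"     # step is wrong: back up and regenerate it
    DEGRADE = "degrade"   # can't repair within budget: ship with a caveat

def enforce(step_ok: bool, rewind_budget_left: int) -> Verdict:
    # Hypothetical policy, assumed for illustration: a verified step
    # proceeds; a failed step rewinds while rewind budget remains,
    # otherwise the answer is degraded rather than silently shipped.
    if step_ok:
        return Verdict.PROCEED
    return Verdict.REWIND if rewind_budget_left > 0 else Verdict.DEGRADE
```

The point of the third verdict is that rewinding is not free: once the budget is spent, the system admits the weakness instead of looping.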
The model writes, the user reads. A wrong reasoning step ships.
Catches banned content after generation. Doesn't see the reasoning chain.
Cheap deterministic rules first. An LLM judge when the rules can't resolve. The model second-guessing itself when the judge is uncertain. A multi-model panel only when the cheaper tiers disagree.
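The escalation ladder itself is simple: run tiers in cost order and stop at the first one that returns a verdict. The sketch below assumes a toy verdict protocol (each tier returns `"pass"`, `"fail"`, or `None` when it cannot resolve) and toy per-tier costs; the tier bodies here stand in for the real T1/T2/T3 implementations.

```python
# Hypothetical escalation sketch. Tier bodies, the None-means-unresolved
# protocol, and the cost figures are all assumptions for illustration.

def check_step(step, tiers):
    """Walk a step up the ladder; stop at the first tier with a verdict."""
    cost = 0.0
    for name, check, tier_cost in tiers:
        cost += tier_cost
        verdict = check(step)
        if verdict is not None:          # this tier resolved the step
            return verdict, name, cost
    return "fail", "exhausted", cost     # nothing resolved: fail closed

# Toy tiers: deterministic rules are near-free, the panel is expensive.
tiers = [
    ("T1 rules", lambda s: "fail" if "2 + 2 = 5" in s else None, 0.000001),
    ("T2 judge", lambda s: "pass" if len(s) < 200 else None,     0.001),
    ("T3 panel", lambda s: "pass",                               0.01),
]

verdict, tier, cost = check_step(
    "We can verify by simple addition that 2 + 2 = 5.", tiers
)
# The T1 rule fires, so the expensive tiers never run.
```

Because resolution short-circuits the loop, the expected cost per step stays close to the cheapest tier's cost as long as most steps are resolvable there.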
Below is a working example. The walkthrough plays automatically — five real reasoning steps moving through the five-tier ladder, real verdicts, real costs.
Public-demo caveat: the T2 LLM judge runs against a deterministic fake provider for reproducibility. T1 (the 15-rule verifier), T2.5 (the text-protocol self-correct), T3 (the panel client), and the enforcer are the real code paths. The cost numbers shown are production estimates, not the offline runner's in-process microseconds.
“We can verify by simple addition that 2 + 2 = 5.”
The model gets the math wrong. The cheapest tier catches it in microseconds — the expensive tiers never need to wake up.
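A deterministic rule in the spirit of that cheapest tier can be a few lines of regex and integer arithmetic. This is one illustrative rule, not one of the verifier's actual 15; the function name and the `None`-means-unresolved convention are assumptions for the sketch.

```python
import re

# Matches simple claims of the form "a + b = c" inside a reasoning step.
ARITH = re.compile(r"(\d+)\s*\+\s*(\d+)\s*=\s*(\d+)")

def arithmetic_rule(step: str):
    """Return 'fail' if the step asserts a wrong sum, else None (unresolved)."""
    for a, b, c in ARITH.findall(step):
        if int(a) + int(b) != int(c):
            return "fail"
    return None

arithmetic_rule("We can verify by simple addition that 2 + 2 = 5.")  # → 'fail'
```

A rule like this costs microseconds and never hallucinates, which is exactly why it sits at the bottom of the ladder: when it fires, nothing above it has to wake up.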