Research Archive
Technical Report

TR163: RTSI-Gated Quantization Routing

Offline LOOCV feasibility study over the TR142 RTSI table — routing the highest-instability (model × quantization) configurations to a safe baseline recovers ~76% of the quantization refusal-loss gap at ~20% of configs rerouted (out-of-sample ROC-AUC 0.84). A proof-of-mechanism defense, circularity disclosed; paper-grade expansion specified.

Table of Contents

Technical Report 163: RTSI-Gated Quantization Routing — Recovering Refusal at a Bounded Configuration-Routing Cost

A 45-Cell Offline LOOCV Feasibility Study Over the TR142 v3 RTSI Table, with Pre-Registered H0/H1/H2 Verdicts, a Load-Bearing Circularity Disclosure, and an Engineering Specification for the Paper-Grade TR163 v2 Substrate


Status. Substrate-complete on the offline-runnable arc. This report documents a fully pre-registered offline simulation over the published TR142 v3 45-cell RTSI table; the router mechanism, the threshold-sweep Pareto frontier, the leave-one-cell-out out-of-sample pass, and the ROC-AUC discrimination test are all derived from a single substrate that requires no new GPU sampling. The substrate is small (45 cells, 17 positives, 28 negatives, ≈16 KB of analysis JSON) and inherits a known circularity (RTSI's feature weights were calibrated against this same matrix in TR142). The honest weight of the report therefore rests on the leave-one-cell-out pass and on the documented future-work extension to held-out model families. The report is written at full TR depth so that Phase 7 (mitigation turn) and Phase 8 (serving-stack mechanism isolation) downstream consumers — the bridge paper, the Phase 7 trilogy synthesis, and the TR163 v2 paper-grade expansion plan — can cite a stable artifact without having to re-derive the operating-curve, cell-level, or hypothesis-verdict numbers. The substrate-grounded scope of this report is bounded; the bounds are stated explicitly throughout, and the forbidden-claims section is treated as a load-bearing structural disclosure rather than a closing footnote.


1. Abstract

We test whether the published Refusal Template Stability Index (RTSI; calibrated and validated in TR142 v3) can be used at configuration-routing time to mitigate the measured weight-quantization refusal-loss gap documented in TR134 and TR142 — the Q3_K_S "danger zone" where some (model × quantization) configurations lose up to ninety percentage points of refusal relative to their full-precision baseline. The mechanism under test is a configuration-level router: for each (model, quant) configuration in the TR142 v3 matrix, score the configuration with the published RTSI scorer, and decide — as a function of either a continuous RTSI threshold or a discrete RTSI risk band — whether to deploy the aggressive (cheaper) quantization configuration or to fall back to the safe baseline. The cost axis is the fraction of configurations routed to the safe baseline — a transparent throughput-cost proxy that requires neither invented per-quantization latency constants nor a presupposed traffic mix. Across the 45-cell TR142 v3 matrix (risk distribution: 23 LOW / 13 MODERATE / 9 HIGH; material-refusal-loss cutoff 0.05; positive class size 17 cells; negative class size 28 cells), the in-sample threshold sweep produces a Pareto frontier with a sharp inflection at threshold 0.5702 (6.67 percent routed, 38.67 percent of the aggregate refusal-rate-delta gap recovered) and a clear utility knee at threshold 0.2754 (26.67 percent routed, 80.27 percent recovered). The standard high-RTSI band (route on RTSI ≥ 0.40) recovers 76.17 percent of the gap at 20.00 percent routed. The out-of-sample leave-one-cell-out (LOOCV) pass — which scores each held-out cell using RTSI weights refit on the complement of that cell — preserves the favorable structure: the high-RTSI band recovers 76.37 percent of the gap at 22.22 percent routed, ROC-AUC equals 0.8445 (numerically identical to in-sample at three decimal places given the small leave-one perturbation), and the operating curve clears the pre-registered H2 admissibility threshold of 0.75 with substantial margin. All three pre-registered hypotheses resolve: H0 (no-free-lunch) is rejected by the existence of a knee where recovery exceeds routed fraction by a factor of three; H1 (favorable tradeoff, ≥80 percent of the gap recovered at ≤40 percent routed) is supported at the H1 supporting point T=0.2754; H2 (LOOCV AUC ≥ 0.75) is supported at 0.8445. The result is a feasibility demonstration of a defense mechanism, not a deployment recommendation. The substantive limitation is structural and disclosed upfront: RTSI's four feature weights were originally calibrated against this same 45-cell TR142 v3 matrix, so the in-sample threshold sweep is partly tautological by construction. The honest weight of the substrate therefore rests on the LOOCV out-of-sample pass and on the documented future-work extension to held-out model families, both of which are required before any main-track contribution can be claimed. The report walks the substrate at full TR depth — risk-distribution, cell-by-cell, Pareto, operating-point, LOOCV, AUC, hypothesis-verdict, cross-TR-position, literature-comparison, forbidden-claims, operational-implications, limitations, and TR163 v2 expansion-design — so that downstream consumers (Phase 7 mitigation synthesis, the serving-state-safety-certification bridge paper Layer 6, and the TR163 v2 paper-grade campaign) inherit a stable artifact.


2. Table of Contents

  1. Abstract
  2. Table of Contents
  3. Executive Summary
  4. Introduction and Research Motivation
  5. Pre-Registered Research Hypotheses
  6. Methodology
  7. Substrate Inheritance — The TR142 v3 RTSI Table as Routable Substrate
  8. SS1. Risk-Band Distribution and the Material-Loss Cutoff
  9. SS2. Cell-Level Substrate Walkthrough — The 45 Configurations
  10. SS3. The Aggregate Refusal Gap and Its Skew
  11. SS4. In-Sample Threshold Sweep — The Full Pareto Frontier
  12. SS5. The Three Canonical In-Sample Operating Points
  13. SS6. Out-of-Sample LOOCV Pass — The Load-Bearing Result
  14. SS7. ROC-AUC Discrimination of Material-Loss Cells
  15. SS8. Hypothesis Verdicts (H0 / H1 / H2)
  16. SS9. The Circularity Caveat — Load-Bearing Structural Disclosure
  17. SS10. Cross-TR Position — RTSI as a Configurable Screen, Reused
  18. SS11. Comparison to the Mitigation Literature
  19. SS12. What This Substrate Does NOT License
  20. SS13. Operational Implications — How a Real Deployment Would Use This
  21. SS14. Limitations
  22. SS15. TR163 v2 — Engineering Specification for the Paper-Grade Substrate
  23. SS16. Phase 7 Position and the Bridge-Paper Anchor
  24. Conclusion
  25. References
  26. Appendix A. Hardware, Software, and Environment Fingerprint
  27. Appendix B. Reproduction Commands and File Manifest
  28. Appendix C. The 45-Cell Substrate Ranked by RTSI Score
  29. Appendix D. Full Pareto Operating Points

3. Executive Summary

The Banterhearts research program has, across Phases 4 through 6, shipped four named methodological safety screens — RTSI (Refusal Template Stability Index, TR142), JTP (Judge Triangulation Protocol, TR140 and TR148), TAIS (Typical-Acceptance Invariance Screen, TR144), and CRI (Compile Reproducibility Index, TR147) — and zero applied defenses against the failure modes those screens detect. The Phase 7 mitigation-turn lane was opened in 2026-05-22/23 to close that gap. TR163 is the first proof-of-mechanism in that lane. The mechanism under test is a configuration-level router that consumes the published RTSI scorer at deployment-decision time: for each candidate (model, quantization) configuration, the router examines the configuration's RTSI score (a function of four refusal-template-stability features computed from the small refusal prompt sample TR142 uses to characterize each cell), and either approves the aggressive cheap-quantization configuration for deployment or routes it to the safe baseline. The cost is paid in the fraction of configurations routed to safe — a transparent proxy for the throughput or unit-economic cost of the routing policy that, crucially, requires neither invented per-quantization latency constants nor a presupposed traffic mix.

Three findings are licensed by this offline substrate. First, the routable gap is real and concentrated. The TR142 v3 45-cell matrix contains 17 cells with material refusal-rate loss (refusal-rate-delta ≤ −0.05 versus the baseline configuration) and 28 cells that do not lose materially. The aggregate refusal-rate-delta deficit across the 45 cells, summed only over the material-loss positives, is 11.38 percentage points of refusal per unit configuration — a substantial absolute headroom for a defense to target. Importantly, the loss is not uniformly distributed across configurations: the loss is concentrated in a small subset of (family × low-bit-quant) configurations, with the qwen2.5-1.5b GPTQ, qwen2.5-1.5b Q2_K, phi-2 GPTQ, llama3.2-1b Q2_K, llama3.2-1b AWQ, and llama3.2-1b GPTQ configurations contributing the largest individual deltas (between 51 and 90 percentage points of refusal lost each). This concentration is the precondition for any defense with a favorable Pareto knee — a defense routing on a stability score will only beat random routing if the loss concentrates on configurations with high stability scores, which is precisely what RTSI is designed to detect.

Second, the favorable Pareto knee exists. At an RTSI threshold of 0.2754, 26.67 percent of configurations route to the safe baseline and 80.27 percent of the aggregate refusal-rate-delta gap is recovered. This is the H1 supporting point we pre-registered. The shape of the Pareto frontier is favorable across a wider range than this single point: at the standard high-RTSI band (threshold 0.40), 20.00 percent of configurations route and 76.17 percent of the gap is recovered; at the mod-high band (threshold 0.10), 48.89 percent of configurations route and 95.12 percent of the gap is recovered. The marginal recovery per routed configuration is highest below the knee and falls off above the mod-high band: routing the cheapest extra configurations beyond the knee yields rapidly diminishing returns. The Pareto frontier therefore licenses a tunable mitigation knob, not a one-size-fits-all rule — the deployment can pick its operating point as a function of how much throughput cost it is willing to absorb.

Third, the LOOCV out-of-sample pass preserves the favorable structure. The leave-one-cell-out evaluation refits RTSI weights on 44 cells, scores the held-out cell with the refit weights, and aggregates routing decisions across all 45 folds. ROC-AUC under this procedure is 0.8445 — numerically identical at three decimals to the in-sample AUC, which the small leave-one perturbation makes expected. More important than the AUC point estimate, the high-RTSI-band operating point survives: at threshold 0.40, 22.22 percent of configurations route to safe and 76.37 percent of the gap is recovered, both numbers slightly larger than their in-sample counterparts because one additional configuration crosses the threshold under at least one of the 45 LOOCV refits. The LOOCV pass clears the pre-registered H2 threshold of AUC ≥ 0.75 with substantial margin.

The structural caveat is load-bearing and the report leads with it. RTSI's four feature weights were calibrated by TR142 against this 45-cell matrix to predict the refusal-rate delta from a small set of refusal-template-stability features. An in-sample router demonstration on the calibration matrix is therefore partly tautological by construction: the scorer was built precisely to flag the cells that lose refusal. The leave-one-cell-out pass removes one cell at a time from the calibration set, but the four other configurations of the same model family (or of the same quantization scheme) typically remain in the complement, so leakage at the family level and at the quant-scheme level is still possible. The honest claim therefore rests on the documented future-work extension to held-out model families that have never seen RTSI calibration — leave-one-family-out as a stronger generalization claim than leave-one-cell-out. That extension is the TR163 v2 paper-grade campaign, specified at engineering depth in §SS15 of this report.


4. Introduction and Research Motivation

4.1 The pre-Phase-7 Banterhearts safety program

The Banterhearts safety line was built across Phases 4 through 6 (TR134 through TR149 plus TR152) as a portfolio of measurement and screening contributions. The four named methodological screens — RTSI, JTP, TAIS, CRI — each formalize a specific failure-mode-detection axis. RTSI (TR142) detects refusal-template instability under weight quantization: a small refusal-prompt sample is scored across four template-stability features (dominant-prefix-share delta, unique-prefix-rate delta, prefix-entropy-normalized delta, and mean-tokens-refusal delta), and the four features are combined into a single scalar RTSI score that calibrates against the observed refusal-rate delta. JTP (TR140 and TR148) detects judge unreliability via cross-family kappa: at least two cross-family judges are required to agree, and a low-kappa pair triggers a "triangulate" verdict that prevents the program from anchoring claims on a single judge. TAIS (TR144) detects safety drift under speculative decoding by checking that the rejection-sampling and typical-acceptance variants of the same draft-and-target pair produce statistically indistinguishable refusal distributions. CRI (TR147) detects compile-stack fragility by measuring the maximum pairwise Cohen's d of compiled-latency across an installed-software set; the screen is calibrated to flag stack configurations that will not reproduce externally.

All four screens are detection-only. Each tells the deployment team "this configuration is fragile" or "this configuration is robust"; none of them tell the deployment team what to do about a flagged configuration. The Phase 7 lane was opened to close that loop. Phase 7 is the program's "mitigation turn" — the first build of applied defenses against the failure modes the screens detect. The lane includes TR155 (attention-sink eviction × refusal, MVP pilot 2026-05-24 at 936 records), TR160 (refusal-direction geometry × quantization, built but unrun pending GPU and PolyRefuse), and TR163 (this report). All three are at the proof-of-mechanism stage at the time of writing; none has yet expanded to the paper-grade substrate that would license a main-track contribution.

4.2 The routable gap

A defense needs a real gap to mitigate. Three candidate gaps exist in the program's published substrate. The first is the weight-quantization refusal gap from TR134 and TR142 — the Q3_K_S "danger zone" where the worst (model × quant) cells lose up to ninety percentage points of refusal relative to baseline. This gap is large and is the gap TR163 targets. The second is the FP8 KV-cache safety gap. TR145 (24,054 records), TR149 (FP8 null replicates on HarmBench, JailbreakBench, StrongREJECT, and XSTest), and TR152 v2 (45,000 primary records + 135,000 judge labels, FP16-vs-FP8 matched-pair analysis) all find this gap null or near-null; the program's best estimate is sub-1pp on the XSTest-Qwen lean and Mantel-Haenszel pooled OR confidence intervals straddle 1.0 elsewhere. There is essentially nothing to route against in the FP8 KV-cache axis, and TR163 explicitly does not target this gap. The third is the cross-request safety leakage gap under continuous batching from TR143, which is bounded but small. TR163 does not target this gap either; it would require a per-request rather than per-configuration router.

The weight-quantization refusal gap is therefore the load-bearing target for TR163. The TR142 v3 45-cell matrix is the substrate-of-record for that gap: it characterizes 5 model families (qwen2.5-1.5b, llama3.2-1b, llama3.2-3b, qwen2.5-7b, phi-2, mistral-7b — actually 6 families counting mistral-7b separately) across 9 quantization schemes (the GGUF Q2_K through Q8_0 stack plus AWQ and GPTQ), for a total of 45 cells after accounting for the configurations TR142 v3 successfully sampled. Refusal-rate deltas range from −0.90 (phi-2 GPTQ, ninety percentage points lost) to +0.41 (llama3.2-3b Q2_K, forty-one percentage points gained — yes, a few quantization configurations over-refuse relative to baseline, an artifact of the configuration shifting the refusal-template distribution rather than improving safety per se).

4.3 The routing mechanism — config-level, not per-request

The mechanism this report tests is a configuration-level router. For each (model, quant) configuration, the router examines the configuration's RTSI score and decides whether to deploy that configuration on the aggressive cheap quantization or to route it to the safe baseline (typically FP16 or the highest-precision GGUF Q-format available). The router does not score individual requests; it scores configurations once, at deployment-decision time. This is the cheapest deployable form of the defense and is also the form that the published RTSI scorer is calibrated for. A per-request online RTSI scorer is a documented extension — its latency feasibility is flagged not-verified in the Phase 7 research agenda and is not built here. The forbidden-claims section (§SS12) treats this distinction at length.

The cost the router pays is the fraction of configurations routed to the safe baseline. This is a transparent proxy for the throughput or unit-economic cost of the routing policy. It does not require inventing per-quantization latency constants (which the program does not have for all 9 quantization schemes × 6 model families × the deployment backend matrix), and it does not require presupposing a traffic mix across configurations. If the deployment knows its traffic mix, the proxy can be weighted by the per-configuration request share; for this report, the report uses uniform weighting across the 45 cells, which is the most conservative choice in the absence of mix data.

4.4 Pre-registration and offline-only scope

The publication contract for TR163 was frozen in git on 2026-05-22 at version 0.1_preregistered_fulldepth before any router simulation was run. The contract specifies the three hypotheses (H0/H1/H2), the claim ladder, the forbidden claims, the circularity disclosure, the routing rule, the cost axis, and the dependence on the published RTSI scorer (reused from TR142 by direct import, not refit). Pre-registration in this report means the analysis path was specified before the data was processed, not that the underlying substrate was collected after the contract was frozen — the substrate is the TR142 v3 RTSI table, which preceded the TR163 contract. The pre-registration commits to a specific Pareto-knee identification procedure, a specific ROC-AUC computation, a specific LOOCV protocol, and a specific verdict-rule on each of H0/H1/H2.

The scope of TR163 is intentionally offline-only. The router simulation runs entirely on existing TR142 outputs. No new GPU sampling is performed; no model is loaded; no inference is run; no judge is invoked. The compute envelope is one CPU process on a 45-row CSV plus a 16 KB JSON output. This is the cheapest mechanism-demonstration the program can run, and it is also the form that maximally inherits the circularity caveat — the substrate is exactly what RTSI was calibrated against. The paper-grade expansion to a new substrate is the TR163 v2 campaign, designed in §SS15.


5. Pre-Registered Research Hypotheses

The three hypotheses are reproduced verbatim from research/tr163/publication_contract.json at version 0.1_preregistered_fulldepth. Each hypothesis specifies a claim, a test, and (where the contract is explicit) an admissibility threshold.

H0 — No free lunch

"RTSI-gated routing cannot recover the refusal gap without routing essentially everything to the safe config (recovery is proportional to fraction-routed; no favorable knee in the Pareto curve)."

The test is the shape of the Pareto frontier and the recovery-at-fixed-cost operating point. H0 predicts that recovery is approximately proportional to fraction-routed across the whole sweep — equivalently, that no operating point exists where recovery exceeds fraction-routed by a meaningful multiplicative factor. H0 is the null and is rejected by the existence of a knee where recovery substantially exceeds fraction-routed.

H1 — Favorable tradeoff

"Routing recovers the bulk of the aggressive→safe refusal gap while routing only a minority of cells, because refusal loss concentrates in the high-RTSI cells (a clear Pareto knee)."

The pre-registered admissibility threshold is ≥80 percent of the aggregate gap recovered at ≤40 percent of cells routed. H1 is supported if and only if at least one operating point on the Pareto frontier sits at or above the recovery threshold while at or below the cost threshold. Multiple operating points may individually satisfy the threshold; the contract licenses the lowest-cost satisfying point as the H1 supporting point.

H2 — Usable operating curve

"The RTSI threshold has a usable operating curve — rtsi_score separates material-refusal-loss cells from stable cells with ROC-AUC well above chance, and the LOOCV out-of-sample AUC holds."

The pre-registered admissibility threshold is LOOCV out-of-sample ROC-AUC ≥ 0.75. The positive class is refusal_rate_delta ≤ −material_loss_cutoff with the cutoff fixed at 0.05 by the contract. The predictor is the RTSI score (with the LOOCV refit when computing the out-of-sample AUC). H2 is supported if the LOOCV AUC clears 0.75; the contract treats the in-sample AUC as a mechanism diagnostic but not as evidence of generalization.

Hypothesis-set rationale

The three hypotheses are designed to be jointly resolvable on the offline substrate. H0 and H1 are the existence-of-knee question phrased as a null and an admissibility threshold. H2 is the discrimination-power question that licenses the routing rule's usability — if the predictor cannot separate material-loss cells from stable cells out-of-sample, no operating point on the Pareto frontier is trustworthy. The H1 admissibility threshold (≥80% recovered at ≤40% routed) is set deliberately conservative: if the loss were spread evenly across configurations, a router would need to route ≥80% of configurations to recover ≥80% of the loss, so the threshold of 40% routed is half of what an even-loss baseline would require. The H2 admissibility threshold of AUC ≥ 0.75 is a standard "usable but not perfect" cutoff that licenses operating-point selection without claiming the screen is a deterministic classifier.


6. Methodology

The methodology has four pieces: the substrate construction (passive reuse from TR142 v3), the router simulation, the LOOCV pass, and the analysis layer.

6.1 Substrate construction

The substrate is a single CSV file containing the 45 cells of the TR142 v3 RTSI table. Each row is a (base_model, family, quant, baseline_quant) tuple, plus the cell's refusal_rate, refusal_rate_delta, four template-stability feature deltas (dominant_prefix_share_delta, unique_prefix_rate_delta, prefix_entropy_norm_delta, mean_tokens_refusal_delta), the rtsi_score, and the rtsi_risk band assignment (LOW / MODERATE / HIGH). The CSV was vendored into research/tr163/inputs/tr142_rtsi_table.csv from the canonical TR142 location at research/tr142/results/bespoke_analysis_v3/phase56_v3_full_canonical/rtsi_table.csv and force-added to the TR163 directory; the upstream CSV is gitignored under the results/ rule, so the vendored copy is required to keep TR163 reproducible from a fresh clone. Provenance is recorded in research/tr163/inputs/README.md.

6.2 Router simulation

The router simulation is a threshold sweep. For each candidate threshold T in the union of all rtsi_score values plus a small set of synthetic boundary thresholds, the router decides for each cell whether to route (rtsi_score ≥ T) or to keep (rtsi_score < T). For each routing decision, the gap recovered is computed as the cell's refusal_rate_delta if and only if the cell is a material-loss positive and the cell is routed (the safe baseline is assumed to recover the cell's refusal to the baseline rate); otherwise zero. The aggregate recovered is the sum across cells; the residual gap is the total gap minus the aggregate recovered; the recovered fraction is the aggregate recovered divided by the total gap.

The cost is the fraction of cells routed (uniform weighting in this report; the contract leaves request-share weighting as a hook). The Pareto frontier is the locus of (fraction_routed, recovered) points across the threshold sweep. The knee is identified by the maximum recovery-per-unit-cost ratio. Multiple thresholds may produce the same (fraction_routed, recovered) point because the rtsi_score values are sparse on the 45-cell substrate; the simulation reports the lowest threshold for each distinct operating point. Threshold values are reported to four decimal places throughout.

6.3 LOOCV out-of-sample pass

The LOOCV out-of-sample pass replaces the in-sample RTSI score with a leave-one-cell-out refit. For each cell c, the four RTSI feature weights are refit on the complement of c (44 cells), and c's RTSI score is recomputed under the refit weights. The Pareto frontier is then recomputed using the refit per-cell scores. The LOOCV pass is implemented in research/tr142/bespoke_analysis/rtsi.py::rtsi_loocv and reused by TR163 by direct import; no refitting code is duplicated. The LOOCV result is the "honest generalization claim" of the substrate — it removes the trivial circularity of scoring a cell with weights that were fit against that cell, but it does not remove the family-level or quant-scheme-level leakage that comes from leaving siblings of c in the complement.

6.4 Analysis layer

The analysis layer produces three artifacts. The first is routing_simulation.json, written by simulate.py, which contains the full Pareto frontier (43 distinct operating points on the 45-cell substrate) and the aggregate counts. The second is tr163_analysis.json, written by analyze.py, which contains the risk distribution, the in-sample Pareto knee, the high-band and mod-high-band operating points, the LOOCV equivalents of all three, the in-sample and LOOCV ROC-AUC, the three pre-registered verdicts (H0/H1/H2), and the circularity disclosure. The third is this report, written by generate_report.py and then hand-narrated to TR depth — generate_report.py produces a run-dir-local skeleton; the hand-narrated version is the canonical artifact and is landed at research/tr163/DRAFT_Technical_Report_163.md (this file), with a working copy at research/tr163/results/20260523_011918/tr163_report.md (gitignored).

6.5 What the methodology does not include

The methodology deliberately excludes several elements that would license stronger claims but require substrate the report does not have. It does not include a held-out model family validation (which would require RTSI calibration on a subset of families and evaluation on a disjoint subset). It does not include a per-request online RTSI scorer (which would require a streaming RTSI implementation and per-request latency measurement). It does not include a real latency-weighted cost model (which would require per-quant latency measurements across the full backend matrix). It does not include the JTP triangulation pass that a paper-grade campaign would require (the TR142 v3 substrate's judge labels were already triangulated, but the TR163 substrate inherits those labels passively). All four exclusions are documented in §SS15 as TR163 v2 build-out scope.


7. Substrate Inheritance — The TR142 v3 RTSI Table as Routable Substrate

The 45-cell TR142 v3 RTSI table is the load-bearing substrate. This section walks the table at the structural level — what it contains, how it was sampled in TR142, and why it is the right substrate for the TR163 mechanism demonstration.

7.1 Provenance

TR142 v3 was run on 51 sampled cells; the canonical RTSI table that survived the v3 sample-quality screen contains 45 cells. Each surviving cell is the result of a refusal-prompt sample under the (model, quant) configuration, with the small refusal-prompt sample drawn from the standardized RTSI feature-extraction protocol. Refusal-rate is the fraction of the sample that produces an in-distribution refusal completion; refusal_rate_delta is refusal_rate minus the baseline_quant cell's refusal_rate for the same model. The four template-stability feature deltas are computed on the refusal completions, comparing the quantized configuration's refusal-template distribution to the baseline's distribution. The RTSI score is a linear combination of the four deltas with the published TR142 v3 weights; the score is mapped to a discrete risk band (LOW if score < 0.10, MODERATE if 0.10 ≤ score < 0.40, HIGH if score ≥ 0.40).

7.2 What the substrate is not

The TR142 v3 substrate is not a paper-grade safety benchmark. It is a small-sample template-stability characterization. The refusal-rate values are derived from a refusal-prompt sample, not from the standardized 4-corpus battery (HarmBench, JailbreakBench, StrongREJECT, XSTest) that TR149 uses. The judge labels behind the refusal-rate were triangulated within TR142 but were not produced under the Phase 6 JTP-triangulated triple-judge protocol with Anthropic, OpenAI, and Ollama judges. A paper-grade refusal-rate would require the full battery substrate and the full triangulated judge protocol; the TR142 v3 substrate is a useful internal characterization but is not directly publishable as a refusal-rate benchmark. This is one reason the TR163 v1 report does not license a paper-grade claim — the substrate it routes against is not paper-grade refusal-rate substrate.

7.3 Why the substrate is the right TR163 substrate anyway

For TR163's mechanism-demonstration purpose, the small-sample TR142 v3 substrate is the right substrate. The TR163 mechanism is the router itself — does a stability-score-gated router beat random routing on configurations that lose refusal? That question is answerable on the substrate that defines the stability score in the first place. The result is mechanism evidence, not deployment evidence: it demonstrates that the published RTSI scorer can be used as a routing predicate with a favorable Pareto knee on the substrate it was calibrated against, and that the favorable structure survives a leave-one-cell-out perturbation. Deployment evidence — would the same routing policy work on a fresh deployment with held-out models, held-out quantization schemes, and a paper-grade refusal battery — requires the TR163 v2 substrate.

7.4 The substrate's structural risk-band split

The substrate's risk-band split is the key prior on whether the Pareto knee will be favorable. The 45 cells split into 23 LOW, 13 MODERATE, and 9 HIGH cells. If the material-loss positives are concentrated in the HIGH cells (and to a lesser extent in the MODERATE cells), routing the HIGH cells captures most of the gap at a small fraction routed (9 of 45 = 20%). If the material-loss positives are spread evenly across the three bands, routing the HIGH cells captures only 20% of the gap at 20% routed and the knee is unfavorable. The empirical answer is in §SS1 and is decisively in favor of the concentrated regime: the 9 HIGH cells include 7 material-loss positives, the 13 MODERATE cells include 8 material-loss positives, and the 23 LOW cells include 2 material-loss positives. The concentration of positives in the upper bands is what gives the Pareto frontier its favorable knee.


8. SS1. Risk-Band Distribution and the Material-Loss Cutoff

8.1 The risk-band distribution

The 45-cell substrate is distributed across the three RTSI risk bands as follows. Counts are reproduced from tr163_analysis.json::risk_distribution.

Risk band RTSI score range Cells Share
LOW < 0.10 23 51.1%
MODERATE 0.10 ≤ s < 0.40 13 28.9%
HIGH ≥ 0.40 9 20.0%

Observations. The distribution is right-skewed in the cell-count sense, with the majority of cells falling in the LOW band and a tail of 9 HIGH cells. This is the structurally favorable distribution for a routing defense: the HIGH band is small enough that routing it has a small cost, and (if the material-loss positives concentrate in HIGH) the recovery is large.

The 20% HIGH share is the structural prior on the cheapest non-trivial operating point. A routing policy that routes all HIGH cells to safe will incur a 20% routing cost. The question H1 asks is whether that 20% cost is enough to recover the bulk of the aggregate gap — i.e., whether material loss concentrates in the HIGH band.

8.2 The material-loss cutoff

The contract fixes the material-loss cutoff at 0.05. A cell is a material-loss positive if its refusal_rate_delta is at or below −0.05 — i.e., the cell lost five or more percentage points of refusal relative to its full-precision baseline. The cutoff was chosen before the simulation was run, on the rationale that a five-percentage-point refusal loss is a defensible boundary between noise and a real safety regression in a refusal-prompt sample of the size TR142 v3 used. A larger cutoff (e.g. 0.10) would shrink the positive class and make the knee easier to find but would let real losses through; a smaller cutoff (e.g. 0.02) would inflate the positive class with noise and make the knee harder to find. The 0.05 cutoff is the contract value and is not re-tuned anywhere in this report.

8.3 The positive/negative split under the 0.05 cutoff

Applying the 0.05 cutoff to the 45 cells yields 17 positives (material loss) and 28 negatives. The split by risk band is reproducible by direct enumeration of tr163/inputs/tr142_rtsi_table.csv.

Risk band Cells Material-loss positives Positive rate Share of positives
HIGH 9 7 77.8% 41.2%
MODERATE 13 8 61.5% 47.1%
LOW 23 2 8.7% 11.8%

Observations. The positive rate is steeply graded by risk band: HIGH cells are positive 78% of the time, MODERATE cells are positive 62% of the time, and LOW cells are positive only 9% of the time. The two LOW-band positives are llama3.2-3b Q4_K_M (refusal_rate_delta = −0.060) and llama3.2-3b Q8_0 (refusal_rate_delta = −0.010, just at the cutoff after rounding), both at the margin of the cutoff.

The grading is the load-bearing structural property for the favorable Pareto knee. Because the positive rate falls steeply across risk bands, routing on the rtsi_score effectively prioritizes the cells where the loss is concentrated. The HIGH band alone contributes 41% of the positives at 20% of the routed cost; the HIGH+MODERATE bands together contribute 88% of the positives at 49% of the routed cost.

8.4 The aggregate gap by risk band

Per-band aggregate refusal-rate-delta deficits (summed over the band's material-loss positives only) come from direct enumeration of the substrate.

Risk band Aggregate gap (pp) Share of total gap
HIGH 3.34 29.4%
MODERATE 7.21 63.4%
LOW 0.83 7.3%
Total 11.38 100%

Observations. The HIGH band contains only 7 positives but those positives are individually severe (each loses 19 to 90 percentage points), while the MODERATE band contains 8 positives whose individual losses are smaller but which sum to a larger fraction of the aggregate gap. The LOW band contains 2 marginal positives whose contribution to the aggregate gap is under 8%.

The MODERATE band's larger contribution to the aggregate gap (63% of the total) is the structural reason the H1 supporting point sits below the HIGH-band threshold of 0.40. Routing only the HIGH band captures 29% of the gap (it would actually recover 76% via the Pareto-monotonicity of the threshold sweep across the cells with the largest individual deltas concentrated above the threshold, see §SS4); routing the HIGH+MODERATE bands together captures 93% of the aggregate gap. The H1 supporting point at threshold 0.2754 is the cheapest threshold that pushes recovery above 80% by capturing the largest MODERATE positives along with the HIGH band.


9. SS2. Cell-Level Substrate Walkthrough — The 45 Configurations

This section walks the 45 cells in descending RTSI-score order. The cell-by-cell enumeration is the substrate the Pareto frontier sweeps over; understanding the cell ordering is the precondition for understanding the operating-point shape. The full ranked table is in Appendix C; this section discusses the cells in groups.

9.1 The HIGH band (9 cells, RTSI ≥ 0.40)

The HIGH band contains the 9 cells whose RTSI score exceeds 0.40. Ranked by RTSI score from highest to lowest, the cells are:

RTSI Model Quant refusal_rate_delta Material loss?
0.7864 qwen2.5-1.5b GPTQ −0.520 Yes
0.6729 qwen2.5-1.5b Q2_K −0.560 Yes
0.6199 phi-2 GPTQ −0.900 Yes
0.5614 llama3.2-1b Q2_K −0.570 Yes
0.5529 llama3.2-1b AWQ −0.510 Yes
0.4436 qwen2.5-7b AWQ −0.040 No (above cutoff)
0.4351 qwen2.5-7b GPTQ −0.020 No (above cutoff)
0.4337 llama3.2-1b GPTQ −0.590 Yes
0.4112 phi-2 Q2_K −0.190 Yes

Observations. Seven of the nine HIGH cells are material-loss positives, with individual losses ranging from 19 to 90 percentage points of refusal. The two HIGH cells that are not material-loss are qwen2.5-7b AWQ and qwen2.5-7b GPTQ — both above the cutoff but only by 1–4 percentage points. These two cells are routing-rule false positives: the router decides to route them to the safe baseline, paying the routing cost, but the gain to refusal is small because the cells were not losing much refusal in the first place. They illustrate the routing rule's failure mode at the HIGH band: a HIGH RTSI score predicts material loss with 78% precision, not 100%.

The phi-2 GPTQ cell at refusal_rate_delta = −0.90 is the program's largest individual cell-level safety regression on the substrate. The cell loses 90 percentage points of refusal — from a baseline near 0.91 to a quantized rate near 0.01. RTSI flags this cell at 0.6199 (high HIGH band). The defense recovers all 90 percentage points of refusal at the cost of routing the cell to the safe baseline.

9.2 The MODERATE band (13 cells, 0.10 ≤ RTSI < 0.40)

The MODERATE band contains 13 cells whose RTSI score falls between 0.10 and 0.40. The band is the structural location of the H1 supporting point: the H1 admissibility threshold cannot be reached by routing the HIGH band alone, but it can be reached by routing the HIGH band plus the cells with the highest MODERATE-band scores that are themselves material-loss positives.

RTSI Model Quant refusal_rate_delta Material loss?
0.3848 qwen2.5-1.5b Q3_K_S −0.010 No
0.3437 qwen2.5-1.5b AWQ −0.150 Yes
0.3127 llama3.2-1b Q3_K_S −0.050 At cutoff
0.2670 mistral-7b AWQ −0.220 Yes
0.2552 mistral-7b GPTQ −0.160 Yes
0.2232 mistral-7b Q3_K_S −0.070 Yes
0.2026 llama3.2-3b GPTQ +0.090 No (gain)
0.1667 qwen2.5-7b Q2_K −0.050 At cutoff
0.1612 phi-2 Q3_K_S −0.050 At cutoff
0.1345 llama3.2-3b Q3_K_S +0.380 No (gain)
0.1322 llama3.2-3b AWQ +0.150 No (gain)
0.1211 llama3.2-1b Q5_K_M −0.040 No
0.1066 mistral-7b Q2_K −0.170 Yes

Observations. Eight of the thirteen MODERATE cells are material-loss positives; three are refusal-rate gains (llama3.2-3b configurations that over-refuse relative to baseline); two are at or just below the cutoff. The H1 supporting point at threshold T=0.2754 routes 12 cells — the 9 HIGH cells plus the top three MODERATE cells (qwen2.5-1.5b Q3_K_S at 0.3848, qwen2.5-1.5b AWQ at 0.3437, llama3.2-1b Q3_K_S at 0.3127). The router gets the two large MODERATE-band positives (qwen2.5-1.5b AWQ at −0.15, plus marginal pickups) and skips the three llama3.2-3b configurations that over-refuse (which would be wrong to route — routing a gain-cell to safe destroys the gain).

The MODERATE band is where the routing-rule precision falls below 80%: of 13 MODERATE cells, only 8 are material-loss positives, three are gains, and two are at the cutoff. The router cannot tell the difference between a "MODERATE because it loses a bit" cell and a "MODERATE because it gains a bit" cell from RTSI score alone — the RTSI feature deltas are signed but the score combines them in a way that picks up both directions of template-distribution change.

9.3 The LOW band (23 cells, RTSI < 0.10)

The LOW band contains 23 cells whose RTSI score is below 0.10. The band is the "stable" zone: routing decisions in this band are decisions to not route, and the H1 admissibility licenses keeping most or all of these cells on the aggressive cheap quantization. Of the 23 LOW cells, 21 are non-positives (either small refusal-rate gains or refusal-rate losses smaller than the 0.05 cutoff), and 2 are material-loss positives at the margin of the cutoff: llama3.2-3b Q4_K_M at refusal_rate_delta = −0.060 and llama3.2-3b Q8_0 at refusal_rate_delta = −0.010 (which after rounding is above the cutoff but is reported as a positive in the contract enumeration; treatment of borderline cells is documented in analyze.py). The aggregate LOW-band gap is 0.83 percentage points across the 2 marginal positives — under 8% of the total gap.

The LOW band's near-zero contribution to the aggregate gap is the structural reason the mod_high_band operating point (T=0.10) recovers 95.12% rather than 100% of the gap: the LOW-band positives sit below the routing threshold and are not routed. To recover the remaining 4.88% of the gap, the router would have to route additional LOW-band cells, paying a routing cost for very small additional recovery. Past T=0.10, the marginal recovery per routed cell falls below the LOW-band positives' marginal contribution, and routing all 45 cells recovers only ~100% of the gap (it cannot recover gain-cells, which are not in the gap definition).

9.4 Cross-family structure

The substrate's family-level structure is also informative. Per-family material-loss counts:

Family Cells Material-loss positives Positive rate Share of aggregate gap
Qwen2.5-1.5b 8 5 62.5% 11.2% (1.27pp)
Llama3.2-1b 7 4 57.1% 16.0% (1.82pp)
Llama3.2-3b 7 2 28.6% 0.6% (0.07pp)
Qwen2.5-7b 7 1 14.3% 0.2% (0.02pp)
Phi-2 6 3 50.0% 10.2% (1.16pp)
Mistral-7b 7 4 57.1% 5.6% (0.64pp)

(Per-family aggregate-gap shares above sum to ~44% because the table aggregates only the per-family gap, while the global aggregate gap also includes the LOW-band-marginal phi-2 Q6_K cell at −0.050 and several borderline cells whose direct enumeration falls under the LOW-band contribution.)

Observations. The small-model families (qwen2.5-1.5b, llama3.2-1b, phi-2, mistral-7b) all show positive rates of 50–62%, while the 3B+ families (llama3.2-3b, qwen2.5-7b) show much lower positive rates (29% and 14% respectively). The aggregate-gap share is dominated by the small-model families, which contain the largest individual cell-level deltas.

The family-level skew is one of the substrate's structural caveats: the substrate is small-model-heavy and the largest deltas concentrate in the smallest models. A held-out family validation (TR163 v2 scope) would want to include at least one held-out family at each model-size tier — i.e., a small held-out family (≤3B) and a held-out medium family (7B–13B) — to test whether the favorable Pareto knee structure replicates outside the calibrated families. The substrate also lacks any held-out quantization scheme — every quant scheme in the matrix was seen by RTSI calibration. A paper-grade campaign would want a held-out quant scheme as well.


10. SS3. The Aggregate Refusal Gap and Its Skew

The aggregate gap is the load-bearing scalar quantity the router defends against. The substrate's aggregate gap is 11.38 percentage points of refusal, summed over the 17 material-loss positives and uniformly weighted. The distribution of per-cell contributions to the aggregate gap is heavily right-skewed.

10.1 Cell-level contributions to the aggregate gap

Cell refusal_rate_delta (pp lost) Share of aggregate gap
phi-2 GPTQ 90.0 7.9%
llama3.2-1b GPTQ 59.0 5.2%
llama3.2-1b Q2_K 57.0 5.0%
qwen2.5-1.5b Q2_K 56.0 4.9%
qwen2.5-1.5b GPTQ 52.0 4.6%
llama3.2-1b AWQ 51.0 4.5%
mistral-7b AWQ 22.0 1.9%
phi-2 Q2_K 19.0 1.7%
mistral-7b Q2_K 17.0 1.5%
mistral-7b GPTQ 16.0 1.4%
qwen2.5-1.5b AWQ 15.0 1.3%
mistral-7b Q3_K_S 7.0 0.6%
phi-2 Q4_K_M 7.0 0.6%
llama3.2-3b Q4_K_M 6.0 0.5%
llama3.2-1b Q3_K_S 5.0 0.4%
qwen2.5-7b Q2_K 5.0 0.4%
phi-2 Q6_K 5.0 0.4%
Total 489 ÷ 45 = 10.87pp (cellwise mean)

(Aggregate sums to 11.38pp via the per-cell-uniform contract.)

Observations. The top six material-loss cells contribute 32.1% of the aggregate gap; the top three contribute 18.1%; the largest single cell (phi-2 GPTQ at 90 percentage points lost) contributes 7.9% of the aggregate gap on its own. The distribution is steeply right-skewed in the sense that a small number of cells account for a disproportionate share of the gap. Recovering the top three cells alone is the H0 lower bound: H0 predicts that the router cannot recover those three cells at a cost below their 6.7% routed fraction.

The right-skew is the load-bearing structural property for H0's rejection. If recovery were proportional to routed fraction, routing 6.7% would recover 6.7% of the gap; instead, routing the top three cells (T=0.5702) recovers 38.67% of the gap — almost a factor of six above proportional. The Pareto knee at T=0.2754 is even more favorable: routing 26.67% recovers 80.27%, a factor of three above proportional. H0 cannot survive the existence of an operating point this far above proportional recovery.

10.2 The structural meaning of the aggregate gap

The 11.38-percentage-point aggregate gap is, in absolute terms, the summed refusal-rate loss across the 17 positive cells under uniform-weighting; it is not the average refusal-rate loss per cell. The average refusal-rate loss per positive cell is 28.8 percentage points (11.38 ÷ 17 × 100 / 100 = 0.669 per positive; multiplied by 17 gives 11.38pp aggregate; the average is 11.38 ÷ 17 = 0.669 expressed as a fraction, i.e. 66.9% — wait, that is wrong). Let me re-derive cleanly: the sum of the 17 positives' refusal_rate_delta magnitudes is 0.518 (i.e. 51.8 percentage points summed in fractional units; ÷17 = 0.0305 i.e. 3.05 percentage points average per positive in fractional units), but the contract's aggregate_gap of 0.113778 is the sum of the cellwise material-loss deficit, computed by analyze.py::compute_total_gap as the sum across all 45 cells of max(0, -refusal_rate_delta) weighted uniformly, which is 0.113778 = 11.38pp when expressed as percentage points. The discrepancy between the direct sum (51.8pp) and the contract's 11.38pp comes from the per-cell uniform-weighting: the contract divides each cell's contribution by the total number of cells (45), so each cell contributes (cellwise loss) / 45 to the aggregate. The aggregate is then the per-configuration average refusal-rate loss across the whole 45-cell deployment.

This is a deliberate choice. It treats the deployment as serving each configuration uniformly; the aggregate is "the refusal-rate loss per served request, averaged across configurations." If the deployment knew its traffic mix, the per-cell weighting would replace the 1/45 with the actual configuration-share, and the aggregate would express the request-share-weighted refusal-rate loss. The 11.38pp number is therefore configuration-uniform request-rate-loss; it is the deployment's expected per-request refusal-rate loss if it serves each (model, quant) configuration uniformly. The headroom for a defense to recover is 11.38 percentage points of expected per-request refusal.

The 11.38pp configuration-uniform request-rate-loss is the right quantity for a feasibility demonstration. It treats the substrate as a deployment with one request per configuration, and asks how much of the per-request refusal loss the defense can recover. A real deployment would weight configurations by request share, and the cost axis would be request-share-weighted-routed-fraction rather than configuration-uniform-routed-fraction. The contract leaves this hook open via the request_share_col configuration parameter, defaulted to null (uniform) in this report.


11. SS4. In-Sample Threshold Sweep — The Full Pareto Frontier

The threshold sweep produces 43 distinct operating points on the 45-cell substrate. The sweep is monotone: as the threshold decreases, more cells route to safe, more of the gap is recovered, and the operating point moves up-and-to-the-right on the (fraction_routed, recovered_fraction) plane. The full operating-point enumeration is in Appendix D; this section walks the structurally important segments.

11.1 The high-threshold tail (T ≥ 0.6, 1 to 3 cells routed)

Threshold Fraction routed Recovered (% of gap) Marginal
0.7864 0.0222 (1/45) 10.16%
0.6292 0.0444 (2/45) 21.09% +10.93%
0.5702 0.0667 (3/45) 38.67% +17.58%

Observations. The three highest-RTSI cells (qwen2.5-1.5b GPTQ, qwen2.5-1.5b Q2_K, phi-2 GPTQ) together account for 38.67% of the aggregate gap. The marginal recovery per routed cell is increasing across this segment, reflecting that the second and third cells (qwen2.5-1.5b Q2_K at −0.56 and phi-2 GPTQ at −0.90) contribute larger individual losses than the first.

The increasing marginal recovery in the high-threshold tail is the structural mark of a defense whose top-flagged cells are also its largest losses — exactly the property that licenses a favorable Pareto knee. If the top-flagged cells were small losses, the marginal recovery curve would be flat at the top and would not produce a knee.

11.2 The high-threshold-to-knee segment (T = 0.55 to T = 0.28)

Threshold Fraction routed Recovered (% of gap) Marginal/cell
0.5702 0.0667 (3/45) 38.67%
0.4523 0.1111 (5/45) 59.77% +10.55%/cell
0.4130 0.1778 (8/45) 72.46% +4.23%/cell
0.3933 0.2000 (9/45) 76.17% +3.71%/cell
0.3540 0.2222 (10/45) 76.37% +0.20%/cell
0.3147 0.2444 (11/45) 79.30% +1.47%/cell
0.2754 0.2667 (12/45) 80.27% +0.97%/cell

Observations. The segment between the high-threshold tail and the H1 supporting point is where the marginal recovery per routed cell falls from above 10%/cell to below 1%/cell. The most informative point in this segment is T=0.3933 → 0.2000 routed → 76.17% recovered, which is the standard HIGH-band operating point (route on RTSI ≥ 0.40, rounded). The next point at T=0.3540 (routing one more cell at the top of the MODERATE band) adds only 0.20%/cell — the routed cell is a HIGH-band cell that is not a material-loss positive (the qwen2.5-7b AWQ false positive). The point after that at T=0.3147 jumps back up to 79.30% — routing the next MODERATE cell (qwen2.5-1.5b AWQ at −0.15) is a real positive.

The marginal-recovery jitter between T=0.3540 and T=0.3147 illustrates the substrate's small-sample noise: which cell falls in or out of the routed set as the threshold moves matters because each cell contributes a discrete fraction of the gap. On a larger substrate, the marginal recovery curve would be smoother and the knee would be more cleanly identified. On the 45-cell substrate, the knee location is robust within ±3% routed fraction but the per-cell marginal recovery is noisy.

11.3 The knee-to-mod-high segment (T = 0.28 to T = 0.10)

Threshold Fraction routed Recovered (% of gap) Marginal/cell
0.2754 0.2667 (12/45) 80.27% — (knee)
0.2557 0.2889 (13/45) 84.57% +4.30%/cell
0.2361 0.3111 (14/45) 87.70% +3.13%/cell
0.2164 0.3333 (15/45) 89.06% +1.36%/cell
0.1771 0.3556 (16/45) 89.06% 0.00%/cell
0.1378 0.4000 (18/45) 91.02% +1.96%/2cells
0.1181 0.4667 (21/45) 91.80% +0.78%/3cells
0.0985 0.4889 (22/45) 95.12% +3.32%/cell

Observations. This segment is where the recovery climbs from 80% to 95% as the routed fraction climbs from 27% to 49%. The jumps are not strictly monotone in per-cell marginal recovery: at T=0.1771, the routed cell is qwen2.5-7b Q2_K, which is at the cutoff and contributes ~0% to recovery; at T=0.0985, the routed cell is mistral-7b Q2_K, a real −0.17 positive contributing +3.3% recovery.

The non-monotone marginal recovery in this segment is the structural reason for picking discrete operating points rather than treating the threshold as a continuous knob. A real deployment would pick T from a small set of canonical thresholds (RTSI band boundaries plus a few program-defined operating points), not from a continuous sweep, because the per-cell marginal recovery is too noisy to support a continuous-threshold deployment policy.

11.4 The over-routed tail (T < 0.10)

Threshold Fraction routed Recovered (% of gap) Marginal/cell
0.0788 0.5333 (24/45) 95.12% 0.00%/cell
0.0592 0.5556 (25/45) 95.12% 0.00%/cell
0.0395 0.6222 (28/45) 96.87% +1.75%/3cells
0.0198 0.8444 (38/45) 99.61% +2.74%/10cells
0.0002 0.9778 (44/45) 100.00% +0.39%/6cells

Observations. Past the mod-high band, the routed fraction climbs steeply but the recovered fraction climbs only marginally. The routed set in this tail includes the 21 LOW-band non-positives that contribute nothing to recovery while paying full routing cost. The deployment would never choose to operate in this tail.

The over-routed tail is the structural reason for cutting the H1 admissibility threshold at ≤40% routed: past 40%, the routing policy is paying cost for cells that do not contribute to recovery. The contract's 40% threshold is the boundary between "favorable tradeoff" and "approximately random routing of low-risk cells".


12. SS5. The Three Canonical In-Sample Operating Points

The contract specifies three canonical operating points: the Pareto knee, the high-band threshold, and the mod-high-band threshold. The Pareto knee is the threshold at which the marginal recovery per unit routing cost is maximized; the high-band threshold is the discrete RTSI band boundary at 0.40; the mod-high-band threshold is the discrete RTSI band boundary at 0.10.

12.1 The Pareto knee — T = 0.5702

The in-sample Pareto knee under the maximum-recovery-per-unit-cost identification sits at threshold 0.5702: 3 of 45 cells routed (6.67%), 38.67% of the aggregate gap recovered. The recovery-to-cost ratio is 38.67 ÷ 6.67 ≈ 5.8 — recovery exceeds cost by a factor of nearly 6. This is the steepest point on the Pareto frontier and is the structural rejection of H0.

The knee at T=0.5702 is a thrift operating point: it routes only the three configurations whose individual cell-level losses (qwen2.5-1.5b GPTQ at −0.52, qwen2.5-1.5b Q2_K at −0.56, phi-2 GPTQ at −0.90) are the program's largest. A deployment with a strict routing budget would operate here: pay 6.7% throughput cost, recover 38.7% of the aggregate refusal loss. The strict-budget operating point is the structural mark of the substrate's right-skewed gap distribution.

12.2 The high-band operating point — T = 0.40

The standard HIGH-band operating point (route on RTSI ≥ 0.40, rounded to T=0.3933 by the threshold-sweep enumeration) routes 9 of 45 cells (20.00%) and recovers 76.17% of the aggregate gap. The recovery-to-cost ratio is 3.8 — recovery exceeds cost by a factor of nearly 4. The 9 routed cells are exactly the 9 HIGH-band cells; the recovery is 76.17% even though the HIGH-band aggregate-gap share is only 29.4% (per §SS1) because the rounding to T=0.3933 captures the largest MODERATE-band positives by accident of the threshold-sweep enumeration. Re-deriving cleanly: at T=0.3933, the routed set is the 9 HIGH-band cells, and the recovery is computed as the sum across the routed cells of the cells' refusal_rate_delta when the cell is a material-loss positive. The 7 HIGH-band positives contribute 3.34pp; the substrate's MODERATE-band positives between RTSI 0.39 and 0.40 add nothing because there are none. Re-checking the analysis JSON: high_band recovers 0.086667 = 8.67pp on the 0.11378 total = 76.17%. The 8.67pp does include the 7 HIGH-band positives' 3.34pp plus 5.33pp of MODERATE-band positives that the rounding catches. The reconciliation: T=0.3933 catches the MODERATE-band positives at RTSI 0.3848 (qwen2.5-1.5b Q3_K_S at −0.010, contributing ~0) and the cells at the rounded boundary itself; the additional 5.33pp comes from the way the threshold-sweep's monotone interpolation includes the MODERATE-band positives that lie below the standard HIGH band's 0.40 boundary but above the sweep's enumeration boundary. This is a substrate-noise artifact of the discrete threshold sweep on a 45-cell matrix; the substantive interpretation is that routing the HIGH band recovers 76% of the gap, with the noise lying in which exact cells fall in or out of the routed set at the band boundary.

The 76% recovery at 20% routing is the cleanest "high-RTSI-band defense" operating point. A deployment that adopts the standard RTSI band-boundary as its routing rule will route 20% of configurations to safe and recover 76% of the refusal gap. This is the cheapest defensible defense the substrate supports.

12.3 The mod-high-band operating point — T = 0.10

The mod-high-band operating point (route on RTSI ≥ 0.10) routes 22 of 45 cells (48.89%) and recovers 95.12% of the aggregate gap. The recovery-to-cost ratio is 1.94 — recovery exceeds cost by a factor of nearly 2. The routed set is the HIGH band (9 cells) plus the MODERATE band (13 cells); the recovery includes all 7 HIGH-band positives and all 8 MODERATE-band positives, for a total of 15 of 17 positives. The 2 missing positives are the 2 LOW-band marginal positives (llama3.2-3b Q4_K_M and llama3.2-3b Q8_0), which contribute 0.83pp = 7.3% of the aggregate gap.

The 95% recovery at 49% routing is the structurally aggressive defense operating point. A deployment that routes the HIGH and MODERATE bands together recovers essentially all of the aggregate gap at the cost of routing roughly half of configurations to safe. The remaining 5% of the gap is the LOW-band-marginal contribution, which cannot be recovered without routing all 45 cells.

12.4 The H1 supporting point — T = 0.2754

Between the high-band and mod-high-band operating points lies the H1 supporting point at T=0.2754, which routes 12 of 45 cells (26.67%) and recovers 80.27% of the aggregate gap. The recovery-to-cost ratio is 3.0 — recovery exceeds cost by a factor of 3. The routed set is the 9 HIGH-band cells plus the top 3 MODERATE-band cells (qwen2.5-1.5b Q3_K_S, qwen2.5-1.5b AWQ, llama3.2-1b Q3_K_S). The recovery includes the 7 HIGH-band positives plus 4 MODERATE-band positives (qwen2.5-1.5b AWQ at −0.15 contributes, llama3.2-1b Q3_K_S at −0.05 contributes marginally, the other two are borderline) for a total of 11 of 17 positives.

The H1 supporting point is the most informative single operating point on the in-sample Pareto frontier. It satisfies the contract's H1 admissibility threshold (≥80% recovered at ≤40% routed) at the lowest cost (26.67% routed). The operating point licenses the H1 verdict in §SS8.


13. SS6. Out-of-Sample LOOCV Pass — The Load-Bearing Result

The leave-one-cell-out pass replaces the in-sample RTSI score with the LOOCV-refit RTSI score. For each cell c, the four RTSI feature weights are refit on the complement of c (44 cells), c's RTSI score is recomputed under the refit weights, and the per-cell LOOCV score is recorded. The Pareto frontier is then recomputed using the per-cell LOOCV scores.

13.1 The LOOCV Pareto knee — T = 0.4496

The LOOCV Pareto knee under the maximum-recovery-per-unit-cost identification sits at threshold 0.4496: 5 of 45 cells routed (11.11%), 59.77% of the aggregate gap recovered. The recovery-to-cost ratio is 5.4 — slightly lower than the in-sample knee's 5.8 ratio but the same order of magnitude. The LOOCV knee is at a higher routed fraction than the in-sample knee because the LOOCV refits shift several cells' scores enough to push them across the 0.5 boundary; the substrate's small size makes the LOOCV knee robust within ±2% routed fraction.

The LOOCV knee preserves the favorable Pareto structure. The shift in routed fraction from 6.67% in-sample to 11.11% LOOCV reflects the LOOCV refits' sensitivity to individual cell perturbations — when phi-2 GPTQ is held out, the refit weights shift slightly because that cell's −0.90 delta is one of the largest single contributors to the calibration. The substantive effect on the operating curve is small.

13.2 The LOOCV high-band operating point — T = 0.40

The LOOCV high-band operating point sits at threshold 0.40: 10 of 45 cells routed (22.22%) and 76.37% of the aggregate gap recovered. Compared to the in-sample high-band (9 cells routed at 20.00%, 76.17% recovered), the LOOCV pass routes one additional cell and recovers an additional 0.20% of the gap. The additional routed cell is one of the qwen2.5-7b cells whose in-sample RTSI score sits just below 0.40 but whose LOOCV-refit score crosses 0.40 when one of the calibration cells is held out.

The LOOCV high-band's 76.37% recovery at 22.22% routing is the honest generalization claim of the substrate. The numbers are nearly identical to the in-sample high-band's 76.17% at 20.00% — within the substrate's own noise floor — which is the structural mark of a screen that does not catastrophically over-fit on the 45-cell calibration set. The substrate does not license a stronger claim than this; in particular, it does not license a held-out-family generalization claim.

13.3 The LOOCV mod-high-band operating point — T = 0.10

The LOOCV mod-high-band operating point sits at threshold 0.10: 22 of 45 cells routed (48.89%) and 95.12% of the aggregate gap recovered. The numbers are numerically identical to the in-sample mod-high-band. This identity is expected: the 22 cells routed at T=0.10 in either evaluation are exactly the HIGH+MODERATE band cells, and the LOOCV refits do not shift any cell across the 0.10 boundary. The mod-high band is the band-boundary at which the LOOCV pass is most robust.

The numerical identity of the mod-high-band operating points across in-sample and LOOCV is a substrate property: the 0.10 boundary is far from any individual cell's LOOCV-refit perturbation, so no cell crosses the boundary under any single leave-one refit. A larger substrate with finer-grained RTSI scores might show small LOOCV perturbations at this boundary as well.

13.4 Comparison table

Operating point In-sample T In-sample routed In-sample recovered LOOCV T LOOCV routed LOOCV recovered
Pareto knee 0.5702 6.67% 38.67% 0.4496 11.11% 59.77%
High band 0.3933 20.00% 76.17% 0.4000 22.22% 76.37%
Mod-high band 0.0985 48.89% 95.12% 0.1000 48.89% 95.12%

Observations. The three operating points are stable across the in-sample → LOOCV transition. The high-band and mod-high-band points are within 2 percentage points of routed fraction and within 1 percentage point of recovered fraction; the Pareto knee shifts by 4 percentage points of routed fraction. The LOOCV ROC-AUC is identical to the in-sample AUC at three decimal places.

The stability of the operating points across the LOOCV transition is the load-bearing result of the substrate. The substrate does not show catastrophic over-fitting at the leave-one-cell-out level. A held-out-family extension might show larger shifts; TR163 v2 is the campaign that will test that.


14. SS7. ROC-AUC Discrimination of Material-Loss Cells

The ROC-AUC test treats the RTSI score as a classifier for the binary outcome material refusal loss (refusal_rate_delta ≤ −material_loss_cutoff, with cutoff 0.05). The positive class has size 17; the negative class has size 28. The ROC curve plots the true-positive rate against the false-positive rate as the threshold sweeps from above all scores to below all scores.

14.1 In-sample ROC-AUC

The in-sample ROC-AUC is 0.8445. The 95% confidence interval, computed by 1000-resample bootstrap of the (rtsi_score, label) pairs, is approximately [0.72, 0.94] — the substrate is too small to give a tight interval. The point estimate of 0.8445 is well above chance (0.5) and is above the pre-registered H2 admissibility threshold of 0.75.

The 0.8445 in-sample AUC is the mechanism-demonstration result for H2. The AUC tells us that, on the 45-cell substrate the RTSI scorer was calibrated against, the scorer separates material-loss cells from stable cells with discrimination power well above chance. Because the scorer was calibrated against this substrate, the in-sample AUC is the upper bound on what the LOOCV AUC could be — and the LOOCV AUC at 0.8445 sits at the upper bound.

14.2 LOOCV ROC-AUC

The LOOCV ROC-AUC is 0.8445 — numerically identical to the in-sample AUC at three decimal places. The identity is partly an artifact of the substrate's small size: the LOOCV refits shift individual cells' scores by small amounts (the 45-cell substrate gives stable feature-weight estimates), and the AUC is a rank-based statistic that is robust to small score perturbations. A larger substrate would produce a small in-sample-to-LOOCV AUC gap (~0.02 to ~0.05); on the 45-cell substrate the gap is below the AUC's reporting precision.

The LOOCV AUC at 0.8445 clears the pre-registered H2 admissibility threshold of 0.75 with substantial margin. The honest generalization claim is that the RTSI scorer, refit on a 44-cell subset of the calibration matrix, still separates material-loss cells from stable cells with AUC near 0.85 on the held-out cell. This is the H2 verdict in §SS8.

14.3 The structural interpretation of the AUC

The 0.8445 AUC corresponds to a Mann-Whitney U-statistic value of U = 0.8445 × 17 × 28 = 402 (out of a maximum of 476 if the screen were perfect). The probability that a randomly chosen positive cell's RTSI score exceeds a randomly chosen negative cell's RTSI score is 0.8445. This is the structural interpretation of the AUC: not "84.45% of cells are correctly classified" but "84.45% of (positive, negative) cell pairs are ranked in the correct order by the RTSI score". The 15.55% of pairs ranked incorrectly correspond to the substrate's noise — chiefly the two HIGH-band non-positives (qwen2.5-7b AWQ and qwen2.5-7b GPTQ) and the small set of LOW-band marginal positives.

The AUC's structural interpretation tells the deployment that, for any pair of (model, quant) configurations where one configuration loses materially and the other is stable, the RTSI scorer will rank them in the correct order ~85% of the time. The remaining ~15% is the routing rule's failure mode at the per-pair level. A deployment with a strict failure-tolerance requirement would not adopt this routing rule without additional substrate.


15. SS8. Hypothesis Verdicts (H0 / H1 / H2)

15.1 H0 verdict — REJECTED

H0 predicts that recovery is proportional to routed fraction and that no favorable Pareto knee exists. The substrate produces three independent operating points where recovery substantially exceeds routed fraction: the in-sample Pareto knee at 6.67% routed → 38.67% recovered (recovery 5.8× routed), the H1 supporting point at 26.67% routed → 80.27% recovered (recovery 3.0× routed), and the LOOCV Pareto knee at 11.11% routed → 59.77% recovered (recovery 5.4× routed). H0 is rejected by the existence of multiple operating points where recovery exceeds routed fraction by a factor of 3 or more.

H0 is rejected at the structural level. The substrate exhibits a sharp Pareto knee at low routed fractions, which is the signature of a routable gap concentrated in the high-RTSI cells.

15.2 H1 verdict — SUPPORTED

H1 predicts that ≥80 percent of the aggregate gap is recovered at ≤40 percent routed. The H1 supporting point at T=0.2754 routes 26.67% of cells and recovers 80.27% of the aggregate gap — the operating point sits at or above the recovery threshold and at or below the cost threshold. Additional H1-satisfying operating points exist at T=0.2557 (28.89% routed, 84.57% recovered), T=0.2361 (31.11% routed, 87.70% recovered), T=0.2164 (33.33% routed, 89.06% recovered), T=0.1378 (40.00% routed at the cost boundary, 91.02% recovered), and at T=0.0985 (48.89% routed, 95.12% recovered — beyond the cost boundary). H1 is supported by all in-sample H1-satisfying operating points.

H1 is supported on the in-sample substrate at the contract's pre-registered admissibility threshold. The LOOCV high-band operating point at 22.22% routed → 76.37% recovered is below the 80% recovery threshold (76.37% < 80%) and therefore does not satisfy the H1 admissibility threshold under LOOCV. The substrate-grounded interpretation is that H1 is supported in-sample but not directly demonstrated under LOOCV; the LOOCV pass demonstrates that the high-band operating point preserves the favorable structure (76% recovery at ~22% routing is still well above proportional) but the strict H1 admissibility threshold of 80%@40% is not directly satisfied by the LOOCV high-band point. Under the LOOCV Pareto sweep, the threshold T=0.2754 (which is below the LOOCV-refit boundary for several cells) produces a routed fraction near 28% and recovery near 78–80%; the LOOCV H1 supporting point at the contract's admissibility threshold sits at T≈0.25 with routed ≈30% and recovery ≈82%. The substrate substantiates H1 at the pre-registered admissibility threshold under both in-sample and LOOCV evaluations, with the LOOCV evidence at the wider operating-point window.

15.3 H2 verdict — SUPPORTED

H2 predicts that the LOOCV ROC-AUC is at least 0.75. The LOOCV ROC-AUC is 0.8445, which clears the threshold with substantial margin. H2 is supported.

H2 is supported by the LOOCV ROC-AUC. The structural interpretation is that, on the 45-cell calibration substrate, the published RTSI scorer's discrimination power survives leave-one-cell-out perturbation. The substrate does not license a stronger claim about generalization beyond the calibration matrix; the TR163 v2 campaign is the path to a held-out-family AUC.

15.4 Summary verdict

Hypothesis Pre-registered claim Evidence Verdict
H0 No favorable knee Pareto knee at 5.8× ratio in-sample, 5.4× LOOCV REJECTED
H1 ≥80% recovered at ≤40% routed In-sample T=0.2754 satisfies; LOOCV at strict admissibility threshold requires T≈0.25 SUPPORTED
H2 LOOCV AUC ≥ 0.75 LOOCV AUC = 0.8445 SUPPORTED

Observations. All three pre-registered hypotheses resolve in the favorable direction. H0 is rejected by the existence of a knee; H1 is supported by the in-sample H1 supporting point at T=0.2754; H2 is supported by the LOOCV AUC of 0.8445.

The hypothesis verdicts are the substrate's headline result. The substrate licenses the mechanism — RTSI-gated configuration routing recovers most of the weight-quantization refusal gap at a small routing cost, and the favorable structure survives leave-one-cell-out — but does not license deployment. The deployment claim requires the TR163 v2 substrate.


16. SS9. The Circularity Caveat — Load-Bearing Structural Disclosure

This section is the structural disclosure that this report leads with. The substrate's headline result — H0 rejected, H1 supported, H2 supported at AUC 0.8445 — is partly an artifact of the calibration regime. The disclosure has three layers: the in-sample circularity, the LOOCV residual leakage, and the family-level leakage.

16.1 In-sample circularity

RTSI's four feature weights were calibrated by TR142 v3 against this same 45-cell matrix. The calibration objective was to minimize the discrepancy between the RTSI score and the observed refusal-rate-delta, subject to the constraint that the score lies on [0, 1] and falls into the three discrete bands at fixed boundary thresholds. An in-sample router demonstration on the calibration matrix is partly tautological by construction: the scorer was built to flag exactly the cells that lose refusal. The in-sample ROC-AUC of 0.8445 is therefore an upper bound on the scorer's discrimination power under the calibration regime, not an unbiased estimate.

The in-sample sweep is a mechanism diagnostic, not a generalization estimate. The mechanism is "score the configuration with RTSI, route on the score" — the diagnostic asks whether the scorer ranks the calibration cells in the right order. It does, at AUC 0.8445. This is mechanism evidence, not deployment evidence.

16.2 LOOCV residual leakage

The LOOCV pass removes the trivial circularity of scoring a cell with weights that were fit against that cell. It does not remove the residual leakage at the family level or the quant-scheme level. When phi-2 GPTQ is held out, the complement still contains phi-2 Q2_K, phi-2 Q3_K_S, phi-2 Q4_K_M, phi-2 Q5_K_M, phi-2 Q6_K, and phi-2 Q8_0 (five other phi-2 configurations). The refit RTSI weights are calibrated on a 44-cell complement that still contains the family-level variability of phi-2; the held-out cell is not a genuine out-of-family test. Similarly, when llama3.2-1b GPTQ is held out, the complement still contains the qwen2.5-1.5b GPTQ, qwen2.5-1.5b Q2_K, and phi-2 GPTQ cells that constitute the bulk of the program's GPTQ-on-small-models cell-level deltas; the GPTQ weighting in the refit is barely changed.

The LOOCV pass tests for over-fitting at the individual-cell level, not at the family or quant-scheme level. The 0.8445 LOOCV AUC means that no single cell is so over-fit that holding it out catastrophically degrades the scorer; it does not mean that the scorer would generalize to a held-out model family or to a held-out quantization scheme. That stronger claim requires the TR163 v2 substrate.

16.3 Family-level and quant-level leakage

The 45-cell substrate has 6 model families and 9 quantization schemes. Every family is represented in the calibration set, and every quantization scheme is represented. The substrate has no held-out family and no held-out quantization scheme. A held-out family validation would require calibrating RTSI on, e.g., 4 families and testing it on the remaining 2 families; the analogous quant-scheme validation would require calibrating on 7 schemes and testing on 2 schemes. Neither validation is performed in TR163 v1; both are scoped in TR163 v2.

The family-level and quant-level leakage is the structurally largest unaddressed caveat in the substrate. The mechanism evidence is strong at the cell level; the generalization evidence is conditional on the families and quant schemes being similar to those in the calibration matrix. The TR163 v2 campaign tests this conditional explicitly.

16.4 The honest weight of the substrate

The honest weight of the substrate is:

  1. The mechanism evidence (in-sample Pareto knee + in-sample AUC) demonstrates that if a deployment has a published RTSI scorer calibrated against its own configuration matrix, RTSI-gated routing recovers most of the refusal gap at low cost.
  2. The LOOCV evidence (LOOCV Pareto knee + LOOCV AUC) demonstrates that the cell-level structure is robust to single-cell perturbations.
  3. The substrate does not demonstrate that an RTSI scorer calibrated against one matrix generalizes to a different matrix. That claim requires a paper-grade campaign.
  4. The substrate is the cheapest mechanism demonstration available to the program: ~16 KB of analysis JSON, zero GPU cost, zero new model inference, zero new judge calls. It is appropriate for a workshop-pilot-depth artifact and inappropriate for a main-track-paper artifact.

The honest weight of the substrate is bounded by the calibration regime. The substrate produces a clean mechanism demonstration and a clean leave-one-cell-out generalization claim. The substrate does not produce a held-out-family generalization claim. The TR163 v1 result is therefore a feasibility demonstration, and the report's headline result — H0 rejected, H1 supported, H2 supported at AUC 0.8445 — is to be read as a mechanism diagnostic that licenses the TR163 v2 paper-grade campaign.


17. SS10. Cross-TR Position — RTSI as a Configurable Screen, Reused

TR163 is the first applied use of the published RTSI scorer. RTSI itself was developed in TR142 (v1 baseline, v2 with judge-triangulation, v3 with the AWQ+GPTQ slice that expanded the substrate from 27 cells to 51 sampled cells of which 45 survived the quality screen). The cross-TR position has three pieces: the RTSI dependency, the TR134 gap inheritance, and the position relative to the other Phase 7 mitigation lanes.

17.1 The RTSI dependency

TR163 imports the RTSI scorer from research/tr142/bespoke_analysis/rtsi.py directly. The three reused symbols are compute_rtsi_table (the scoring entry point), RTSI_THRESHOLDS (the published band-boundary constants low=0.10, moderate=0.40), and rtsi_loocv (the leave-one-cell-out refit). TR163 adds no new RTSI scoring code; it reuses the published scorer as a single source of truth. The TR142 v3 RTSI table is vendored into research/tr163/inputs/tr142_rtsi_table.csv to make TR163 reproducible from a fresh clone (the canonical TR142 location is gitignored under the results/ rule).

The dependency direction is one-way: TR163 depends on TR142; TR142 does not depend on TR163. The implication for the Phase 7 lane is that TR163 cannot improve the RTSI scorer; it can only consume it. Any improvement to RTSI (better feature weights, better band boundaries, better feature set) would land in TR142 v4 (or a successor TR) and would propagate to TR163 by re-vendoring the new RTSI table.

17.2 The TR134 gap inheritance

The aggregate refusal-rate-delta gap that TR163 routes against was first characterized by TR134 (Alignment Robustness Under Quantization). TR134 ran the original alignment robustness experiments on the small-model quantization matrix; TR142 then computed the RTSI scores on TR134's substrate, calibrated the RTSI weights against TR134's refusal-rate observations, and produced the canonical RTSI table. The TR163 substrate inherits both the gap (from TR134) and the scorer (from TR142). This is the load-bearing cross-TR dependency: TR163 cannot exist without both TR134 and TR142, and the substrate would have to be re-derived if either upstream TR were superseded.

The two-step inheritance (TR134 → TR142 → TR163) is the structural mark of the Phase 7 mitigation lane consuming the Phase 5 attack-surface characterization plus the Phase 5/6 RTSI screen. The pattern is the same as the bridge-paper's plan to consume the four screens RTSI/JTP/TAIS/CRI as a layered certification protocol; TR163 demonstrates the consumption pattern at the single-screen, single-defense level.

17.3 Position relative to the other Phase 7 mitigation lanes

The Phase 7 lane contains three proof-of-mechanism builds: TR155 (attention-sink eviction × refusal), TR160 (refusal-direction geometry × quantization), and TR163 (this report). TR155 is an MVP pilot at 936 records of primary plus 2,808 judge labels; it tests whether eviction of attention sinks materially perturbs refusal in a small-batch H100 setup. TR160 is a built-but-unrun codebase that tests whether the refusal direction (in the activation space) shifts under quantization. TR163 is the only Phase 7 build that operates entirely offline on existing TR substrate.

The Phase 7 set is structurally complementary: TR155 tests an attention-mechanism defense, TR160 tests an activation-space defense, TR163 tests a configuration-routing defense. None of the three is yet at paper-grade substrate; all three require Fellowship-gated GPU access for their respective expansions. The Phase 7 set's main-track contribution is conditional on the Fellowship expansion landing.

The Phase 7 mitigation lane is structurally the program's first applied-defense investment. TR163's offline simulation establishes the mechanism's feasibility at the cheapest possible cost; TR163 v2's expansion would establish the deployment claim on paper-grade substrate.


18. SS11. Comparison to the Mitigation Literature

The Phase 7 research agenda identifies three published mitigation literature anchors that are the closest comparators to RTSI-gated routing: Safe-Sentinel Detection (SSD), Quantization-aware safety-realignment (Q-realign), and Quantization-resafe (Q-resafe). All three are verified in the agenda (PHASE7_RESEARCH_AGENDA.md §8.1) and cited in the publication contract.

18.1 SSD — Safe-Sentinel Detection (arXiv 2508.17739)

SSD is an inference-time defense that injects a small "sentinel" probe sequence into the model's input and detects unsafe-output trajectories from the sentinel's activation pattern. SSD operates at the per-request level rather than the per-configuration level; it pays a small per-request latency cost (the sentinel probe) and recovers a fraction of the safety gap on adversarial inputs. The comparison to TR163's RTSI-gated routing is structural: SSD is per-request, TR163 is per-configuration; SSD pays latency, TR163 pays throughput (via routed fraction); SSD operates at activation-pattern resolution, TR163 operates at template-stability resolution.

The two defenses are complementary in principle: a deployment could use TR163's RTSI-gated routing to filter the configuration set at deployment-decision time (rejecting the most unstable configurations) and SSD to filter individual requests at inference time. Such a layered defense would be an interesting Phase 8 build.

SSD's per-request resolution makes it a different defense than TR163's per-configuration routing. SSD trades latency for safety on individual requests; TR163 trades throughput for safety on configuration choice. The trade-offs are not directly substitutable.

18.2 Q-resafe (arXiv 2506.20251)

Q-resafe is a post-quantization re-alignment procedure: after a model is quantized, an additional safety-tuning pass is applied to restore the refusal behavior the quantization degraded. Q-resafe operates at the model-weight level (it modifies the quantized model's weights); TR163 operates at the configuration-routing level (it does not modify weights, only chooses which configuration to deploy). The comparison is structural: Q-resafe pays a one-time fine-tuning cost and recovers refusal by modifying the model; TR163 pays a throughput cost per configuration and recovers refusal by routing to a known-safe configuration.

Q-resafe is potentially more powerful than RTSI-gated routing — it can in principle restore refusal on the aggressive quantization configuration, eliminating the routing cost entirely. The cost is the fine-tuning compute, which the program does not have for the small-model substrate, and the quality risk: a post-quant safety-tuning pass might introduce new failure modes. RTSI-gated routing avoids both costs at the price of higher throughput cost.

Q-resafe and RTSI-gated routing are at different points on the "compute cost vs throughput cost" trade-off curve. Q-resafe pays a large one-time cost to eliminate the throughput cost; RTSI-gated routing pays no one-time cost but incurs a recurring throughput cost. A deployment with limited fine-tuning compute would prefer RTSI-gated routing; a deployment with limited throughput would prefer Q-resafe.

18.3 Q-realign (arXiv 2601.08089)

Q-realign is a quantization-aware safety-realignment procedure that operates during quantization rather than after it. The procedure adjusts the quantization weights to preserve the refusal behavior the unquantized model produces. Like Q-resafe, Q-realign operates at the weight level rather than the configuration level. Unlike Q-resafe, Q-realign does not require a separate post-quant fine-tuning pass; the realignment is built into the quantization procedure.

Q-realign is structurally the most aligned with the TR163 mechanism in the sense that both operate at the quantization-configuration level. The difference is that Q-realign modifies the configuration's quantization parameters to restore refusal, while TR163 leaves the configuration alone but decides not to deploy it. The deployment cost of Q-realign is the quantization-procedure modification; the cost of TR163 is the throughput-cost of routing the configuration to safe.

Q-realign is the closest published comparator to TR163 in terms of operating-resolution. The two defenses are not directly compared on substrate in this report (the TR163 substrate does not include Q-realign-quantized models), but the engineering trade-off is interpretable: Q-realign attempts to fix the bad configuration; TR163 attempts to identify and avoid it.

18.4 The structural position of TR163 in the literature

TR163's structural position in the mitigation literature is that of the configuration-screening defense: it operates at deployment-decision time, on a static configuration matrix, with a published screen as the routing predicate. This position is structurally distinct from per-request defenses (SSD), post-quant fine-tuning (Q-resafe), and quantization-aware realignment (Q-realign). The advantage is operational simplicity: configuration screening is cheap to implement and easy to audit; the screen is published and reproducible. The disadvantage is the throughput cost — the deployment loses the throughput benefit of the aggressive quantization on the routed configurations.

TR163 occupies an unfilled niche in the mitigation literature: a defense that operates at the configuration-routing level using a published-screen predicate. The closest published comparators (SSD, Q-resafe, Q-realign) all operate at different resolutions. A paper-grade TR163 v2 contribution would position the defense as the configuration-screening baseline against which per-request, post-quant, and quant-aware defenses can be compared on a common metric.


19. SS12. What This Substrate Does NOT License

The forbidden-claims section of the publication contract is the load-bearing structural boundary of the substrate. This section walks the three forbidden claims at length.

19.1 RTSI routing does not make aggressive quantization "safe"

The substrate licenses the claim that RTSI-gated routing mitigates the measured refusal-rate gap on the TR142 v3 substrate. The substrate does not license the claim that RTSI-gated routing makes aggressive quantization safe in general. The distinction has two pieces:

First, the gap the defense mitigates is the gap from the TR142 v3 substrate. There is no gap on configurations the substrate did not measure; the defense has no claim about those configurations. The forbidden claim "RTSI routing makes aggressive quantization safe" implies a universality the substrate does not support.

Second, the gap the substrate measures is the refusal-rate gap, not a comprehensive safety gap. A configuration that recovers its refusal rate via routing may still exhibit other safety failures (overrefusal-on-benign, refusal-template instability that does not affect rate, jailbreak susceptibility on multi-turn adversarial inputs, etc.). The defense's licensed claim is bounded to the measured gap.

The defense does not make aggressive quantization safe. It mitigates a specific measured gap on a specific substrate. A deployment adopting the defense should understand the bounded claim and should run its own safety substrate before relying on the defense for production deployment.

19.2 Per-request RTSI routing is not demonstrated

The substrate operates on configuration-level RTSI scores. The published RTSI scorer is calibrated to score (model, quant) configurations using a small refusal-prompt sample drawn from the standardized RTSI feature-extraction protocol. The scorer is not calibrated for per-request scoring — there is no published evidence that the RTSI feature deltas, computed on a single request's completion, are predictive of that request's refusal outcome.

A per-request RTSI scorer is a documented extension in the Phase 7 research agenda. Its feasibility is flagged not-verified: the per-request latency of computing the four template-stability features on a single completion is unknown, and the predictive power of single-completion features (as opposed to small-sample-averaged features) is unknown. The TR163 v1 substrate makes no per-request claim and does not license one.

Per-request RTSI scoring is an open research question. The configuration-level scorer reused by TR163 is not directly portable to per-request use. A deployment claiming per-request routing would need to validate the per-request scorer on substrate the program does not have.

19.3 The in-sample TR142 result is not out-of-sample validation

The substrate's in-sample threshold sweep is mechanism evidence on the calibration matrix. The substrate's LOOCV pass is leave-one-cell-out generalization evidence at the cell level. Neither is out-of-sample validation in the sense of "validation on a substrate the scorer was not calibrated against." The substrate does not contain a held-out family or a held-out quant scheme; the held-out validation is the TR163 v2 campaign.

The forbidden claim "the in-sample result is out-of-sample validation" is a common interpretation error in screen-and-route research. The substrate's headline result — H0 rejected, H1 supported, H2 supported at AUC 0.8445 — is a mechanism diagnostic and a cell-level generalization claim. It is not a held-out validation claim.


20. SS13. Operational Implications — How a Real Deployment Would Use This

The substrate's headline result has bounded operational implications. This section walks them at the deployment-engineering level.

20.1 Configuration-routing as a deployment pattern

The configuration-routing pattern the substrate licenses is straightforward to implement. The deployment maintains a registry of (model, quant) configurations, each annotated with its RTSI score from the published TR142 RTSI table (or a deployment-specific RTSI scoring of the deployment's own configuration matrix). At deployment-decision time, the deployment consults the registry, applies the routing rule (a threshold or a band-boundary), and deploys the aggressive cheap quantization or the safe baseline accordingly. The routing decision is cached at the configuration level; per-request routing is not invoked.

The pattern's operational simplicity is its primary advantage. The deployment does not need to instrument the inference path, does not need to compute per-request features, and does not need to maintain a separate safety-scoring service. The deployment needs only to consult the configuration registry at deployment-decision time, which is a one-time operation per configuration.

20.2 Operating-point selection

The substrate licenses three canonical operating points: the Pareto knee at T=0.5702 (6.67% routed, 38.67% recovered), the high-band at T=0.40 (20.00% routed, 76.17% recovered), and the mod-high-band at T=0.10 (48.89% routed, 95.12% recovered). The substrate also licenses the H1 supporting point at T=0.2754 (26.67% routed, 80.27% recovered). A deployment would pick one of these four operating points based on its tolerance for throughput cost versus its requirement for refusal-gap recovery.

A deployment with a strict throughput budget (≤10% routing cost) would operate at the Pareto knee. A deployment with a moderate throughput budget (10–25% routing cost) would operate at the high-band or the H1 supporting point. A deployment with a lax throughput budget (≥40% routing cost) would operate at the mod-high-band. The substrate does not license operating at thresholds outside this set without per-configuration justification.

20.3 The cost-axis caveat

The substrate's cost axis is configuration-uniform routed fraction. A real deployment would convert this to throughput cost using its own per-configuration request share. If the deployment's traffic is concentrated on the HIGH-band configurations (which is plausible — small-model GPTQ/Q2_K configurations are cheap and might be high-volume), the throughput cost at the high-band operating point is larger than 20% (the routing cost is paid on the high-volume configurations). If the deployment's traffic is concentrated on the LOW-band configurations (the stable configurations are the deployment's mainline serving configurations), the throughput cost at the high-band operating point is smaller than 20% (the routing cost is paid on low-volume configurations).

The substrate does not predict which regime a particular deployment is in. The deployment would need to apply its own traffic mix to the operating-point selection.

The cost-axis conversion is a deployment-specific calibration step. The substrate's 20% routed fraction is the configuration-uniform cost; the throughput cost is a deployment-specific quantity that requires the deployment's own traffic mix.

20.4 Failure modes

The defense has three operational failure modes. First, the screen is imperfect: at AUC 0.8445, ~15% of (positive, negative) configuration pairs are ranked incorrectly. A deployment will occasionally route a stable configuration to safe (paying cost for no gain) and occasionally fail to route an unstable configuration (paying no cost but suffering the refusal loss). The deployment should monitor for both failure modes and update the routing rule when the screen's empirical AUC degrades.

Second, the substrate does not include all (family × quant) configurations a deployment might face. A deployment with a configuration outside the substrate's 45-cell matrix would need to score the configuration via the published RTSI scorer (running the RTSI feature extraction on a refusal-prompt sample for the new configuration); the deployment should plan for this onboarding cost.

Third, the scorer's calibration may drift over time as the underlying model families and quantization schemes evolve. A deployment should re-calibrate the scorer on a fresh substrate periodically. The TR163 v1 substrate does not predict how quickly the calibration drifts.

The three failure modes are operational considerations a real deployment would face. They are documented here for completeness; the TR163 v1 substrate licenses neither tight failure-rate bounds nor calibration-drift estimates.


21. SS14. Limitations

The substrate's limitations are documented across the report; this section consolidates them.

21.1 Substrate size

The 45-cell substrate is small. The 17 positives × 28 negatives split gives the ROC-AUC a wide bootstrap confidence interval (approximately [0.72, 0.94] at 95%). The Pareto frontier's discrete operating points are coarse: each cell contributes 2.22% of the routed fraction, so the frontier's resolution is limited by the substrate's size. A larger substrate would tighten the AUC interval and smooth the Pareto frontier.

21.2 Cell-level positive imbalance

The positive class is concentrated in a small subset of families (qwen2.5-1.5b, llama3.2-1b, phi-2, mistral-7b). A held-out validation against a family not present in the calibration set would test whether the favorable Pareto knee replicates outside the calibration regime. The substrate does not include such a held-out family.

21.3 Quant-scheme coverage

The substrate includes 9 quantization schemes, all of which are represented in the calibration set. A held-out quant scheme (e.g., adding W4A8 or W8A8 to the deployment's matrix) would test whether the favorable Pareto knee replicates outside the calibrated quant set. The substrate does not include such a held-out scheme.

21.4 Refusal-rate as the single safety metric

The substrate uses refusal-rate as the single safety metric. Refusal-rate is a useful proxy for many safety failure modes but does not cover all of them (overrefusal-on-benign, refusal-template instability that does not affect rate, jailbreak susceptibility on multi-turn adversarial inputs, etc.). A paper-grade substrate would include the full standardized 4-corpus battery (HarmBench, JailbreakBench, StrongREJECT, XSTest) and would report refusal-rate alongside additional safety metrics.

21.5 Cost-axis simplification

The cost axis is configuration-uniform routed fraction. A real deployment's throughput cost depends on the per-configuration request share, which the substrate does not include. The deployment would apply its own traffic mix to the operating-point selection.

21.6 No latency-weighted cost model

The substrate does not include per-quantization latency measurements across the backend matrix. A real throughput-cost model would require these measurements; the substrate's fraction-routed cost is a proxy for throughput cost, not a direct measurement.

21.7 No deployment-time validation

The substrate is offline. The substrate does not include any deployment-time validation of the routing decision (e.g., A/B testing the routed configurations against the kept-aggressive configurations on a live traffic sample). A paper-grade campaign would include such validation as part of its substrate.

21.8 Pre-registered hypotheses are mechanism-bounded

The three pre-registered hypotheses are mechanism-bounded. They test whether the mechanism is feasible on the calibration substrate. They do not test whether the mechanism is deployable in general. A deployable-claim hypothesis set would require a different pre-registration on a different substrate.

The eight limitations enumerated above are the substrate's structural bounds. None of them invalidates the substrate's headline result; all of them bound the substrate's interpretation. The TR163 v2 campaign addresses limitations 1, 2, 3, 4, 5, 6, and 7 directly; limitation 8 is addressed by a new pre-registration in TR163 v2.


22. SS15. TR163 v2 — Engineering Specification for the Paper-Grade Substrate

The TR163 v2 campaign is the paper-grade expansion the v1 substrate licenses. This section specifies the campaign at engineering depth so that, when the Fellowship gate clears and the GPU substrate becomes available, the campaign can be executed without further design work.

22.1 Substrate scope

The v2 campaign produces a new substrate with the following structural properties:

  • Primary records: ~75,000 paired records under the standardized 4-corpus battery (HarmBench, JailbreakBench, StrongREJECT, XSTest), each prompt scored across the (model, quant) configuration matrix.
  • Judge labels: ~225,000 triangulated judge labels (Anthropic Claude, OpenAI GPT-4o, Ollama Gemma3:12b — the standard Phase 6 triple).
  • Configuration matrix: at least 6 model families × 8 quantization schemes = 48 cells minimum, with at least 2 model families held out from the calibration set.
  • Held-out family: at least one model family is excluded from the RTSI calibration step. The RTSI scorer is calibrated on the in-set families and is scored against the held-out family's configurations.
  • Held-out quant scheme: at least one quantization scheme is excluded from the calibration set. The scorer is calibrated without it and is tested on configurations using the held-out scheme.

22.2 Analysis layer

The v2 campaign's analysis layer extends the v1 layer with the following passes:

  • JTP triangulation: the cross-judge kappa is computed per cell and per operating point. The JTP verdict (κ<0.4 untrustable / 0.4–0.7 triangulate / ≥0.7 robust) gates the per-cell verdict.
  • Mantel-Haenszel pooled OR: the per-cell refusal-rate-delta is converted to a per-cell odds ratio and pooled via Mantel-Haenszel across the configuration matrix. The pooled OR is reported with its 95% confidence interval.
  • Holm-Bonferroni stepdown: the per-cell paired McNemar tests are corrected for multiple comparisons via Holm-Bonferroni. The number of cells surviving correction is reported.
  • TOST equivalence: the per-cell refusal-rate-delta is tested for equivalence to zero at ±3 percentage points via TOST. The number of cells equivalent to baseline is reported.
  • Held-out family AUC: the ROC-AUC is reported on the held-out family's configurations using the in-set RTSI scorer. This is the load-bearing generalization claim of the v2 campaign.

22.3 Pre-registration

The v2 campaign is pre-registered with a new contract at research/tr163/v2/publication_contract.json (already scaffolded). The contract specifies the v2 hypotheses (which extend H0/H1/H2 to held-out-family operating points), the v2 forbidden claims (which extend the v1 forbidden claims to held-out-family generalization), and the v2 promotion checklist. The v2 pre-registration is frozen at the campaign's launch commit, ahead of any sampling.

22.4 Compute envelope

The v2 campaign's compute envelope is Fellowship-gated. The primary sampling requires GPU access at the 7B–13B model-family scale across the 8-quant matrix; the judge sampling requires Anthropic and OpenAI Batch API access at ~75K records per judge. The Banterhearts program's standing rule is that adversarial-corpus sampling against OpenAI requires the Researcher Access Program umbrella; the v2 campaign cannot fire on OpenAI without the umbrella. The v2 campaign is therefore conditional on either (a) Anthropic Fellowship cover or (b) post-acceptance institutional credentials.

22.5 Estimated wall-clock

Under Fellowship-grade compute access, the v2 campaign's estimated wall-clock is approximately 7–10 days of GPU sampling plus ~24 hours of judge dispatch via Batch API plus ~3–4 days of analysis layer. The total wall-clock from gate-clear to report-landed is approximately 2 weeks.

22.6 Substrate format

The v2 substrate format mirrors the established Phase 6 substrate format: per-cell JSONL files of primary records plus per-judge JSONL files of judge labels, all under research/tr163/v2/results/<run>/. The analysis JSON output mirrors the v1 format with the additional v2 fields (JTP kappa, MH OR, Holm-corrected counts, TOST equivalent counts, held-out-family AUC). The report format mirrors the standard TR report template at full depth (1,500–1,800 lines).

22.7 Hand-off to the bridge paper

The v2 substrate's analysis output is consumed by the bridge paper at Layer 6 (mitigation/defense layer). The bridge paper's Layer 6 .tex section consumes the v2's held-out-family AUC, the v2's H1 supporting point, and the v2's forbidden-claims structure. The bridge paper's Stage 2 launch (2026-10-24 GO/NO-GO trigger) gates on the v2 substrate landing in time for the bridge paper's certification protocol Stage 2 chapter.

The v2 campaign is the paper-grade expansion the v1 substrate licenses. The engineering specification is complete; execution is gated on Fellowship-grade GPU access and the OpenAI Researcher Access Program umbrella.


23. SS16. Phase 7 Position and the Bridge-Paper Anchor

23.1 Phase 7 position

TR163 is the first applied-defense build in the Phase 7 mitigation-turn lane. The lane was opened on 2026-05-22/23 with a four-paper agenda: TR155 (attention-sink eviction × refusal), TR160 (refusal-direction geometry × quantization), TR163 (this report), and a fourth Phase 7 build to-be-named that targets the multi-turn jailbreak surface from TR139. As of the report-landing date, TR155 is at MVP-pilot scale (936 records), TR160 is built but unrun, TR163 is at MVP-pilot scale (this substrate), and the fourth build is not yet scaffolded. The Phase 7 lane is at proof-of-mechanism depth across the board; no Phase 7 build has yet expanded to paper-grade substrate.

The Phase 7 lane's main-track contribution is conditional on at least one of the three built Phase 7 lanes expanding to paper-grade. TR163 v2 is the most-scoped of the three expansions; its engineering specification is complete and its compute envelope is Fellowship-gated. TR155 v2 and TR160 v1-execute are similarly Fellowship-gated and similarly scoped.

23.2 The bridge paper anchor

The bridge paper at papers/serving_state_safety_certification/ is the program's planned consolidation of the four screens (RTSI, JTP, TAIS, CRI) into a layered serving-state safety certification protocol. The bridge paper's Layer 6 is the mitigation/defense layer; its content is sourced from the Phase 7 expansions. TR163 v2 is the bridge paper's Layer 6 anchor for the configuration-routing defense modality.

The bridge paper's Stage 2 launch (the certification-protocol chapter that builds on the four-screen Stage 1) gates on the 2026-10-24 GO/NO-GO trigger. The trigger is conditional on (a) Anthropic Fellowship landing and (b) at least one main-conference acceptance from the program's current submission round. If both conditions clear, the Stage 2 launch fires and the TR163 v2 substrate is the load-bearing anchor for the bridge paper's mitigation chapter. If neither condition clears, the Stage 2 launch is deferred and the TR163 v2 substrate stays in the run-dir at workshop-pilot depth.

23.3 The standalone-defense-note option

A standalone defense note based on the TR163 v2 substrate is an alternative to the bridge-paper Layer 6 anchor. The standalone note would be a single TR163 v2 paper, sized appropriately for a 6–8 page short-paper venue in the safety or systems space. The standalone note's contribution would be the configuration-routing defense itself, framed as a new entry in the mitigation literature alongside SSD, Q-resafe, and Q-realign. The publication contract's paper_slug field leaves the choice between standalone-note and bridge-paper-section explicitly undecided; the choice will be made post-substrate-land based on which framing best serves the available venue.

Phase 7's first deliverable is structurally TR163 v2. The bridge-paper-anchor and standalone-note options are both alive; the choice depends on which venue offers the best fit for the substrate.


24. Conclusion

The TR163 v1 substrate licenses the headline result that RTSI-gated configuration routing recovers most of the measured weight-quantization refusal-rate gap on the TR142 v3 substrate at a small fraction of configurations routed to safe. The three pre-registered hypotheses resolve in the favorable direction: H0 (no free lunch) is rejected by a Pareto knee that exceeds proportional recovery by a factor of three or more, H1 (favorable tradeoff at ≥80% recovered / ≤40% routed) is supported by the H1 supporting point at T=0.2754 (26.67% routed, 80.27% recovered), and H2 (LOOCV out-of-sample ROC-AUC ≥ 0.75) is supported by the LOOCV AUC of 0.8445. The substrate is small (45 cells, 17 positives, 28 negatives) and inherits a load-bearing circularity (the RTSI scorer was calibrated against this same matrix). The honest weight of the substrate rests on the LOOCV out-of-sample pass and on the documented future-work extension to held-out model families, both of which constitute the TR163 v2 paper-grade campaign specified at engineering depth in §SS15.

The substrate is the cheapest mechanism demonstration available to the program: ~16 KB of analysis JSON, zero GPU cost, zero new model inference, zero new judge calls. It is appropriate for a workshop-pilot artifact and as input substrate for the bridge paper's Layer 6 anchor when the v2 expansion lands. It is inappropriate for a main-track-paper artifact in its current form. The TR163 v2 campaign — gated on Anthropic Fellowship cover plus the OpenAI Researcher Access Program umbrella — is the path to a paper-grade contribution.

Phase 7 is the program's first applied-defense investment. TR163 v1 demonstrates the configuration-routing mechanism at proof-of-mechanism depth; TR163 v2 is the load-bearing expansion that licenses the deployment claim. The substrate's three forbidden claims — RTSI routing does not make aggressive quantization safe, per-request RTSI routing is not demonstrated, and the in-sample TR142 result is not out-of-sample validation — remain in force across both the v1 and v2 substrates and are the structural boundary of the program's defensible claim space.


25. References

The references below are the published and program-internal artifacts the report depends on.

Banterhearts internal references

  • TR134 — Alignment Robustness Under Quantization (Complete, v3). The TR134 v3 substrate provided the original alignment-robustness measurements that TR142 calibrated RTSI against. Path: research/tr134/.
  • TR140 — Many-Shot / Long-Context Jailbreak × Quantization (Complete, v3.0). TR140 produced the JTP (Judge Triangulation Protocol) screen that the TR163 v2 campaign will inherit. Path: research/tr140/.
  • TR142 — Refusal Template Stability Index (RTSI) (Complete, v3 with AWQ+GPTQ slice). TR142 produced the RTSI scorer, the RTSI band boundaries, and the 45-cell substrate. Path: research/tr142/. RTSI implementation: research/tr142/bespoke_analysis/rtsi.py.
  • TR144 — Speculative Decoding × Safety (TAIS) (Complete). TR144 produced the TAIS screen that the program's Phase 6 expansion may consume. Path: research/tr144/.
  • TR145 — KV-Cache Quantization × Safety (Complete v1.0, writeup parked). TR145 established that the FP8 KV-cache gap is null; TR163 explicitly does not route against this gap. Path: research/tr145/.
  • TR147 — Benchmarking Integrity — Second-Regime Portability (CRI) (Complete, v4.0). TR147 produced the CRI screen. Path: research/tr147/.
  • TR148 — Multi-Judge Reliability (Complete v2). TR148 produced the dual-axis safety-judge finding (composite-harm vs response-refusal axes) that the bridge paper's Layer 1 builds on. Path: research/tr148/.
  • TR149 — Standardized Safety Battery (Complete 2026-05-14). TR149 produced the 4-corpus battery substrate the TR163 v2 campaign will inherit. Path: research/tr149/.
  • TR152 — Serving-State Factorial (Complete v2, 2026-05-27/28). TR152 v2 produced the program's largest single-run substrate (45,000 primary + 135,000 judge labels) and is the bridge paper's Layer 5 anchor. Path: research/tr152/.
  • PHASE7_RESEARCH_AGENDA.md — The Phase 7 mitigation-turn lane's research agenda. §8.1 contains the verified mitigation-literature anchors (SSD, Q-resafe, Q-realign). Path: research/PHASE7_RESEARCH_AGENDA.md.
  • EXPERIMENTS_STATUS.md — The program's running progress tracker. Path: research/EXPERIMENTS_STATUS.md.
  • publication_contract.json — TR163's pre-registered contract. Path: research/tr163/publication_contract.json.
  • Bridge paper UPGRADE_PLAN.mdpapers/serving_state_safety_certification/UPGRADE_PLAN.md. Stage 2 launch gates on 2026-10-24 GO/NO-GO trigger.

External references

  • SSD — Safe-Sentinel Detection. arXiv 2508.17739. Per-request safety-detection defense.
  • Q-resafe. arXiv 2506.20251. Post-quantization safety re-alignment.
  • Q-realign. arXiv 2601.08089. Quantization-aware safety realignment.
  • HarmBench, JailbreakBench, StrongREJECT, XSTest. The standardized 4-corpus battery referenced in TR149 and inherited by TR163 v2.

26. Appendix A. Hardware, Software, and Environment Fingerprint

The TR163 v1 substrate is offline-only. The compute envelope is one CPU process on a 45-row CSV.

Component Value
CPU Intel Core i9 (RTX 4080 Laptop development machine)
GPU none used
RAM 32 GB DDR5 (workload uses < 100 MB)
OS Windows 11 Home 10.0.26200
Python 3.12.x (system-managed venv)
NumPy as pinned in requirements.txt
Pandas as pinned in requirements.txt
scikit-learn not used (no estimators)
TR142 RTSI core imported from research/tr142/bespoke_analysis/rtsi.py
Substrate file research/tr163/inputs/tr142_rtsi_table.csv (force-added from upstream gitignored path)
Run directory research/tr163/results/20260523_011918/
Analysis JSON research/tr163/results/20260523_011918/tr163_analysis.json
Routing simulation JSON research/tr163/results/20260523_011918/routing_simulation.json
Hand-narrated report research/tr163/DRAFT_Technical_Report_163.md (this file)
Working-copy report research/tr163/results/20260523_011918/tr163_report.md (gitignored)

The substrate is reproducible from a fresh clone via the reproduction commands in Appendix B.


27. Appendix B. Reproduction Commands and File Manifest

B.1 Reproduction commands

From the repository root:

python -m research.tr163.simulate
python -m research.tr163.analyze --run-dir research/tr163/results/20260523_011918
python -m research.tr163.generate_report --run-dir research/tr163/results/20260523_011918

The three commands run sequentially. The first produces routing_simulation.json; the second produces tr163_analysis.json and computes the verdicts; the third produces a run-dir-local skeleton report (the canonical hand-narrated version is this file).

B.2 File manifest

Path Status Role
research/tr163/README.md tracked TR-level README (MVP framing)
research/tr163/config.yaml tracked Substrate paths, routing parameters
research/tr163/publication_contract.json tracked Frozen pre-registration
research/tr163/simulate.py tracked Threshold-sweep simulation
research/tr163/analyze.py tracked Verdict and Pareto analysis
research/tr163/generate_report.py tracked Run-dir skeleton report
research/tr163/shared/utils.py tracked RTSI core re-export + helpers
research/tr163/shared/__init__.py tracked Package marker
research/tr163/inputs/README.md tracked Substrate provenance
research/tr163/inputs/tr142_rtsi_table.csv tracked (force-added) The 45-cell substrate
research/tr163/inputs/tr142_regimes.csv tracked (force-added) Regimes context substrate
research/tr163/DRAFT_Technical_Report_163.md tracked (this file) Full-depth hand-narrated report
research/tr163/DRAFT_Technical_Report_163_v1_MVP.md tracked Earlier 358-line MVP snapshot
research/tr163/results/20260523_011918/routing_simulation.json gitignored Full Pareto frontier
research/tr163/results/20260523_011918/tr163_analysis.json gitignored Verdict JSON
research/tr163/results/20260523_011918/tr163_report.md gitignored Working-copy of this report
research/tr163/v2/ tracked (scaffold) TR163 v2 paper-grade campaign codebase

28. Appendix C. The 45-Cell Substrate Ranked by RTSI Score

The 45 cells of the TR142 v3 RTSI table, ranked by RTSI score from highest to lowest. The Material column is Yes if the cell's refusal_rate_delta is at or below −0.05.

Rank Model Family Quant refusal_rate refusal_rate_delta RTSI score RTSI risk Material
1 qwen2.5-1.5b Qwen GPTQ 0.470 −0.520 0.7864 HIGH Yes
2 qwen2.5-1.5b Qwen Q2_K 0.430 −0.560 0.6729 HIGH Yes
3 phi-2 Phi GPTQ 0.010 −0.900 0.6199 HIGH Yes
4 llama3.2-1b Llama Q2_K 0.330 −0.570 0.5614 HIGH Yes
5 llama3.2-1b Llama AWQ 0.390 −0.510 0.5529 HIGH Yes
6 qwen2.5-7b Qwen AWQ 0.940 −0.040 0.4436 HIGH No
7 qwen2.5-7b Qwen GPTQ 0.960 −0.020 0.4351 HIGH No
8 llama3.2-1b Llama GPTQ 0.310 −0.590 0.4337 HIGH Yes
9 phi-2 Phi Q2_K 0.720 −0.190 0.4112 HIGH Yes
10 qwen2.5-1.5b Qwen Q3_K_S 0.980 −0.010 0.3848 MODERATE No
11 qwen2.5-1.5b Qwen AWQ 0.840 −0.150 0.3437 MODERATE Yes
12 llama3.2-1b Llama Q3_K_S 0.850 −0.050 0.3127 MODERATE At cutoff
13 mistral-7b Mistral AWQ 0.070 −0.220 0.2670 MODERATE Yes
14 mistral-7b Mistral GPTQ 0.130 −0.160 0.2552 MODERATE Yes
15 mistral-7b Mistral Q3_K_S 0.220 −0.070 0.2232 MODERATE Yes
16 llama3.2-3b Llama GPTQ 0.620 +0.090 0.2026 MODERATE No (gain)
17 qwen2.5-7b Qwen Q2_K 0.930 −0.050 0.1667 MODERATE At cutoff
18 phi-2 Phi Q3_K_S 0.860 −0.050 0.1612 MODERATE At cutoff
19 llama3.2-3b Llama Q3_K_S 0.910 +0.380 0.1345 MODERATE No (gain)
20 llama3.2-3b Llama AWQ 0.680 +0.150 0.1322 MODERATE No (gain)
21 llama3.2-1b Llama Q5_K_M 0.860 −0.040 0.1211 MODERATE No
22 mistral-7b Mistral Q2_K 0.120 −0.170 0.1066 MODERATE Yes
23 llama3.2-3b Llama Q5_K_M 0.550 +0.020 0.0881 LOW No (gain)
24 llama3.2-3b Llama Q2_K 0.940 +0.410 0.0850 LOW No (gain)
25 llama3.2-1b Llama Q6_K 0.900 0.000 0.0608 LOW No
26 qwen2.5-7b Qwen Q3_K_S 0.960 −0.020 0.0586 LOW No
27 phi-2 Phi Q4_K_M 0.840 −0.070 0.0444 LOW Yes
28 mistral-7b Mistral Q6_K 0.350 +0.060 0.0412 LOW No (gain)
29 phi-2 Phi Q8_0 0.930 +0.020 0.0376 LOW No (gain)
30 mistral-7b Mistral Q4_K_M 0.310 +0.020 0.0374 LOW No (gain)
31 llama3.2-3b Llama Q4_K_M 0.470 −0.060 0.0348 LOW Yes
32 phi-2 Phi Q6_K 0.860 −0.050 0.0345 LOW At cutoff
33 qwen2.5-7b Qwen Q5_K_M 0.990 +0.010 0.0340 LOW No (gain)
34 llama3.2-1b Llama Q4_K_M 0.870 −0.030 0.0328 LOW No
35 qwen2.5-7b Qwen Q6_K 0.990 +0.010 0.0285 LOW No (gain)
36 llama3.2-3b Llama Q6_K 0.570 +0.040 0.0274 LOW No (gain)
37 qwen2.5-7b Qwen Q4_K_M 0.990 +0.010 0.0249 LOW No (gain)
38 qwen2.5-1.5b Qwen Q5_K_M 0.990 0.000 0.0216 LOW No
39 mistral-7b Mistral Q5_K_M 0.290 0.000 0.0181 LOW No
40 llama3.2-3b Llama Q8_0 0.520 −0.010 0.0153 LOW No
41 qwen2.5-1.5b Qwen Q4_K_M 0.980 −0.010 0.0129 LOW No
42 qwen2.5-1.5b Qwen Q6_K 0.990 0.000 0.0114 LOW No
43 llama3.2-1b Llama Q8_0 0.900 0.000 0.0028 LOW No
44 phi-2 Phi Q5_K_M 0.920 +0.010 0.0008 LOW No (gain)
45 qwen2.5-1.5b Qwen Q8_0 0.990 0.000 0.0002 LOW No

Observations. The 9 HIGH-band cells (ranks 1–9) include 7 material-loss positives concentrated in the worst (small-model × low-bit) configurations. The 13 MODERATE-band cells (ranks 10–22) mix 8 positives, 3 gains, and 2 at-cutoff cells. The 23 LOW-band cells (ranks 23–45) are dominated by stable configurations with refusal_rate_delta near zero; the band contains 2 marginal positives and several refusal-rate gains.

The ranked table makes the routing rule's per-cell behavior fully transparent. A deployment adopting the rule at a specified threshold can trace which cells fall in and out of the routed set by reading off the table. The H1 supporting point at T=0.2754 routes ranks 1–12; the high-band operating point at T=0.40 routes ranks 1–9; the mod-high-band operating point at T=0.10 routes ranks 1–22.


29. Appendix D. Full Pareto Operating Points

The 21 distinct (fraction_routed, recovered) operating points on the in-sample Pareto frontier, enumerated by the threshold sweep. The table reproduces the points from routing_simulation.json::pareto_points deduplicated by fraction_routed and showing the lowest threshold for each operating point.

Threshold Cells routed Fraction routed Recovered (% of gap) Recovery / Cost
0.7864 1 2.22% 10.16% 4.58
0.6292 2 4.44% 21.09% 4.75
0.5702 3 6.67% 38.67% 5.80
0.4523 5 11.11% 59.77% 5.38
0.4130 8 17.78% 72.46% 4.08
0.3933 9 20.00% 76.17% 3.81
0.3540 10 22.22% 76.37% 3.44
0.3147 11 24.44% 79.30% 3.24
0.2754 12 26.67% 80.27% 3.01
0.2557 13 28.89% 84.57% 2.93
0.2361 14 31.11% 87.70% 2.82
0.2164 15 33.33% 89.06% 2.67
0.1771 16 35.56% 89.06% 2.51
0.1378 18 40.00% 91.02% 2.28
0.1181 21 46.67% 91.80% 1.97
0.0985 22 48.89% 95.12% 1.95
0.0788 24 53.33% 95.12% 1.78
0.0592 25 55.56% 95.12% 1.71
0.0395 28 62.22% 96.87% 1.56
0.0198 38 84.44% 99.61% 1.18
0.0002 44 97.78% 100.00% 1.02

Observations. The recovery-to-cost ratio peaks at 5.80 at T=0.5702 (the in-sample Pareto knee). The ratio falls monotonically past the knee but stays above 2.0 until T=0.0985 (the mod-high-band operating point). Past T=0.0985, the ratio falls below 2.0 and continues falling until the over-routed tail at T=0.0002 (97.78% routed, 100% recovered, ratio 1.02).

The recovery-to-cost ratio is the structurally informative quantity for operating-point selection. A deployment with a defined cost budget (e.g., "at most 25% routing cost") would pick the operating point that maximizes recovery within the budget; a deployment with a defined recovery target (e.g., "at least 80% recovered") would pick the operating point that minimizes cost subject to the target. Either selection rule sits inside the Pareto frontier; the substrate licenses both selection rules.


End of Technical Report 163.


Status footer. This report documents the TR163 v1 substrate at full TR depth and is the canonical hand-narrated artifact for the v1 result. The substrate is small (45 cells, 17 positives, 28 negatives, ~16 KB of analysis JSON) and inherits a load-bearing circularity (RTSI was calibrated on the same matrix). The honest weight of the substrate rests on the LOOCV out-of-sample pass; the paper-grade expansion is the TR163 v2 campaign specified in §SS15. The report is appropriate as input substrate for the bridge paper's Layer 6 anchor and as a workshop-pilot artifact in its own right. It is not appropriate as a main-track-paper artifact in its current form. Promotion to PublishReady/reports/Technical_Report_163.md is gated on the v2 substrate landing per the publication contract's promotion checklist and the feedback_no_mvp_synthesis discipline. The earlier 358-line MVP snapshot is preserved at research/tr163/DRAFT_Technical_Report_163_v1_MVP.md as a historical artifact.