Batch Inference Safety Under Non-Determinism
Submitted. Phase 1 safety flips at ~0.58% vs. capability flips at ~0.14% under controlled batching; refusal-to-compliance is the dominant flip direction. Reduced true-batching validation reaches ~99.4% agreement with synchronized dispatch (measurement sketch below).
Target: AI4Good
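A minimal sketch of the paired comparison assumed behind these numbers: the same prompts are labeled under a reference (synchronized) and a perturbed (batched) dispatch mode, and agreement plus per-direction flip rates are computed. The labeling scheme and toy data are hypothetical illustrations, not the submission's harness.

```python
def flip_stats(reference, perturbed):
    """Compare per-prompt labels between a reference configuration
    (e.g. synchronized dispatch) and a perturbed one (e.g. true batching).

    Labels are 'refuse' or 'comply', index-aligned by prompt. Returns
    overall agreement and each flip direction separately, so the
    refusal-to-compliance rate can be reported on its own.
    """
    assert len(reference) == len(perturbed)
    n = len(reference)
    r_to_c = sum(1 for a, b in zip(reference, perturbed) if a == "refuse" and b == "comply")
    c_to_r = sum(1 for a, b in zip(reference, perturbed) if a == "comply" and b == "refuse")
    agree = sum(1 for a, b in zip(reference, perturbed) if a == b)
    return {
        "agreement": agree / n,
        "flip_rate": (r_to_c + c_to_r) / n,
        "refusal_to_compliance": r_to_c / n,
        "compliance_to_refusal": c_to_r / n,
    }

# Toy usage with 8 prompts; a real run would use the full paired prompt set.
synced  = ["refuse", "comply", "refuse", "refuse", "comply", "refuse", "comply", "refuse"]
batched = ["refuse", "comply", "comply", "refuse", "comply", "refuse", "comply", "refuse"]
print(flip_stats(synced, batched))
```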
Inference Optimization Is Not Safety-Neutral
Synthesis. Quantization drives 57% of total safety cost, backend choice 41%, and concurrency 2% (share arithmetic sketched below). Chat template divergence can induce larger safety shifts than numerical precision.
Target: TBD
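A tiny sketch of the share-of-total arithmetic as it is framed above, assuming each factor's safety cost is a paired flip-rate delta (in percentage points) against a reference serving configuration and its share is that delta over the summed total. The deltas below are placeholders, not the paper's measurements.

```python
factor_deltas = {              # hypothetical flip-rate deltas (percentage points)
    "quantization": 0.40,
    "backend": 0.25,
    "concurrency": 0.05,
}

total = sum(factor_deltas.values())
for factor, delta in factor_deltas.items():
    # Each factor's share of the total safety cost is its delta over the sum.
    print(f"{factor:>12s}: {delta / total:6.1%} of total safety cost")
```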
Empirical Capacity Planning for Local LLM Inference
Synthesis. Treats capacity planning as a fitted systems problem: backend choice, context length, and memory pressure all materially change the feasible operating regime. Planner quality should be judged by validation against explicit targets, not analytic elegance (fit-and-validate sketch below).
Target: Systems venue
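A minimal fit-and-validate sketch of this framing: fit a saturating throughput curve to measured aggregate tokens/sec versus concurrency, then read off the largest concurrency that still meets an explicit per-client throughput target. The measurements, the model form, and the target value are illustrative assumptions, not the paper's data.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measurements: concurrent clients vs aggregate tokens/sec.
clients = np.array([1, 2, 4, 8, 16])
tok_per_s = np.array([42.0, 78.0, 130.0, 180.0, 205.0])

def saturating(n, t_max, k):
    # Simple saturation curve: aggregate throughput approaches t_max as n grows.
    return t_max * n / (n + k)

(t_max, k), _ = curve_fit(saturating, clients, tok_per_s, p0=[250.0, 4.0])

# Explicit target: every client should see at least 15 tokens/sec.
target_per_client = 15.0
for n in range(1, 33):
    per_client = saturating(n, t_max, k) / n
    if per_client < target_per_client:
        print(f"feasible operating regime: up to {n - 1} concurrent clients")
        break
```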
Multi-Agent Runtime Architecture
Synthesis. Recasts "which language wins" as "which system design preserves throughput." Python and Rust are near parity on throughput; architecture and concurrency strategy drive larger differences. A dual-Ollama setup achieves 99.4% multi-agent efficiency (dispatch sketch below).
Target: Systems venue
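A hedged sketch of the dual-Ollama pattern, not the paper's runtime: agents are assigned round-robin to two local Ollama servers, requests run concurrently, and multi-agent efficiency is defined here (as an assumption) as aggregate wall-clock throughput divided by N times a separately measured single-agent rate. Ports, the model tag, and the baseline number are placeholders.

```python
import asyncio
import time
import httpx

ENDPOINTS = ["http://localhost:11434", "http://localhost:11435"]  # two Ollama servers (assumed ports)
MODEL = "llama3.1:8b"            # assumed model tag
SINGLE_AGENT_TOK_S = 45.0        # assumed single-agent baseline (tokens/sec)

async def run_agent(client, agent_id, prompt):
    base = ENDPOINTS[agent_id % len(ENDPOINTS)]      # round-robin backend assignment
    resp = await client.post(f"{base}/api/generate",
                             json={"model": MODEL, "prompt": prompt, "stream": False})
    resp.raise_for_status()
    return resp.json()["eval_count"]                 # generated-token count reported by Ollama

async def main(n_agents=8):
    start = time.perf_counter()
    async with httpx.AsyncClient(timeout=300.0) as client:
        counts = await asyncio.gather(*[
            run_agent(client, i, f"Agent {i}: summarize your current task.")
            for i in range(n_agents)
        ])
    wall = time.perf_counter() - start
    aggregate = sum(counts) / wall                   # aggregate tokens/sec over wall clock
    efficiency = aggregate / (n_agents * SINGLE_AGENT_TOK_S)
    print(f"aggregate {aggregate:.1f} tok/s, multi-agent efficiency {efficiency:.1%}")

asyncio.run(main())
```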
Serving Stacks, Continuous Batching & the Physics of Throughput
Synthesis. LLM serving-stack differences are mechanistic, not benchmark trivia. Continuous batching drives throughput scaling with agent count via a 77-80% kernel reduction. Backends differ in their effective serial fraction, and the 2.25× throughput gain at N=8 is explained mechanistically (serial-fraction fit below).
Target: Systems venue
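A sketch of the effective-serial-fraction framing: normalize measured aggregate throughput at N agents by the N=1 value to get a speedup curve, then fit Amdahl's law, speedup(N) = 1 / (s + (1 - s) / N), per backend and compare the fitted s. The backend names and intermediate speedup values below are illustrative placeholders (the N=8 endpoint of 2.25× echoes the figure quoted above).

```python
import numpy as np
from scipy.optimize import curve_fit

def amdahl(n, s):
    # Amdahl's law: s is the effective serial fraction of the workload.
    return 1.0 / (s + (1.0 - s) / n)

agents = np.array([1, 2, 4, 8])

# Hypothetical speedups relative to N=1 for two serving backends.
backends = {
    "backend_a": np.array([1.00, 1.55, 2.00, 2.25]),   # saturates early, ~2.25x at N=8
    "backend_b": np.array([1.00, 1.80, 2.90, 4.10]),   # scales further before flattening
}

for name, speedup in backends.items():
    (s,), _ = curve_fit(amdahl, agents, speedup, p0=[0.3], bounds=(0.0, 1.0))
    print(f"{name}: effective serial fraction s ~= {s:.2f}")
```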
KV-Cache Quantization and Safety
In preparation. KV-cache quantization is a serving-layer perturbation that touches retained attention state. A 5-phase paired study of FP16 vs FP8 across 24K records and 3 models. The headline result is a null: no Holm-significant safety effect detectable at α=0.05 with 80% power. Operational rule: workload-specific paired evaluation, not pre-approval (paired-test sketch below).
Target: Workshop submission
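A sketch of the analysis shape described above, not the study's actual pipeline: per model, build the 2x2 table of paired refusal outcomes (FP16 KV cache vs FP8 KV cache on the same prompts), run an exact McNemar test on the discordant cells, then Holm-correct the p-values across models at α=0.05. The counts are made up.

```python
from statsmodels.stats.contingency_tables import mcnemar
from statsmodels.stats.multitest import multipletests

# Hypothetical paired counts per model:
# rows = FP16 (refuse, comply), cols = FP8 (refuse, comply).
paired_tables = {
    "model_a": [[3900, 45], [38, 4017]],
    "model_b": [[3720, 52], [61, 4167]],
    "model_c": [[4010, 29], [33, 3928]],
}

pvalues = {}
for model, table in paired_tables.items():
    result = mcnemar(table, exact=True)      # exact test on the discordant pairs
    pvalues[model] = result.pvalue

# Holm step-down correction across the per-model tests.
reject, corrected, _, _ = multipletests(list(pvalues.values()), alpha=0.05, method="holm")

for (model, raw), adj, sig in zip(pvalues.items(), corrected, reject):
    print(f"{model}: p={raw:.3f}, Holm-adjusted={adj:.3f}, significant={sig}")
```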