Independent Research

Edge LLM Inference Under Real-World Constraints

How fast can local inference get — and how safe is it at the edge? This research program answers both questions with CUDA event timing and controlled safety evaluations across model loading, quantization, TensorRT compilation, KV cache optimization, multi-agent coordination, and cross-backend safety consistency.

Independent research by Sahil Kadadekar

841,000+ Research Measurements
48 Technical Reports
5 Synthesis Whitepapers
9 Repositories

Start Here

Whitepaper

Chimeraforge: High-Performance LLM Agent Orchestration

Rust vs. Python for production AI orchestration. A hybrid architecture and the "Dual Ollama" pattern achieve a 58% latency reduction and near-zero contention.

Key Findings

Concrete results pulled from the published reports. Numbers, not narrative.

100% ASR

Q2_K is universally unacceptable for safety. Banned across 18+ models, 10+ families.

p = 0.942

Alignment type does not predict batch-induced safety fragility (RLHF, SFT, DPO, distilled — none differ).

25pp

Backend migration can cost 25 percentage points of safety. The cause is chat template divergence, not the framework.

13.9×

Quality metrics are not safety proxies. Safety degrades 13.9× faster than quality at Q3_K_S.

99.4%

Dual Ollama eliminates 99.4% of multi-agent contention. It is an architectural fix, not a code fix.
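
As a rough illustration of what an architectural fix of this kind might look like, the sketch below assumes "Dual Ollama" means running two separate Ollama server instances and pinning each agent to its own instance. That reading, the second port, and the names `OllamaInstance`, `INSTANCES`, and `instance_for_agent` are assumptions for illustration and are not taken from the whitepaper.

```python
from dataclasses import dataclass

OLLAMA_DEFAULT_PORT = 11434  # Ollama's default listen port


@dataclass(frozen=True)
class OllamaInstance:
    """One Ollama server process, identified by host and port."""
    host: str
    port: int

    def generate_url(self) -> str:
        # /api/generate is Ollama's text generation endpoint.
        return f"http://{self.host}:{self.port}/api/generate"


# Hypothetical layout: two local servers, the second on the next port.
INSTANCES = (
    OllamaInstance("127.0.0.1", OLLAMA_DEFAULT_PORT),
    OllamaInstance("127.0.0.1", OLLAMA_DEFAULT_PORT + 1),
)


def instance_for_agent(agent_index: int) -> OllamaInstance:
    """Pin each agent to one server so agents do not share a request queue."""
    return INSTANCES[agent_index % len(INSTANCES)]


if __name__ == "__main__":
    for agent in range(2):
        print(agent, instance_for_agent(agent).generate_url())
```

Under this assumed layout, the second server would need to be started on its own address, for example by setting the `OLLAMA_HOST` environment variable before `ollama serve`. The sketch only routes requests; it does not reproduce the measured contention numbers.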

+74%

GPU memory bandwidth is the multi-agent bottleneck — not the serving stack. Overturned the TR130 conclusion.

2.25×

Continuous batching delivers 2.25× throughput at N=8 via 77-80% kernel reduction.

Q4_K_M

The universal quantization sweet spot. -4.1pp accuracy max across 5 models, 30-67% cost savings.

NULL

FP8 KV-cache produces no Holm-significant safety effect across 24K paired records on 3 models. Not pre-approved, not pre-banned — workload-specific paired eval required.
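
For readers unfamiliar with the term, "Holm-significant" refers to surviving the Holm-Bonferroni step-down correction for multiple comparisons. The sketch below shows the standard procedure; the function name `holm_reject` and the example p-values are illustrative and are not taken from the reports.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    under the Holm-Bonferroni step-down procedure."""
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is tested at alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # Stop at the first failure; all larger p-values are retained.
    return reject


if __name__ == "__main__":
    # Made-up p-values for four comparisons.
    print(holm_reject([0.01, 0.04, 0.03, 0.005]))
```

A null result in this sense means no comparison clears its adjusted threshold, which is a statement about the tested workloads rather than evidence that no effect exists.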

κ = 0.69

Cross-LLM judge agreement rates as "triangulate": single-judge labels are insufficient for safety classification. Based on 68K judge rows over the TR145 safety subset. In addition, safety-specialist judges measure a different axis than general LLMs.
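
To make the κ statistic concrete, here is a minimal sketch of Cohen's kappa for a pair of judges. It assumes pairwise Cohen's kappa; the reports may use a different variant (for example a multi-rater statistic), and the function name and labels below are illustrative only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both judges agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each judge's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # Both judges used one identical label throughout.
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    judge_1 = ["safe", "safe", "unsafe", "unsafe"]
    judge_2 = ["safe", "unsafe", "unsafe", "unsafe"]
    print(cohen_kappa(judge_1, judge_2))
```

Because kappa discounts agreement expected by chance, two judges can agree on most items and still score well below 1.0, which is why a single judge's labels are treated as insufficient here.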

Whitepapers

Executive-level decision documents. Start here if you need the bottom line.

Conclusive Reports & Appendices

Dissertation-style synthesis documents consolidating findings across multiple technical reports.

Technical Reports

Individual research reports with raw data, methodology, and findings.

FP8 KV-cache safety on standardized batteries and across the serving-state factorial — batch, prefix-caching, speculative decoding, and temperature.