Independent Research

Edge LLM Inference Under Real-World Constraints

How fast can local inference get — and how safe is it at the edge? This research program answers both questions with CUDA event timing and controlled safety evaluations across model loading, quantization, TensorRT compilation, KV cache optimization, multi-agent coordination, and cross-backend safety consistency.
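The timing methodology above records GPU-side latencies (e.g. via paired `torch.cuda.Event(enable_timing=True)` markers) and then collapses hundreds of thousands of per-request samples into distribution statistics. As a minimal host-side sketch of that aggregation step only (the `percentile` and `summarize` helpers are illustrative, not code from the reports):

```python
import math

def percentile(samples_ms, q):
    """Nearest-rank percentile of a list of latency samples (milliseconds)."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def summarize(samples_ms):
    """Collapse raw per-request latencies into the usual reporting stats."""
    return {
        "n": len(samples_ms),
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "mean": sum(samples_ms) / len(samples_ms),
    }

# Hypothetical latencies, as would be recorded by CUDA event pairs.
stats = summarize([42.0, 55.0, 61.0, 48.0, 120.0, 50.0, 47.0, 95.0])
```

Tail percentiles (p95/p99), not the mean, are what a `<100ms` inference target is usually judged against, which is why the summary reports them separately.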

Independent research by Sahil Kadadekar

555,000+
Research Measurements
53
Technical Reports
8
Repositories
<100ms
Inference Target

Start Here

Whitepaper

Chimeraforge: High-Performance LLM Agent Orchestration

Rust vs. Python for production AI orchestration. A hybrid architecture and "Dual Ollama" pattern achieve 58% latency reduction and near-zero contention.

Read the Whitepaper
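The "Dual Ollama" pattern named above spreads agent requests across independent backend instances so concurrent callers do not queue on a single server. A minimal round-robin sketch of that idea, assuming two local Ollama instances on separate ports (the endpoint addresses and `DualBackendRouter` class are illustrative, not the whitepaper's implementation):

```python
import itertools

class DualBackendRouter:
    """Rotate requests across independent inference backends so concurrent
    agents are spread out instead of contending for one server's queue."""

    def __init__(self, endpoints):
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self):
        # Each call hands back the next backend in rotation.
        return next(self._cycle)

# Hypothetical: two Ollama instances listening on adjacent ports.
router = DualBackendRouter([
    "http://127.0.0.1:11434",
    "http://127.0.0.1:11435",
])

# Four agent requests alternate between the two backends.
targets = [router.next_endpoint() for _ in range(4)]
```

A production router would also track per-backend health and in-flight load, but even plain alternation removes the head-of-line blocking that a single shared instance creates.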

Whitepapers

Executive-level decision documents. Start here if you need the bottom line.

Conclusive Reports & Appendices

Dissertation-style synthesis documents consolidating findings across multiple technical reports.

Technical Reports

Individual research reports with raw data, methodology, and findings.

Coverage: alignment under quantization, batch perturbation, multi-turn jailbreaks, many-shot attacks, cross-architecture fragility, and cross-request composition.

TR134: Alignment Robustness Under Quantization

Multi-family safety evaluation across 4 models (1.2B–7.6B parameters) with jailbreak amplification analysis.

View Report
TR135: Safety Under Multi-Agent Concurrency

Does running N concurrent agents on a shared backend degrade model safety?

View Report
TR136: Cross-Backend Safety Consistency

Ollama vs. vLLM vs. TGI safety comparison across 3 models, 4 backends, and 6 benchmarks.

View Report
TR137: The Safety Tax of Inference Optimization

Unified synthesis of quantization, concurrency, and backend effects on LLM safety — 74,254 samples.

View Report
TR138: Batch Inference Safety Under Non-Determinism

Audit-layer flip adjudication and 7,257-sample replication with corrected refusal detector.

View Report
TR139: Multi-Turn Jailbreak Susceptibility Under Quantization

Conversational attack sweep — 10,600 conversations across 4 models, 6 quant levels, 8 attack strategies.

View Report
TR140: Many-Shot & Long-Context Jailbreak Under Quantization

15,000 scored samples across 4 models, 6 quant levels, 5 shot counts, and 3 context-length profiles.

View Report
TR141: Cross-Architecture Refusal Fragility Under Batch Perturbation

127,224 records across 18 models, 10+ families, 4 alignment types — batch-induced safety flip asymmetry on Blackwell GPU.

View Report
TR142: Quality-Safety Correlation Under Quantization

Cross-referencing TR125 quality metrics with TR134 safety metrics — analysis-only, no new experiments.

View Report
TR143: Cross-Request Safety Leakage Under Continuous Batching

14,250 records — batch composition effects on safety in multi-tenant vLLM inference.

View Report