Work

ML Engineer · Independent Researcher · Founder

Building across inference optimization, constitutional AI architectures, and empirical safety evaluation. One paper accepted at the ICML 2026 Workshop on Hypothesis Testing, 5 more under peer review, 55 technical reports, 1,348,000+ empirical measurements, and a constitutional AI ecosystem spanning 9 repositories and 5 languages.

GitHub LinkedIn Papers

Experience

Founding Machine Learning Engineer

GhostEye Inc. · New York, USA

Dec 2025 — Mar 2026

Built a multi-agent security awareness training platform in roughly 90 days across web, Slack, Teams, SMS/RCS, WhatsApp, Telegram, voice, and email. A shared JIT training agent delivered personalized training from vectorized failure history across phishing, vishing, smishing, and deepfake phishing.
Built the phishing simulation agent on self-hosted 70B LLMs, using domain-specific LoRA/QLoRA adapters rather than full retraining, trained on a 1M+ email corpus grounded in NIST guidance, with vendor-impersonation templates and typosquatted login pages.
Reduced deepfake phishing simulation latency from 40s to 100–450ms on the production cluster, roughly an 89–400× improvement.
Owned agent orchestration end-to-end: Azure (app registration, tenancy/auth), intelligent call routing by country code, a custom jitter algorithm for human-like timing, and 5 specialized review agents distilled from ~2,500 reviewer comments across ~1,000 pull requests.
Built LangGraph/LangSmith-traced scoring agents with input guardrails across APIs, agents, and SMS engines, with adversarial attempt logging feeding analytics, training-outcome evaluation, and SCORM-packaged compliance reporting aligned to SOC 2, NIST, ISO 27001, GDPR via Vanta.
Wrote 5,000+ tests across ~20 services, built CI/CD regression pipelines with automated documentation, and built an internal CLI with custom Claude Code skills/hooks for rapid prod/staging debugging across Porter-deployed services.

Co-Founder & Lead Machine Learning Engineer

Attunica AI · New York, USA

Oct 2025 — Present

Architected and solo-developed a four-service AI psychological safety research platform for psychotherapy training, deployed on Vercel and Fly.io, with pilot collaboration through the NYU Silberman School of Social Work.
Built a real-time streaming agent using the LiveKit SDK and the Google Gemini Realtime API to simulate adaptive, therapist-guided AI personas in WebRTC sessions with sub-100ms latency.
Designed an IRB-aligned research data layer with tiered consent, NER-based anonymization, and longitudinal tracking of AI dependency and parasocial attachment metrics, enabling collection of independent human–AI interaction data for research.
Engineered a privacy-first, cost-optimized architecture (~$1.20/session) that eliminates A/V storage via transcript mirroring, preserving fault-tolerant multi-agent communication while reducing storage and compliance risk.

Founder & Lead Machine Learning Engineer

Chimera AI Ecosystem · New York, USA

Sep 2025 — Present

Architected a constitutional AI ecosystem spanning 9 repositories, 15+ services, and 5 languages (Python, Rust, TypeScript, C#, JavaScript), with production inference services, a research platform, mobile and web clients, and a content pipeline.
Designed a constitutional alignment architecture: a multi-model debate engine with heat-based escalation, 3 consensus algorithms (weighted, ranked-choice, Condorcet), and a visual-embedding fast-path router achieving 99% single-hop routing in <10ms — a 10,000× speedup over full debate at 97% cost reduction. Productionized with drift detection (KS, PSI, ADWIN), canary deployment, and auto-rollback circuit breakers.
Built the alignment runtime in Rust (7 crates): BFT consensus, Ed25519 provenance chains, Merkle tree verification, and zero-knowledge proofs (Pedersen commitments on Ristretto255). Integrated with Python ML services via zero-copy Arrow IPC FFI, achieving <20ms end-to-end P95 on the fast path.
Built JARVIS, a multi-provider AI gateway (Anthropic, OpenAI, Gemini) with chat, voice, multi-graph memory (semantic, temporal, causal, entity graphs with BFS traversal and episodic consolidation), tool execution with human-in-the-loop approval, proactive intelligence (system-originated workflows), and 5 channel adapters (Slack, Discord, Telegram, WhatsApp, Email). Shipped with a Unity/C# Android client and Next.js web console, both supporting cross-device sync and session handoff.
Wired a cognitive meta-controller with 4 agent styles (analytical, creative, adversarial, domain expert), ELO-based performance tracking, and advisory pre-checks on high-risk tool execution. The system self-improves via a cross-repo MLOps loop: debate outcomes generate DPO training pairs (Rafailov loss), with Dr. GRPO, RLOO, and Self-Rewarding (WARM judge ensemble) post-training added in 2026. Candidate models pass through statistical drift detection (Welch's t-test, Cohen's d) and a canary deployment controller with auto-rollback for zero-downtime model promotion.
Built a 6-agent content pipeline with ClickHouse analytics and confidence-scored publishing, plus a constitutional observability system with 6 watchers, 5 triagers, and 7 autonomous fixers driven by 8 playbooks with a SQLite-backed dead letter queue (102 tests).
Extended the constitutional architecture to embodied autonomy via ProjectWyvern: a governed mission-execution plane between Chimera control and PX4/ArduPilot, with a 5-tier authority hierarchy, cryptographic mission replay, and an OpenAPI 3.1 mission contract. Phase 0 specs complete; SIM-ONLY MVP on PX4 + ROS 2 + Gazebo in progress.
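As an illustration of the Condorcet option among the three consensus algorithms listed above, here is a minimal sketch of pairwise-majority selection over ranked ballots. It is not the Chimera implementation: the function name `condorcet_winner`, the ballot format (each model submits an ordered list of candidate answers), and the rule that unranked candidates tie for last place are all assumptions made for this example.

```python
from itertools import combinations
from typing import Dict, Optional, Sequence


def condorcet_winner(ballots: Sequence[Sequence[str]]) -> Optional[str]:
    """Return the candidate that wins every head-to-head contest, or None.

    Hypothetical helper: each ballot is one voter's ranking, best first.
    """
    candidates = sorted({c for ballot in ballots for c in ballot})
    # wins[a][b] counts the ballots that rank a strictly above b.
    wins: Dict[str, Dict[str, int]] = {
        a: {b: 0 for b in candidates if b != a} for a in candidates
    }
    for ballot in ballots:
        rank = {c: i for i, c in enumerate(ballot)}
        for a, b in combinations(candidates, 2):
            # Assumption: candidates missing from a ballot tie for last place.
            ra = rank.get(a, len(candidates))
            rb = rank.get(b, len(candidates))
            if ra < rb:
                wins[a][b] += 1
            elif rb < ra:
                wins[b][a] += 1
    for a in candidates:
        # A Condorcet winner has a strict majority against every rival.
        if all(wins[a][b] > wins[b][a] for b in wins[a]):
            return a
    return None  # Preference cycle or tie: no Condorcet winner exists.


if __name__ == "__main__":
    print(condorcet_winner([["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]))
```

When the ballots form a preference cycle the function returns `None`, which is the case where a system would presumably fall back to another rule such as weighted or ranked-choice voting.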

Co-Founder

Stealth Startup, Medical AI · New York, USA

Oct 2023 — Aug 2025

Led a cross-functional team of 3 engineers and 1 clinician across 5 institutions (state government, city university, dental hospital, 2 engineering colleges) to build an ML-guided diagnostic platform spanning 3 clinical domains.
Designed and deployed end-to-end ML pipelines on AWS (Lambda, SageMaker, Bedrock, DynamoDB, SQS/DLQs) to process 1K+ clinical cases with reproducible throughput and scalable orchestration.
Implemented RAG-based decision-support workflows with Qdrant vector search and multimodal retrieval over imaging and clinical metadata, with ingestion and model-serving endpoints designed for HIPAA-sensitive workflows (IAM-scoped roles, encryption at rest, auditable access controls).
Reduced infrastructure costs by 75% through compute/storage optimization while preserving reproducibility and clinical workflow requirements.

Research Engineer — Medical Imaging AI

Multiple Institutions · Pune, India

Jan 2022 — Sep 2023

Engineered high-throughput training and preprocessing pipelines for 92K+ fundus scans and 1K+ dental imaging cases, achieving 93% diagnostic accuracy in clinical classification workflows.
Improved training efficiency with mixed-precision training, multi-GPU execution (4× throughput), and NVIDIA DALI-optimized data loading for large-scale imaging experiments.
Evaluated ~10 attention mechanism variants for 5-class diabetic retinopathy severity grading and produced SHAP-based interpretability reports to support clinician review of AI-assisted diagnoses.
Established dataset versioning and annotation protocols adopted by 5+ research teams, securing institutional copyright (L-122721/2023) and improving reproducibility for clinical AI research.

Education

New York University

New York, USA

M.S. in Computer Science

GPA: 3.5/4.0 · May 2025

Pune Institute of Computer Technology

Pune, India

B.E. in Electronics & Telecommunications

SGPA: 9.1/10.0 · Aug 2022

Technical Skills

Languages: Python, TypeScript, Rust, C#, SQL, C++, Java
Frameworks & AI: FastAPI, Next.js, PyTorch, TensorFlow/Keras, Transformers, Pydantic, PEFT, Ray, DeepSpeed, LangGraph, LangSmith, LiveKit
GPU & Compilation: CUDA, Triton, TensorRT, FlashAttention, ONNX Runtime, torch.compile, Nsight Systems/Compute, GPTQ, AWQ, INT4/INT8
Inference: vLLM, TGI, llama.cpp (GGUF), continuous batching, KV-cache optimization, speculative decoding
Data & Analysis: PostgreSQL, Redis, ClickHouse, MinIO, SQLAlchemy, SciPy, pandas
Cloud & Deployment: AWS (Lambda, S3, DynamoDB, SQS, IAM), Docker, Kubernetes, Celery, Vercel
Security & Auth: OAuth2, JWT, HMAC webhooks, Azure AD, RBAC
Monitoring: Prometheus, Grafana, Datadog, OpenTelemetry, pynvml, MLflow, Weights & Biases

Open Source

Chimeraforge — PyPI capacity-planning CLI

Model-agnostic 5-gate planner (v0.5.0 plans any registry / Ollama / HuggingFace model); 6 validated predictive models (VRAM R²=0.968, throughput R²=0.859) + opt-in safety gate (TR134 refusal + TR142 RTSI), dual-language harnesses (Python + Rust), 450 tests. 2,000+ downloads on PyPI.

quantfit — PyPI quantization CLI with a safety-tax check

A standalone GPU-aware quantization CLI (separate from the Chimera ecosystem). Quantizes across the SOTA matrix (AWQ / GPTQ / SmoothQuant / FP8 / RTN via llm-compressor, plus GGUF for llama.cpp / Ollama), is honest about whether a model fits the GPU (3-tier: in-GPU / CPU-offload / refuse), and uniquely measures the safety tax of the quantization it just performed — a two-axis vector (refusal-robustness + over-refusal, per zone) judged by the QuantSafe refusal classifier. v0.1.0 (Alpha), Apache-2.0.

HuggingFace — 16 model releases

11 quantized models (AWQ + GPTQ 4-bit) across Llama 3.2, Qwen 2.5, Mistral 7B, and Phi-2; 4 custom GPT-2 scaling variants from the model scaling study; and quantsafe-refusal-modernbert (the refusal classifier behind the QuantSafe Certifier).

PyTorch PR #175562 — merged to PyTorch main

Authored an upstream PyTorch fix, merged to main (2026-06-04), replacing a hard assertion with a warning in cudagraph_trees deallocation — surfaced while diagnosing torch.compile autoregressive decode failures where growing KV-cache tensors triggered deallocation errors during compiled inference. Also validated the maintainer's in-review decode-path fix (PR #184102) on a real HF gpt2 decode and reported a multi-partition coverage gap it leaves open.

Validation gist + repro (PR #184102)

QuantSafe Certifier — signed release-gate for quantized models

A live HuggingFace Space that issues Ed25519-signed, tamper-evident release-screen records for quantized models: it runs the RTSI triage from arXiv:2606.10154 to route a configuration to clear / review / direct-safety-eval, then emits a content-addressed evidence manifest. Ships a fine-tuned refusal classifier (quantsafe-refusal-modernbert). Backyard-AI hackathon submission (OpenAI / Modal / NVIDIA sponsors).
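The idea of a content-addressed evidence manifest can be sketched with the Python standard library alone. This is a hedged illustration, not the Certifier's actual format: the helper names (`canonical_bytes`, `content_address`, `build_manifest`), the field names, and the choice of SHA-256 over canonical JSON are assumptions. Only the three route labels come from the description above. Signing is left out; an Ed25519 signature would be computed over the canonical manifest bytes with a cryptography library.

```python
import hashlib
import json
from typing import Dict, List

# Route labels taken from the description above; everything else is hypothetical.
ROUTES = ("clear", "review", "direct-safety-eval")


def canonical_bytes(record: Dict) -> bytes:
    """Serialize deterministically so equal records always hash equally."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def content_address(record: Dict) -> str:
    """Derive an identifier from the record's content (assumed: SHA-256)."""
    return "sha256:" + hashlib.sha256(canonical_bytes(record)).hexdigest()


def build_manifest(evidence: List[Dict], route: str) -> Dict:
    """Bundle evidence records under a manifest whose ID covers all of them."""
    if route not in ROUTES:
        raise ValueError(f"unknown route: {route!r}")
    # Sorting the addresses makes the manifest independent of input order.
    entries = sorted(content_address(item) for item in evidence)
    body = {"route": route, "entries": entries}
    return {"manifest_id": content_address(body), **body}


if __name__ == "__main__":
    record = {"model": "example-model-awq", "metric": "refusal_rate", "value": 0.97}
    print(build_manifest([record], "clear"))
```

Because each identifier is derived from the bytes it names, changing any evidence record changes its address and therefore the manifest ID, which is what makes the record tamper-evident once the manifest is signed.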

HF blog LinkedIn X thread
