I am S Akash, a senior undergraduate in Electrical and Electronics Engineering at the Indian Institute of Technology Patna. My research interests are GPU computing and fast ML inference.
A significant portion of my recent work has been through Google Summer of Code (CERN-HSF), contributing GPU-accelerated inference to TMVA SOFIE within ROOT. I collaborated closely with Sanjiban Sengupta and Lorenzo Moneta on GPU backends (CUDA, ROCm, Alpaka) and fast, minimal-dependency C++ code generation for ML inference.
During my visit to ShanHaiWoo (Singapore), co-hosted with Ethereum Singapore Week 2025, I built FlowLink, “Crypto Payments You Can Trust”. Our team was selected as a top‑5 winner at Ethereum Singapore and was invited to present during TOKEN2049 Week at the ShanHaiWoo Winners’ Showcase.
I worked with Martin Kjeldsen at Unit of Measure on multimodal embeddings, large‑scale product retrieval and deduplication, and sharded vector stores that serve millions of SKUs with low latency. I also explored RAG orchestration, balancing performance, latency, and cost.
With Dr Sriparna Saha, I explored LLM-based re‑ranking of insights using Proximal Policy Optimization (PPO) to improve retrieval and quality control. Our work, done in collaboration with Microsoft, SUMMIR: A Hallucination Aware Framework for Ranking Sports Insights from LLMs (preprint), has been accepted to the main track of ECIR 2026.
I’m now investigating KV‑cache methods for long‑context inference and higher throughput, with a focus on the FlashAttention mechanism.
I’m a founding engineer at Autostep (YC S25), working closely with Aidan Pratt to find the repetitive, high-cost work hiding inside organizations and automate it. I apply clustering to 5,000+ hours of real workflow data to surface what is genuinely worth automating, and I have cut GPU cluster costs on AWS by more than 30% through compute-aware scheduling and dynamic resource allocation. More than the systems themselves, this has been my real lesson in grit and high agency: shipping every day, owning problems end-to-end without waiting for permission, and treating tight constraints as a starting point rather than a reason to stop.