Attention Mechanisms That Actually Matter: From Multi-Head to PagedAttention
A deep dive into four attention mechanisms that power modern LLMs, from the original transformer to the serving tricks that make inference feasible at scale.
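As a primer on the first of the four mechanisms, here is a minimal NumPy sketch of multi-head self-attention as described in the original transformer paper. All function names, weight shapes, and the single-batch layout are illustrative assumptions for exposition, not code from any particular library.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the chosen axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Single-batch multi-head self-attention (no masking, no dropout).

    x: (seq_len, d_model); w_q, w_k, w_v, w_o: (d_model, d_model).
    """
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project, then split into heads: (num_heads, seq_len, d_head).
    def project(w):
        return (x @ w).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = project(w_q), project(w_k), project(w_v)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    out = softmax(scores) @ v                     # (num_heads, seq_len, d_head)

    # Concatenate heads back together and apply the output projection.
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 16, 4
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) * 0.1
                      for _ in range(4))
x = rng.standard_normal((seq_len, d_model))
y = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(y.shape)  # (4, 16)
```

The key design point, and the reason the later mechanisms in this post exist, is that every token attends to every other token: the `scores` matrix is `(seq_len, seq_len)` per head, which is exactly the quadratic cost that serving-time tricks like PagedAttention work around.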