13 posts in total
2026
Compile NEFF Executables from NKI Kernels
vLLM Internals — PagedAttention and Custom Accelerator Compilation
Hugging Face Model Repositories: Organization, Semantics, and Portability
Schedules in Machine Learning Computation: What They Are and Who Needs to Know About Them
Exporting Compute Graphs, LLM Shape Dynamics, and Serving Runtimes
Learning MLIR and HLO by Building a Tiny StableHLO-to-LLVM IR Compiler
Jie Liu's B-Exam: Abstractions and Optimizations for Sparse Tensor Computation on Modern Hardware
PyTorch + CUDA vs. XLA + TPU: Two Execution Models for ML Systems
Qt, OpenCV, PyTorch: The Central Dogma of GUI CV Applications
2025
Running Local LLMs with Ollama