12 posts in total
2026
vLLM Platform System
How the KV Cache Works in HuggingFace Transformers
Local CUDA vLLM Setup for Python-Only Development Using a Precompiled Wheel
vLLM Internals — PagedAttention and Custom Accelerator Compilation
Hugging Face Model Repositories: Organization, Semantics, and Portability
Exporting Compute Graphs, LLM Shape Dynamics, and Serving Runtimes
Schedules in Machine Learning Computation: What They Are and Who Needs to Know About Them
Main Takeaways from a Group Discussion on AI Coding
PyTorch + CUDA vs. XLA + TPU: Two Execution Models for ML Systems
Qt, OpenCV, PyTorch: The Central Dogma of GUI CV Applications