Categories - AI and Machine Learning - Jifeng Wu's Personal Website

06-07

vLLM Platform System

05-31

How the KV Cache Works in HuggingFace Transformers

05-19

Local CUDA vLLM Setup for Python-Only Development Using a Precompiled Wheel

05-12

vLLM Internals — PagedAttention and Custom Accelerator Compilation

05-04

Hugging Face Model Repositories: Organization, Semantics, and Portability

05-03

Exporting Compute Graphs, LLM Shape Dynamics, and Serving Runtimes

05-03

Schedules in Machine Learning Computation: What They Are and Who Needs to Know About Them

04-10

Main Takeaways from a Group Discussion on AI Coding

04-10

PyTorch + CUDA vs. XLA + TPU: Two Execution Models for ML Systems

01-11

Qt, OpenCV, PyTorch: The Central Dogma of GUI CV Applications