Tags - machine-learning - Jifeng Wu's Personal Website

05-19

Local CUDA vLLM Setup for Python-Only Development Using a Precompiled Wheel

05-14

Compile NEFF Executables from NKI Kernels

05-12

vLLM Internals — PagedAttention and Custom Accelerator Compilation

05-04

Hugging Face Model Repositories: Organization, Semantics, and Portability

05-03

Exporting Compute Graphs, LLM Shape Dynamics, and Serving Runtimes

05-03

Schedules in Machine Learning Computation: What They Are and Who Needs to Know About Them

04-26

Learning MLIR and HLO by Building a Tiny StableHLO-to-LLVM IR Compiler

04-17

Jie Liu's B-Exam: Abstractions and Optimizations for Sparse Tensor Computation on Modern Hardware

04-10

PyTorch + CUDA vs. XLA + TPU: Two Execution Models for ML Systems

01-11

Qt, OpenCV, PyTorch: The Central Dogma of GUI CV Applications