Tags - llm - Jifeng Wu's Personal Website

06-07

vLLM Platform System

05-31

How the KV Cache Works in HuggingFace Transformers

05-19

Local CUDA vLLM Setup for Python-Only Development Using a Precompiled Wheel

05-12

vLLM Internals — PagedAttention and Custom Accelerator Compilation

05-10

Our Cognitive Profile and a Personal Playbook for the Agentic Era

05-03

Exporting Compute Graphs, LLM Shape Dynamics, and Serving Runtimes

04-17

Jie Liu's B-Exam: Abstractions and Optimizations for Sparse Tensor Computation on Modern Hardware

04-16

Critique of "RFSeek and Ye Shall Find: A tool for summary visualization and analysis of RFCs"

04-10

Main Takeaways from a Group Discussion on AI Coding

09-05

Running Local LLMs with Ollama