vLLM Platform System
Overview
vLLM uses a plugin-based platform architecture that allows new hardware platforms to be supported without modifying vLLM core source code. A new platform integrates by:
- Subclassing the Platform base class and overriding key methods
- Publishing a Python package that exposes a detector function as a vllm.platform_plugins entry point
vLLM discovers and activates the platform at runtime through Python's standard entry point system. Once activated, the platform is exposed as the vllm.platforms.current_platform singleton, which ~300 files of core vLLM code use polymorphically.
Python’s Entry Point System
What Are Entry Points?
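Entry points are a Python packaging mechanism: a package declares named hooks in its metadata (in the [project.entry-points] table of pyproject.toml, as defined by PEP 621), and other code looks them up at runtime through importlib.metadata. Think of it as a runtime plugin registry built into the Python packaging system.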
How They Work
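1. A plugin package declares entry points in its pyproject.toml. Each entry sits under a group name and maps an entry name to an importable object written as "module:attribute".
2. The host application discovers the installed plugins by group name at runtime, using importlib.metadata.entry_points(group=...), and loads the ones it needs.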
vLLM’s Platform Architecture
Key Files
| File | Role |
|---|---|
| vllm/platforms/interface.py | Base Platform class (~600 lines), plus DeviceCapability, PlatformEnum |
| vllm/platforms/__init__.py | Plugin loading, auto-detection, lazy current_platform singleton |
| vllm/plugins/__init__.py | Generic entry-point plugin loader (load_plugins_by_group) |
| vllm/platforms/cuda.py | CudaPlatform (NVIDIA GPUs) |
| vllm/platforms/rocm.py | RocmPlatform (AMD GPUs) |
| vllm/platforms/xpu.py | XPUPlatform (Intel GPUs) |
| vllm/platforms/tpu.py | TpuPlatform (Google TPUs) |
| vllm/platforms/cpu.py | CpuPlatform (x86/ARM CPU inference) |
| vllm/platforms/zen_cpu.py | ZenCpuPlatform (AMD Zen CPUs with zentorch) |
The Platform Base Class (the Contract)
Every platform subclasses Platform and overrides whichever of its ~50 classmethods and properties it needs. The base class provides defaults (typically a no-op or a NotImplementedError), so a platform only overrides what differs for its hardware.
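The defaults-plus-overrides contract can be sketched with a toy base class. The method names are loosely modeled on vLLM's Platform, but the classes and signatures below are hypothetical simplifications: one default is a no-op hook that most platforms can ignore, and the other raises NotImplementedError because there is no universal answer.

```python
class Platform:
    """Toy stand-in for a platform base class (hypothetical, heavily trimmed)."""

    device_type: str = ""

    @classmethod
    def check_and_update_config(cls, config: dict) -> None:
        # No-op default: most platforms accept the config unchanged.
        return None

    @classmethod
    def get_device_total_memory(cls, device_id: int = 0) -> int:
        # No sensible universal answer, so the base class refuses to guess.
        raise NotImplementedError(f"{cls.__name__} must report device memory")


class ToyAccelPlatform(Platform):
    """Overrides only what differs for its hardware; inherits the no-op hook."""

    device_type = "toy"

    @classmethod
    def get_device_total_memory(cls, device_id: int = 0) -> int:
        return 16 * 1024**3  # pretend every toy device has 16 GiB
```

The split between the two kinds of default is the point: a forgotten no-op override is harmless, while a forgotten memory query fails loudly instead of silently returning a wrong number.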
The Auto-Detection Pipeline
Platform detection happens lazily the first time vllm.platforms.current_platform is accessed. The module-level __getattr__ in vllm/platforms/__init__.py triggers resolve_current_platform_cls_qualname(), which runs the following pipeline:
- Load the builtin detection functions: tpu_platform_plugin(), cuda_platform_plugin(), rocm_platform_plugin(), xpu_platform_plugin(), cpu_platform_plugin().
- Load out-of-tree (OOT) detection functions from entry_points(group="vllm.platform_plugins").
All loaded detection functions, builtin and OOT alike, follow the same pattern: they try to detect the hardware statelessly and return the fully qualified name of the platform class as a string, or None if the hardware is not present.
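One way to write a stateless detector is to check whether a vendor runtime package is installed with importlib.util.find_spec, which locates a top-level module without executing it, so no device runtime gets initialized as a side effect. This is a sketch under that assumption; vLLM's builtin detectors may probe their hardware differently, and the helper detect_by_module plus the module and class names below are hypothetical.

```python
import importlib.util
from typing import Optional


def detect_by_module(runtime_module: str, platform_qualname: str) -> Optional[str]:
    """Return `platform_qualname` if `runtime_module` is installed, else None.

    For a top-level name, find_spec only locates the module; it does not run
    its code. (For a dotted name, the parent package is imported first.)
    """
    try:
        spec = importlib.util.find_spec(runtime_module)
    except (ImportError, ValueError):
        # e.g. the parent package of a dotted name is missing
        return None
    return platform_qualname if spec is not None else None


def my_accel_platform_plugin() -> Optional[str]:
    # Hypothetical runtime package and platform class path.
    return detect_by_module("my_accel_runtime", "my_accel.platform.MyAccelPlatform")
```

Returning a string rather than the class itself keeps detection cheap: nothing platform-specific is imported until one platform has been selected.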
Then:
- Resolve exactly one platform. OOT plugins take priority over builtins:
  - If exactly 1 OOT plugin activates → use it
  - If 0 OOT and exactly 1 builtin activates → use it
  - If 0 total → fall back to UnspecifiedPlatform (a no-op stub)
  - If ≥2 OOT plugins activate, or 0 OOT and ≥2 builtins activate → raise RuntimeError (ambiguous)
- Instantiate the singleton:
  - Dynamically import the resolved class and instantiate it.
  - Store the instance in the module-level _current_platform variable for the lifetime of the process.
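The resolution rules and the lazy singleton can be sketched together. This is a simplified model, not vLLM's code: detectors are passed in as plain dicts instead of being loaded from entry points, and the lazy attribute hangs off a throwaway module object through PEP 562's module-level __getattr__. The names resolve_qualname, import_qualname, and make_platforms_module are hypothetical.

```python
import importlib
import types
from typing import Callable, Dict, Optional

Detector = Callable[[], Optional[str]]


def _activated(detectors: Dict[str, Detector]) -> Dict[str, str]:
    """Run every detector; keep the ones that returned a class qualname."""
    hits: Dict[str, str] = {}
    for name, detect in detectors.items():
        try:
            qualname = detect()
        except Exception:
            continue  # in this sketch, a crashing detector counts as "not present"
        if qualname is not None:
            hits[name] = qualname
    return hits


def resolve_qualname(builtin: Dict[str, Detector], oot: Dict[str, Detector],
                     fallback: str) -> str:
    oot_hits, builtin_hits = _activated(oot), _activated(builtin)
    for hits in (oot_hits, builtin_hits):  # OOT plugins are consulted first
        if len(hits) >= 2:
            raise RuntimeError(f"Only one platform can be activated, got: {sorted(hits)}")
        if len(hits) == 1:
            return next(iter(hits.values()))
    return fallback  # nothing detected: use the no-op stub


def import_qualname(qualname: str) -> type:
    module_name, _, attr = qualname.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def make_platforms_module(builtin: Dict[str, Detector], oot: Dict[str, Detector],
                          fallback: str) -> types.ModuleType:
    mod = types.ModuleType("demo_platforms")
    mod._current_platform = None

    def __getattr__(name: str):
        # PEP 562: called only when normal attribute lookup on `mod` fails.
        if name != "current_platform":
            raise AttributeError(f"module {mod.__name__!r} has no attribute {name!r}")
        if mod._current_platform is None:  # first access: detect, import, instantiate
            cls = import_qualname(resolve_qualname(builtin, oot, fallback))
            mod._current_platform = cls()
        return mod._current_platform

    mod.__getattr__ = __getattr__
    return mod
```

For example, make_platforms_module({"cpu": lambda: "builtins.dict"}, {}, "builtins.object").current_platform resolves on first access and returns the same instance on every later access. Because the global is named _current_platform rather than current_platform, every lookup goes through __getattr__, which is what makes the first one lazy.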
The vllm.platforms.current_platform Singleton
vLLM core code dispatches platform-specific behavior through vllm.platforms.current_platform, which is referenced in over 300 files. There are two main dispatch patterns:
Pattern 1: Type Checks (Inline Branching)
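Used when the difference between platforms is fundamental, or when only a few cases need special handling. Core code asks current_platform what kind of platform it is, for example with current_platform.is_cuda(), and branches inline.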
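A sketch of the type-check pattern with simplified stand-in classes. The method names is_cuda() and is_cuda_alike() follow vLLM's Platform, but the enum, the classes, and the kernel names here are hypothetical.

```python
from enum import Enum


class PlatformEnum(Enum):
    """Hypothetical, trimmed version of a platform enum."""
    CUDA = "cuda"
    ROCM = "rocm"
    CPU = "cpu"


class Platform:
    _enum: PlatformEnum

    def is_cuda(self) -> bool:
        return self._enum == PlatformEnum.CUDA

    def is_cuda_alike(self) -> bool:
        # True for both the CUDA and the ROCm platform in this sketch
        return self._enum in (PlatformEnum.CUDA, PlatformEnum.ROCM)


class CudaPlatform(Platform):
    _enum = PlatformEnum.CUDA


class RocmPlatform(Platform):
    _enum = PlatformEnum.ROCM


class CpuPlatform(Platform):
    _enum = PlatformEnum.CPU


def choose_kernel(current_platform: Platform) -> str:
    # Pattern 1: ask the platform what it is, then branch inline.
    if current_platform.is_cuda_alike():
        return "fused_gpu_kernel"      # hypothetical kernel name
    return "reference_torch_kernel"    # hypothetical portable fallback
```

The check is a plain comparison on the platform object rather than a query to a vendor library, so it is cheap and needs no device initialization.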
Pattern 2: Polymorphic Classmethod Calls
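Used for standardized interfaces where the platform provides the entire implementation. Core code calls the same classmethod on current_platform regardless of hardware, and each platform subclass returns its own answer.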
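A sketch of polymorphic dispatch, assuming two hypothetical classmethods (preferred_attention_backend and default_block_size) that are not vLLM's actual API; the backend labels and block sizes are illustrative values only.

```python
from typing import Dict


class Platform:
    @classmethod
    def preferred_attention_backend(cls) -> str:
        # Hypothetical hook that every platform must answer.
        raise NotImplementedError

    @classmethod
    def default_block_size(cls) -> int:
        return 16  # illustrative shared default


class GpuPlatform(Platform):
    @classmethod
    def preferred_attention_backend(cls) -> str:
        return "flash"  # illustrative label


class CpuPlatform(Platform):
    @classmethod
    def preferred_attention_backend(cls) -> str:
        return "sdpa"  # illustrative label

    @classmethod
    def default_block_size(cls) -> int:
        return 128  # illustrative override


def build_attention_config(current_platform: Platform) -> Dict[str, object]:
    # Pattern 2: no branching in core code; the platform supplies the answers.
    return {
        "backend": current_platform.preferred_attention_backend(),
        "block_size": current_platform.default_block_size(),
    }
```

Supporting a third platform means writing one more subclass; build_attention_config itself does not change.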
Both patterns serve the same goal: core code never branches on a specific vendor's hardware or libraries directly; it always routes the decision through current_platform.
This abstraction means adding a new platform type does not require finding and updating every hardware-specific branch: the platform subclass provides all the answers through its overrides.
Adding a New Out-of-Tree (OOT) Platform
Here is a minimal example of adding a custom platform without touching vLLM’s source tree.
Step 1: Create a Python package
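Create a project whose importable package is my_platform, containing __init__.py (the detector function) and platform.py (the Platform subclass), with a pyproject.toml at the project root.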
Step 2: Write the detector function
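In my_platform/__init__.py, write the detector function that the entry point will reference. Like the builtin detectors, it should check for the hardware statelessly and return the fully qualified name of the platform class, or None if the hardware is absent.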
Step 3: Write the Platform subclass
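In my_platform/platform.py, subclass Platform and override the classmethods and properties your hardware needs; everything else falls back to the base-class defaults.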
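A sketch of my_platform/platform.py. It assumes vLLM's Platform base class and its PlatformEnum.OOT member, and overrides a few hooks whose names exist in vLLM's Platform at the time of writing; verify the exact signatures against your installed version (check_and_update_config, for instance, receives vLLM's config object). The device name, device type, and memory size are placeholders, and the fallback stub exists only so the sketch can run without vLLM installed.

```python
# my_platform/platform.py
try:
    from vllm.platforms.interface import Platform, PlatformEnum
except ImportError:
    # Stand-in stubs so this sketch can execute without vLLM installed.
    from enum import Enum

    class PlatformEnum(Enum):
        OOT = "oot"

    class Platform:
        pass


class MyPlatform(Platform):
    _enum = PlatformEnum.OOT        # marks an out-of-tree platform
    device_name: str = "my_accel"   # placeholder
    device_type: str = "my_accel"   # placeholder device type string

    @classmethod
    def get_device_name(cls, device_id: int = 0) -> str:
        return f"MyAccel-{device_id}"

    @classmethod
    def get_device_total_memory(cls, device_id: int = 0) -> int:
        return 32 * 1024**3  # placeholder: query the real driver here

    @classmethod
    def check_and_update_config(cls, vllm_config) -> None:
        # Adjust settings the hardware cannot support; a no-op in this sketch.
        pass
```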
Step 4: Register the Entry Point
In pyproject.toml, register the detector function under the vllm.platform_plugins entry-point group.
Step 5: Install and Run
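Install the package into the same Python environment as vLLM, for example with pip install -e . from the project root. No change to vLLM itself is needed: the next time vllm.platforms.current_platform is accessed, vLLM finds the entry point, calls the detector, and activates the platform class it returns.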