Local CUDA vLLM Setup for Python-Only Development Using a Precompiled Wheel
This guide shows how to:
- run vllm directly from a local clone
- make Python-only code changes in the repo
- use precompiled native libraries from a nightly wheel
This is useful when you want:
- editable Python code from your clone
- working native extensions (vllm._C, etc.)
- no full source compilation
1. Create and activate a virtual environment
From the repo root:
1 | |
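Before installing anything, it can help to confirm that the environment is actually active. A minimal sketch using only the standard library (the helper name in_virtualenv is illustrative and not part of vLLM):

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

If this prints False, the activation step most likely did not take effect in the current shell.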
2. Inspect available nightly wheel variants
Check the nightly index:
1 | |
Example output:
1 | |
Notes:
- cu129 = CUDA 12.9 wheel variant
- cu130 = CUDA 13.0 wheel variant
- the HTML comment contains the latest built nightly commit
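The variant naming can be decoded mechanically. The sketch below assumes the convention implied by the two examples above, namely that the last digit is the minor version and everything before it is the major version; the function name is hypothetical.

```python
import re


def cuda_version_from_variant(variant: str) -> str:
    """Translate a variant tag such as 'cu129' into a CUDA version such as '12.9'."""
    # Assumption: the final digit is the minor version, the rest is the major version.
    match = re.fullmatch(r"cu(\d+)(\d)", variant)
    if match is None:
        raise ValueError(f"not a CUDA variant tag: {variant!r}")
    major, minor = match.groups()
    return f"{major}.{minor}"


for tag in ("cu129", "cu130"):
    print(tag, "->", "CUDA", cuda_version_from_variant(tag))
```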
3. Pick a CUDA variant
For a server with CUDA 13.0 support, use:
1 | |
To inspect that variant:
1 | |
Example output:
1 | |
Choose the entry matching your machine.
For a typical Linux x86_64 server, select:
1 | |
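If you are unsure which entry matches your machine, the architecture can be read from Python. This is a rough sketch: it only compares the architecture suffix of a wheel platform tag, and the tags used in the usage note are illustrative rather than copied from the index.

```python
import platform


def platform_tag_matches(platform_tag: str, machine: str) -> bool:
    """Return True when a wheel platform tag targets the given CPU architecture."""
    # Platform tags end with the architecture, for example "<prefix>_x86_64".
    return platform_tag.endswith("_" + machine)


local_machine = platform.machine()  # e.g. "x86_64" or "aarch64" on Linux
print("local architecture:", local_machine)
```

For example, platform_tag_matches("manylinux1_x86_64", local_machine) should hold on an x86_64 server and fail on an aarch64 one.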
Python tag note
You may see:
1 | |
That is okay for Python 3.12 because the wheel uses the stable ABI.
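The reasoning behind this can be sketched in code: a wheel tagged cpXY-abi3 is built against the stable ABI and is loadable by CPython X.Y and any later 3.x release. The filename below is hypothetical and only illustrates the tag layout; the helper names are not part of any library.

```python
import re
import sys


def split_wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return (python_tag, abi_tag, platform_tag) from a wheel filename."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    # The last three dash-separated fields are the python, abi and platform tags.
    python_tag, abi_tag, platform_tag = filename[: -len(".whl")].split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def abi3_wheel_supports(filename: str, version: tuple[int, int]) -> bool:
    """Return True if a cpXY-abi3 wheel can be loaded by CPython `version`."""
    python_tag, abi_tag, _ = split_wheel_tags(filename)
    match = re.fullmatch(r"cp3(\d+)", python_tag)
    if abi_tag != "abi3" or match is None:
        return False
    # The python tag states the minimum supported CPython version.
    minimum = (3, int(match.group(1)))
    return tuple(version) >= minimum


example = "example_pkg-1.0-cp38-abi3-manylinux1_x86_64.whl"  # hypothetical filename
print(abi3_wheel_supports(example, (3, 12)))
print(abi3_wheel_supports(example, sys.version_info[:2]))
```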
4. Download the wheel file manually
Using the x86_64 entry above:
1 | |
5. Sanity-check the wheel contents
1 | |
You should see native libraries such as:
1 | |
If you see a BrokenPipeError at the end, it is harmless: | head closes the pipe once it has printed its first lines, while the listing command is still writing.
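The same check can be done without a pipe, which avoids the error entirely. A wheel is a zip archive, so the standard library can list it. This is a sketch: native_libraries is a hypothetical helper, and the demo runs on a stand-in archive with placeholder entry names rather than on a real vLLM wheel.

```python
import os
import tempfile
import zipfile


def native_libraries(wheel_path: str) -> list[str]:
    """List the shared-object files packaged inside a wheel."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(name for name in wheel.namelist() if name.endswith(".so"))


# Demo on a stand-in archive; the entry names are placeholders, not a real wheel listing.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo-0.0-py3-none-any.whl")
    with zipfile.ZipFile(demo_path, "w") as demo:
        demo.writestr("demo/__init__.py", "")
        demo.writestr("demo/_C.abi3.so", b"")
    demo_libs = native_libraries(demo_path)

print(demo_libs)
```

Pointing native_libraries at the wheel you downloaded should return a non-empty list; an empty list would suggest the wrong file was fetched.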
6. Install vLLM editable using the exact wheel file
Set the environment variables:
1 | |
Then install:
1 | |
Why use VLLM_PRECOMPILED_WHEEL_LOCATION?
- it bypasses flaky automatic wheel selection logic
- it forces the build to use the exact wheel you downloaded
- it still installs the package in editable mode from your local repo
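The install step can also be scripted. The sketch below rests on two assumptions that should be checked against your vLLM checkout: that VLLM_USE_PRECOMPILED=1 is the variable set alongside VLLM_PRECOMPILED_WHEEL_LOCATION, and that the editable install is a plain pip install -e . run from the repo root. The function names are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path


def precompiled_install_env(wheel_path: str) -> dict[str, str]:
    """Copy the current environment and add the precompiled-wheel variables."""
    env = dict(os.environ)
    env["VLLM_USE_PRECOMPILED"] = "1"  # assumption: enables the precompiled path
    # An absolute path keeps the setting valid regardless of the build's working directory.
    env["VLLM_PRECOMPILED_WHEEL_LOCATION"] = str(Path(wheel_path).resolve())
    return env


def install_editable(repo_root: str, wheel_path: str) -> None:
    """Run an editable pip install of the clone with the variables above set."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-e", "."],
        cwd=repo_root,
        env=precompiled_install_env(wheel_path),
        check=True,  # raise CalledProcessError if pip fails
    )
```

Calling install_editable with the repo root and the downloaded wheel path runs pip with the same interpreter that executes the script, so it targets the active virtual environment.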
7. Verify that Python loads from your clone
1 | |
Expected result:
1 | |
This confirms that Python code is coming from your local clone.
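A programmatic version of this check might look like the following sketch. It locates the package through importlib without importing it and tests whether that location lies under a given directory; package_dir and loads_from are hypothetical helper names.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def package_dir(name: str) -> Optional[Path]:
    """Return the directory a top-level package would load from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve().parent


def loads_from(name: str, repo_root: str) -> bool:
    """Return True when the package directory lies somewhere under repo_root."""
    location = package_dir(name)
    return location is not None and Path(repo_root).resolve() in location.parents
```

With the path to your clone as the second argument, loads_from("vllm", ...) should return True after the editable install. Run it from a directory other than the repo root, because a session started inside the clone may find the local vllm directory on sys.path regardless of how the package was installed.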
8. Verify that the native extension loads
1 | |
Expected output:
1 | |
This confirms that the precompiled native library is usable.
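When the import fails, the exception text usually names the cause, for example a shared library that cannot be found. A small sketch that reports the reason instead of a traceback (try_import is a hypothetical helper):

```python
import importlib


def try_import(module_name: str) -> tuple[bool, str]:
    """Return (True, '') if the module imports, else (False, reason)."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:
        # A missing module or a shared library that fails to load surfaces as
        # ImportError; any other error raised during import is reported as well.
        return False, f"{type(exc).__name__}: {exc}"
    return True, ""
```

Calling try_import("vllm._C") inside the environment should return (True, "") when the precompiled library is usable.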
9. Verify the CLI works
1 | |
If this works, the setup is ready.
10. Run the server
Example:
1 | |
In another shell, test it:
1 | |
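The server can also be probed from Python. The sketch below assumes the server listens on localhost port 8000 and exposes the OpenAI-compatible /v1/models endpoint; adjust both if your launch command differs. The function names are illustrative.

```python
import json
import urllib.request


def models_url(host: str = "localhost", port: int = 8000) -> str:
    """Build the URL of the model listing endpoint (assumed path: /v1/models)."""
    return f"http://{host}:{port}/v1/models"


def model_ids(payload: dict) -> list[str]:
    """Extract model identifiers from an OpenAI-style list response."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_served_models(host: str = "localhost", port: int = 8000, timeout: float = 5.0) -> list[str]:
    """Query a running server and return the identifiers of the models it serves."""
    with urllib.request.urlopen(models_url(host, port), timeout=timeout) as response:
        return model_ids(json.load(response))
```

list_served_models() raises urllib.error.URLError if nothing is listening yet, which is a quick way to tell that the server is still starting up.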
11. Development workflow
If you modify Python files under the repo, for example:
- vllm/entrypoints/...
- vllm/engine/...
- vllm/core/...
- other Python modules
then restart the process and your changes should be picked up.
You do not need to reinstall for normal Python-only changes.