forked from vllm-project/vllm
Commit 1a8bfd9 (1 parent: 847cdcc)

[Hardware] Initial TPU integration (vllm-project#5292)

Showing 22 changed files with 1,322 additions and 28 deletions.
Dockerfile.tpu (new file), @@ -0,0 +1,19 @@:

ARG NIGHTLY_DATE="20240601"
ARG BASE_IMAGE="us-central1-docker.pkg.dev/tpu-pytorch-releases/docker/xla:nightly_3.10_tpuvm_$NIGHTLY_DATE"

FROM $BASE_IMAGE

WORKDIR /workspace
COPY . /workspace/vllm

ENV VLLM_TARGET_DEVICE="tpu"
# Install aiohttp separately to avoid build errors.
RUN pip install aiohttp
# Install the TPU and Pallas dependencies.
RUN pip install torch_xla[tpu] -f https://storage.googleapis.com/libtpu-releases/index.html
RUN pip install torch_xla[pallas] -f https://storage.googleapis.com/jax-releases/jax_nightly_releases.html -f https://storage.googleapis.com/jax-releases/jaxlib_nightly_releases.html

# Build vLLM.
RUN cd /workspace/vllm && python setup.py develop

CMD ["/bin/bash"]
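Once the image is built and running on a TPU VM (the documentation diff below gives the exact docker build/run commands), a quick way to confirm that PyTorch/XLA can see the TPU is a small smoke test. The snippet below is a minimal sketch using the standard torch_xla API; it is not part of this commit, and the exact device string depends on the TPU runtime.

import torch
import torch_xla.core.xla_model as xm

# Grab the first XLA (TPU) device, e.g. "xla:0".
device = xm.xla_device()

# Run a trivial matmul; moving the result to CPU forces XLA compilation and execution.
x = torch.randn(2, 2, device=device)
y = x @ x
print(device, y.cpu())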
New documentation page, "Installation with TPU" (reStructuredText), @@ -0,0 +1,75 @@:

.. _installation_tpu:

Installation with TPU
=====================

vLLM supports Google Cloud TPUs using PyTorch XLA.

Requirements
------------

* Google Cloud TPU VM (single host)
* TPU versions: v5e, v5p, v4
* Python: 3.10

Installation options:

1. :ref:`Build a docker image with Dockerfile <build_docker_tpu>`.
2. :ref:`Build from source <build_from_source_tpu>`.

.. _build_docker_tpu:

Build a docker image with :code:`Dockerfile.tpu`
------------------------------------------------

`Dockerfile.tpu <https://github.com/vllm-project/vllm/blob/main/Dockerfile.tpu>`_ is provided to build a docker image with TPU support.

.. code-block:: console

    $ docker build -f Dockerfile.tpu -t vllm-tpu .

You can run the docker image with the following command:

.. code-block:: console

    $ # Make sure to add `--privileged --net host --shm-size=16G`.
    $ docker run --privileged --net host --shm-size=16G -it vllm-tpu

.. _build_from_source_tpu:

Build from source
-----------------

You can also build and install the TPU backend from source.

First, install the dependencies:

.. code-block:: console

    $ # (Recommended) Create a new conda environment.
    $ conda create -n myenv python=3.10 -y
    $ conda activate myenv

    $ # Clean up the existing torch and torch-xla packages.
    $ pip uninstall torch torch-xla -y

    $ # Install PyTorch and PyTorch XLA.
    $ export DATE="+20240601"
    $ pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch-nightly${DATE}-cp310-cp310-linux_x86_64.whl
    $ pip install https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-nightly${DATE}-cp310-cp310-linux_x86_64.whl

    $ # Install JAX and Pallas.
    $ pip install torch_xla[tpu] -f https://storage.googleapis.com/libtpu-releases/index.html
    $ pip install torch_xla[pallas] -f https://storage.googleapis.com/jax-releases/jax_nightly_releases.html -f https://storage.googleapis.com/jax-releases/jaxlib_nightly_releases.html

    $ # Install other build dependencies.
    $ pip install packaging aiohttp

Next, build vLLM from source. This will only take a few seconds:

.. code-block:: console

    $ VLLM_TARGET_DEVICE="tpu" python setup.py develop
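After either install path, a short offline-inference run is a reasonable end-to-end check of the TPU build. The snippet below is a minimal sketch using vLLM's standard Python API; the model name and sampling settings are illustrative choices, not taken from this commit.

from vllm import LLM, SamplingParams

# A small model keeps the smoke test quick; any supported model works.
llm = LLM(model="facebook/opt-125m")
sampling = SamplingParams(temperature=0.8, max_tokens=32)

outputs = llm.generate(["Hello, my name is"], sampling)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)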
TPU requirements file (new file), @@ -0,0 +1,7 @@:

# Common dependencies
-r requirements-common.txt

# Dependencies for TPU
# Currently, the TPU backend uses a nightly version of PyTorch XLA.
# You can install the dependencies in Dockerfile.tpu.
triton  # To avoid import errors
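For context on how a per-device requirements file like this is typically consumed, the sketch below shows one plausible way a build script could resolve requirements from VLLM_TARGET_DEVICE. It is a hypothetical illustration, not the setup.py logic introduced in this commit; the helper name and the requirements-<device>.txt naming convention are assumptions.

import os

def read_requirements(path: str) -> list[str]:
    """Hypothetical helper: collect requirement specifiers, following '-r' includes."""
    reqs = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("-r"):
                # Follow includes such as "-r requirements-common.txt".
                reqs.extend(read_requirements(line.split(maxsplit=1)[1]))
            else:
                reqs.append(line)
    return reqs

if __name__ == "__main__":
    target = os.environ.get("VLLM_TARGET_DEVICE", "tpu")
    print(read_requirements(f"requirements-{target}.txt"))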