[Doc] Improve installation signposting (#12575)
- Make device tab names more explicit
- Add a comprehensive list of devices to https://docs.vllm.ai/en/latest/getting_started/installation/index.html
- Add `attention` blocks to the intros of all devices that don't have pre-built wheels/images

---------

Signed-off-by: Harry Mellor <[email protected]>
hmellor authored Jan 31, 2025
1 parent fc54214 commit 60808bd
Showing 13 changed files with 111 additions and 59 deletions.
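
Background for the hunks below: each installation index page repeats one tab-set per setup step, and the tab-sets are kept in lockstep by the `:sync-group: device` / `:sync:` options visible in the diff, so renaming a tab means touching every tab-set on the page. A minimal sketch of the pattern, with illustrative placeholder body text (the real pages pull content in via `{include}` directives):

```markdown
:::::{tab-set}
:sync-group: device

::::{tab-item} Google TPU
:selected:
:sync: tpu

TPU-specific steps render here (the real page includes tpu.inc.md).
::::

::::{tab-item} AWS Neuron
:sync: neuron

Neuron-specific steps render here (the real page includes neuron.inc.md).
::::

:::::
```

Because every tab-set shares the same sync group, picking a device in one section switches all the other sections to that device too, which is why the rename from `TPU` to `Google TPU` recurs in each hunk rather than appearing once.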
@@ -2,6 +2,10 @@
 
 This tab provides instructions on running vLLM with Intel Gaudi devices.
 
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
+
 ## Requirements
 
 - OS: Ubuntu 22.04 LTS
33 changes: 17 additions & 16 deletions docs/source/getting_started/installation/ai_accelerator/index.md
@@ -5,7 +5,8 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} TPU
+::::{tab-item} Google TPU
+:selected:
 :sync: tpu
 
 :::{include} tpu.inc.md
@@ -25,7 +26,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 
 ::::
 
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 
 :::{include} neuron.inc.md
@@ -52,7 +53,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 
 :::{include} tpu.inc.md
@@ -72,7 +73,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 
 ::::
 
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 
 :::{include} neuron.inc.md
@@ -99,7 +100,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 
 :::{include} tpu.inc.md
@@ -119,7 +120,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 
 ::::
 
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 
 :::{include} neuron.inc.md
@@ -146,7 +147,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 
 :::{include} tpu.inc.md
@@ -166,7 +167,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 
 ::::
 
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 
 :::{include} neuron.inc.md
@@ -193,7 +194,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 
 :::{include} tpu.inc.md
@@ -213,7 +214,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 
 ::::
 
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 
 :::{include} neuron.inc.md
@@ -242,7 +243,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 
 :::{include} tpu.inc.md
@@ -262,7 +263,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 
 ::::
 
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 
 :::{include} neuron.inc.md
@@ -289,7 +290,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 
 :::{include} tpu.inc.md
@@ -309,7 +310,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 
 ::::
 
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 
 :::{include} neuron.inc.md
@@ -336,7 +337,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 
 :::{include} tpu.inc.md
@@ -354,7 +355,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 
 ::::
 
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 
 :::{include} neuron.inc.md
@@ -4,6 +4,10 @@ vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Infere
 Paged Attention and Chunked Prefill are currently in development and will be available soon.
 Data types currently supported in Neuron SDK are FP16 and BF16.
 
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
+
 ## Requirements
 
 - OS: Linux
@@ -2,6 +2,10 @@
 
 vLLM powered by OpenVINO supports all LLM models from [vLLM supported models list](#supported-models) and can perform optimal model serving on all x86-64 CPUs with, at least, AVX2 support, as well as on both integrated and discrete Intel® GPUs ([the list of supported GPUs](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html#gpu)).
 
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
+
 ## Requirements
 
 - OS: Linux
@@ -30,6 +30,10 @@ For TPU pricing information, see [Cloud TPU pricing](https://cloud.google.com/tp
 You may need additional persistent storage for your TPU VMs. For more
 information, see [Storage options for Cloud TPU data](https://cloud.devsite.corp.google.com/tpu/docs/storage-options).
 
+:::{attention}
+There are no pre-built wheels for this device, so you must either use the pre-built Docker image or build vLLM from source.
+:::
+
 ## Requirements
 
 - Google Cloud TPU VM
4 changes: 4 additions & 0 deletions docs/source/getting_started/installation/cpu/apple.inc.md
@@ -4,6 +4,10 @@ vLLM has experimental support for macOS with Apple silicon. For now, users shall
 
 Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
 
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
+
 ## Requirements
 
 - OS: `macOS Sonoma` or later
4 changes: 4 additions & 0 deletions docs/source/getting_started/installation/cpu/arm.inc.md
@@ -4,6 +4,10 @@ vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CP
 
 ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
 
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
+
 ## Requirements
 
 - OS: Linux
13 changes: 7 additions & 6 deletions docs/source/getting_started/installation/cpu/index.md
@@ -5,7 +5,8 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} x86
+::::{tab-item} Intel/AMD x86
+:selected:
 :sync: x86
 
 :::{include} x86.inc.md
@@ -15,7 +16,7 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 
 ::::
 
-::::{tab-item} ARM
+::::{tab-item} ARM AArch64
 :sync: arm
 
 :::{include} arm.inc.md
@@ -44,7 +45,7 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} x86
+::::{tab-item} Intel/AMD x86
 :sync: x86
 
 :::{include} x86.inc.md
@@ -54,7 +55,7 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 
 ::::
 
-::::{tab-item} ARM
+::::{tab-item} ARM AArch64
 :sync: arm
 
 :::{include} arm.inc.md
@@ -92,7 +93,7 @@ Currently, there are no pre-built CPU wheels.
 :::::{tab-set}
 :sync-group: device
 
-::::{tab-item} x86
+::::{tab-item} Intel/AMD x86
 :sync: x86
 
 :::{include} x86.inc.md
@@ -102,7 +103,7 @@ Currently, there are no pre-built CPU wheels.
 
 ::::
 
-::::{tab-item} ARM
+::::{tab-item} ARM AArch64
 :sync: arm
 
 :::{include} arm.inc.md
12 changes: 8 additions & 4 deletions docs/source/getting_started/installation/cpu/x86.inc.md
@@ -2,12 +2,20 @@
 
 vLLM initially supports basic model inferencing and serving on x86 CPU platform, with data types FP32, FP16 and BF16.
 
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
+
 ## Requirements
 
 - OS: Linux
 - Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
 - Instruction Set Architecture (ISA): AVX512 (optional, recommended)
 
+:::{tip}
+[Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch) extends PyTorch with up-to-date features optimizations for an extra performance boost on Intel hardware.
+:::
+
 ## Set up using Python
 
 ### Pre-built wheels
@@ -29,7 +37,3 @@ vLLM initially supports basic model inferencing and serving on x86 CPU platform,
 ### Build image from source
 
 ## Extra information
-
-## Intel Extension for PyTorch
-
-- [Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch) extends PyTorch with up-to-date features optimizations for an extra performance boost on Intel hardware.