Freeze rel (#2863)
* freeze requirements and docker image

* fixes
dtrawins authored Nov 20, 2024
1 parent dbe6bb0 commit 3c284cf
Showing 5 changed files with 123 additions and 40 deletions.
111 changes: 101 additions & 10 deletions demos/common/export_models/requirements.txt
@@ -1,11 +1,102 @@
--extra-index-url "https://download.pytorch.org/whl/cpu"
--extra-index-url "https://storage.openvinotoolkit.org/simple/wheels/nightly"
--pre
optimum-intel@git+https://github.com/huggingface/optimum-intel.git
openvino-tokenizers[transformers]==2024.5.*
openvino==2024.5.*
nncf>=2.11.0
sentence_transformers==3.1.1
openai
transformers<4.45
einops
about-time==4.2.1
aiohappyeyeballs==2.4.3
aiohttp==3.11.6
aiosignal==1.3.1
alive-progress==3.2.0
annotated-types==0.7.0
anyio==4.6.2.post1
async-timeout==5.0.1
attrs==24.2.0
autograd==1.7.0
certifi==2024.8.30
charset-normalizer==3.4.0
cma==3.2.2
coloredlogs==15.0.1
contourpy==1.3.1
cycler==0.12.1
datasets==3.1.0
Deprecated==1.2.15
dill==0.3.8
distro==1.9.0
einops==0.8.0
exceptiongroup==1.2.2
filelock==3.16.1
fonttools==4.55.0
frozenlist==1.5.0
fsspec==2024.9.0
grapheme==0.6.0
h11==0.14.0
httpcore==1.0.7
httpx==0.27.2
huggingface-hub==0.26.2
humanfriendly==10.0
idna==3.10
Jinja2==3.1.4
jiter==0.7.1
joblib==1.4.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jstyleson==0.0.2
kiwisolver==1.4.7
markdown-it-py==3.0.0
MarkupSafe==3.0.2
matplotlib==3.9.2
mdurl==0.1.2
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
natsort==8.4.0
networkx==3.3
ninja==1.11.1.1
nncf==2.13.0
numpy==1.26.4
onnx==1.17.0
openai==1.54.5
openvino==2024.5.0
openvino-telemetry==2024.5.0
openvino-tokenizers==2024.5.0.0
optimum==1.23.3
optimum-intel @ git+https://github.com/huggingface/optimum-intel.git@e3031f058fff4763a9fd917464e26aab9994449f
packaging==24.2
pandas==2.2.3
pillow==11.0.0
propcache==0.2.0
protobuf==5.28.3
psutil==6.1.0
pyarrow==18.0.0
pydantic==2.9.2
pydantic_core==2.23.4
pydot==2.0.0
Pygments==2.18.0
pymoo==0.6.1.3
pyparsing==3.2.0
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
referencing==0.35.1
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.21.0
safetensors==0.4.5
scikit-learn==1.5.2
scipy==1.14.1
sentence-transformers==3.1.1
sentencepiece==0.2.0
six==1.16.0
sniffio==1.3.1
sympy==1.13.1
tabulate==0.9.0
threadpoolctl==3.5.0
tiktoken==0.8.0
tokenizers==0.19.1
torch==2.5.1+cpu
tqdm==4.67.0
transformers==4.44.2
typing_extensions==4.12.2
tzdata==2024.2
urllib3==2.2.3
wrapt==1.16.0
xxhash==3.5.0
yarl==1.17.2
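With every dependency pinned, the export step becomes reproducible. A minimal sketch of how such a frozen list is typically consumed and refreshed (assuming a clean Python 3.9+ virtual environment; the index URLs and VCS pins at the top of the file still have to be maintained by hand, since `pip freeze` does not emit them):

```bash
# Create an isolated environment so the pins do not clash with system packages
python3 -m venv .venv && source .venv/bin/activate

# Install the exact versions frozen in this commit
pip install -r demos/common/export_models/requirements.txt

# After a deliberate upgrade, capture the newly resolved versions for review
pip freeze > frozen-candidate.txt
```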
18 changes: 8 additions & 10 deletions demos/continuous_batching/README.md
@@ -7,16 +7,14 @@ That makes it easy to use and efficient, especially on Intel® Xeon® processors.
## Get the docker image

Build the image from source to try the latest enhancements in this feature.
Pull the image from Dockerhub with CPU support:
```bash
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
make release_image GPU=1
docker pull openvino/model_server:2024.5
```
or, if you also want to include support for GPU execution:
```bash
docker pull openvino/model_server:2024.5-gpu
```
It will create an image called `openvino/model_server:latest`.
> **Note:** This operation might take 40min or more depending on your build host.
> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device.
> **Note:** The public image from the last release might not be compatible with models exported using the latest export script. Check the [demo version from the last release](https://github.com/openvinotoolkit/model_server/tree/releases/2024/4/demos/continuous_batching) to use the public docker image.

## Model preparation
> **Note** Python 3.9 or higher is needed for that step
@@ -69,14 +67,14 @@ Check the [LLM calculator documentation](../../docs/llm/reference.md) to learn a

Running this command starts the container with the CPU as the only target device:
```bash
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --rest_port 8000 --config_path /workspace/config.json
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:2024.5 --rest_port 8000 --config_path /workspace/config.json
```
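Once the container reports ready, the server can be exercised through its OpenAI-compatible REST API. A hedged example request, assuming the model was exported under the name used earlier in this demo and that the server exposes chat completions at `/v3/chat/completions` as in other model server demos:

```bash
curl http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",
    "max_tokens": 100,
    "messages": [
      {"role": "user", "content": "What are the three primary colors?"}
    ]
  }'
```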
### GPU

If you want to use a GPU device to run the generation, add the extra docker parameters `--device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1)`
to the `docker run` command and use the image with GPU support. Export the models with a precision matching the GPU capacity and adjust the pipeline configuration.
It can be applied using the commands below:
```
```bash
python demos/common/export_models/export_model.py text_generation --source_model meta-llama/Meta-Llama-3-8B-Instruct --weight-format int4 --target_device GPU --cache_size 2 --config_file_path models/config.json --model_repository_path models --overwrite_models

docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/workspace:ro openvino/model_server:latest-gpu --rest_port 8000 --config_path /workspace/config.json
18 changes: 8 additions & 10 deletions demos/embeddings/README.md
@@ -4,15 +4,13 @@ The embeddings use case is exposed via the OpenAI API `embeddings` endpoint.

## Get the docker image

Build the image from source to try this new feature. It will be included in the public image in the coming version 2024.5.
Pull the image from Dockerhub with CPU support:
```bash
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
make release_image GPU=1
docker pull openvino/model_server:2024.5
```
It will create an image called `openvino/model_server:latest`.
> **Note:** This operation might take 40min or more depending on your build host.
> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device.
or, if you also want to include support for GPU execution:
```bash
docker pull openvino/model_server:2024.5-gpu
```

## Model preparation
> **Note** Python 3.9 or higher is needed for that step
@@ -77,17 +75,17 @@ All models supported by [optimum-intel](https://github.com/huggingface/optimum-intel)
### CPU

```bash
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --port 9000 --rest_port 8000 --config_path /workspace/config.json
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:2024.5 --port 9000 --rest_port 8000 --config_path /workspace/config.json
```
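With the container running, embeddings can be requested through the OpenAI-compatible `embeddings` endpoint. A sketch, assuming the model name matches the one exported in this demo and the `/v3/embeddings` path used by the model server's OpenAI API:

```bash
curl http://localhost:8000/v3/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Alibaba-NLP/gte-large-en-v1.5",
    "input": "hello world"
  }'
```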
### GPU

If you want to use a GPU device to run the embeddings model, add the extra docker parameters `--device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1)`
to the `docker run` command, use the image with GPU support, and make sure the target_device in subconfig.json is set to GPU. Also make sure the exported model's quantization level and cache size fit into the GPU memory. All of that can be applied with the commands:

```
```bash
python demos/common/export_models/export_model.py embeddings --source_model Alibaba-NLP/gte-large-en-v1.5 --weight-format int8 --target_device GPU --config_file_path models/config.json --model_repository_path models
docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/workspace:ro openvino/model_server:latest-gpu --rest_port 8000 --config_path /workspace/config.json
docker run -d --rm -p 8000:8000 --device /dev/dri --group-add=$(stat -c "%g" /dev/dri/render* | head -n 1) -v $(pwd)/models:/workspace:ro openvino/model_server:2024.5-gpu --rest_port 8000 --config_path /workspace/config.json
```
### Check readiness

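The readiness check itself is truncated in this diff; a hedged sketch of the kind of probe that typically works, assuming the KServe-style REST API and the model name exported above:

```bash
# The slash in the model name must be URL-encoded in the path
curl -i http://localhost:8000/v2/models/Alibaba-NLP%2Fgte-large-en-v1.5/ready
```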
@@ -1,6 +1,4 @@
--extra-index-url "https://download.pytorch.org/whl/cpu"
--extra-index-url "https://storage.openvinotoolkit.org/simple/wheels/nightly"
--pre
openvino==2024.5.*
numpy<2.0
transformers==4.40.2
14 changes: 6 additions & 8 deletions demos/rerank/README.md
@@ -2,15 +2,13 @@

## Get the docker image

Build the image from source to try this new feature. It will be included in the public image in the coming version 2024.5.
Pull the image from Dockerhub with CPU support:
```bash
git clone https://github.com/openvinotoolkit/model_server.git
cd model_server
make release_image GPU=1
docker pull openvino/model_server:2024.5
```
It will create an image called `openvino/model_server:latest`.
> **Note:** This operation might take 40min or more depending on your build host.
> **Note:** `GPU` parameter in image build command is needed to include dependencies for GPU device.
or, if you also want to include support for GPU execution:
```bash
docker pull openvino/model_server:2024.5-gpu
```

## Model preparation
> **Note** Python 3.9 or higher is needed for that step
@@ -53,7 +51,7 @@ models
## Deployment

```bash
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:latest --port 9000 --rest_port 8000 --config_path /workspace/config.json
docker run -d --rm -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server:2024.5 --port 9000 --rest_port 8000 --config_path /workspace/config.json
```
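After deployment, and once the model reports ready (see below), the rerank endpoint can be exercised. A hedged sketch, assuming the Cohere-style `/v3/rerank` path used by the model server's API; the model name here is hypothetical, since the export command is truncated in this diff:

```bash
curl http://localhost:8000/v3/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "BAAI/bge-reranker-large",
    "query": "Hello",
    "documents": ["Welcome", "Farewell"]
  }'
```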

Readiness of the model can be checked with a simple curl command.
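The curl command itself is truncated in this diff; a hedged sketch, assuming the KServe-style REST API and the same hypothetical model name as above:

```bash
# The slash in the model name must be URL-encoded in the path
curl -i http://localhost:8000/v2/models/BAAI%2Fbge-reranker-large/ready
```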
