# VLM Offline Inference Pipeline

LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference [pipeline](./pipeline.md).
In this article, we will take the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as an example, exhibiting the powerful capabilities of the VLM pipeline through various examples.
First, we will demonstrate the most basic usage of the pipeline, and then progressively unveil additional functionality by configuring engine parameters and generation arguments, such as tensor parallelism, context window size, random sampling, and chat template customization. Next, we will provide inference examples for scenarios involving multiple images, batch prompts, and so on.

## A 'Hello, world' example

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

If an `ImportError` occurs while running this example, please install the required dependency packages as prompted.

In the above example, the inference prompt is a tuple of (prompt, image). Besides this structure, the pipeline also supports prompts in the OpenAI format:

```python
from lmdeploy import pipeline

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}}
        ]
    }
]
response = pipe(prompts)
print(response)
```
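
Either form returns the same kind of result object that `print(response)` renders above. Below is a minimal sketch of extracting just the generated text, assuming the returned object exposes a `text` attribute (true for recent LMDeploy `Response` objects, but verify against your installed version):

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
# `text` is assumed to hold only the generated string, without metadata such
# as token counts; check the Response class in your LMDeploy version.
print(response.text)
```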

### Set tensor parallelism

Tensor parallelism can be activated by setting the engine parameter `tp`:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(tp=2))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

### Set context window size

When creating the pipeline, you can customize the size of the context window by setting the engine parameter `session_len`.

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(session_len=8192))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

### Set sampling parameters

You can change the default sampling parameters of the pipeline by passing `GenerationConfig`:

```python
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(tp=2, session_len=8192))
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.6)
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image), gen_config=gen_config)
print(response)
```
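
Other generation arguments can be combined in the same `GenerationConfig`. The sketch below additionally caps the response length and fixes the sampling seed so repeated runs produce the same output; `max_new_tokens` and `random_seed` are assumed to be available fields (they exist in recent LMDeploy releases, but check the `GenerationConfig` definition in your installed version):

```python
from lmdeploy import pipeline, GenerationConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

# max_new_tokens caps the length of the generated answer and random_seed makes
# top-k/top-p sampling reproducible; both fields are assumptions to verify
# against your LMDeploy version.
gen_config = GenerationConfig(top_k=40,
                              top_p=0.8,
                              temperature=0.6,
                              max_new_tokens=256,
                              random_seed=1234)

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image), gen_config=gen_config)
print(response)
```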

### Set chat template

While performing inference, LMDeploy identifies an appropriate chat template from its builtin collection based on the model path and applies it to the input prompts. However, when the chat template cannot be inferred from the model path, users have to specify it explicitly. For example, liuhaotian/llava-v1.5-7b employs the 'vicuna' chat template, but the name 'vicuna' cannot be determined from the model's path. We can specify it by passing 'vicuna' to `ChatTemplateConfig` as follows:

```python
from lmdeploy import pipeline, ChatTemplateConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.5-7b',
                chat_template_config=ChatTemplateConfig(model_name='vicuna'))

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

For more information about customizing a chat template, please refer to [this](../advance/chat_template.md) guide.

## Multi-image inference

When dealing with multiple images, you can put them all in one list. Keep in mind that multiple images will lead to a higher number of input tokens, and as a result, the size of the [context window](#set-context-window-size) typically needs to be increased.

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(session_len=8192))

image_urls = [
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'
]

images = [load_image(img_url) for img_url in image_urls]
response = pipe(('describe these images', images))
print(response)
```

## Batch prompts inference

Conducting inference with batch prompts is quite straightforward; just place them within a list structure:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(session_len=8192))

image_urls = [
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'
]
prompts = [('describe this image', load_image(img_url)) for img_url in image_urls]
response = pipe(prompts)
print(response)
```
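
With batch prompts, the pipeline returns one result per prompt, in the same order as the input list. A minimal sketch of walking through the results, again assuming each item exposes a `text` attribute (verify against the `Response` class in your LMDeploy version):

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b',
                backend_config=TurbomindEngineConfig(session_len=8192))

image_urls = [
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
    'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/det.jpg'
]
prompts = [('describe this image', load_image(img_url)) for img_url in image_urls]

# One result is returned per prompt, in input order. Accessing `.text` is an
# assumption about the result object; check your LMDeploy version.
responses = pipe(prompts)
for img_url, resp in zip(image_urls, responses):
    print(f'{img_url} -> {resp.text}')
```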