Merge pull request #8 from hhhhhharry/main
add inference models (open sora, sdxl, chatglm)
Showing 83 changed files with 15,565 additions and 0 deletions.
181 changes: 181 additions & 0 deletions
inference_models/pytorch/aigc/text_generation/chatglm3-6b/README.md
## README ChatGLM3-6B

# Table of Contents

<!-- TOC -->

- [Table of Contents](#table-of-contents)
- [About ChatGLM3-6B](#about-chatglm3-6b)
- [Model Files](#model-files)
- [Requirements](#requirements)
- [Model Validation Examples](#model-validation-examples)
- [Batch Offline Inference](#batch-offline-inference)
- [Performance Testing](#performance-testing)
- [Example Validation Results](#example-validation-results)
- [Example Batch Offline Inference Results](#example-batch-offline-inference-results)
- [Example Performance Test Results](#example-performance-test-results)

<!-- /TOC -->

## About ChatGLM3-6B

ChatGLM3-6B is an open-source bilingual text-to-text large language model developed jointly by Zhipu AI and the KEG Lab of Tsinghua University, first released on October 27, 2023. ChatGLM3-6B is based on the transformer architecture, was trained on a more diverse bilingual dataset, and uses a redesigned prompt format, improving its performance on semantics, mathematics, reasoning, code, and knowledge. The model supports a maximum context length of 8K, and vLLM already supports inference for this model. This document describes how to run ChatGLM3-6B inference and performance evaluation with vLLM on Enflame GCU.

[Paper: Zhipu AI, Tsinghua University. ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools.](https://arxiv.org/pdf/2406.12793)

## Model Files

The model weights can be downloaded from either of the following sources:
- From Hugging Face:
  - [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b/tree/main)
  - branch: main
  - commit id: 91a0561
- Or from ModelScope:
  - [ChatGLM3-6B](https://modelscope.cn/models/zhipuai/chatglm3-6b/files)
  - branch: master
  - commit id: 36fd140f

After downloading, place all files in a folder named `chatglm3-6b`; a scripted download sketch follows this section.
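
If you prefer a scripted download, the following is only a minimal sketch; it assumes the `huggingface_hub` command-line tool is installed (`pip3 install -U huggingface_hub`):

```bash
# Sketch: download the ChatGLM3-6B weights from Hugging Face into ./chatglm3-6b.
huggingface-cli download THUDM/chatglm3-6b \
    --revision main \
    --local-dir chatglm3-6b
```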

## Requirements

Hardware and software requirements
- OS: Ubuntu 20.04
- Python: 3.8 - 3.10
- Accelerator: Enflame S60

Installing the inference framework
- ChatGLM3-6B is evaluated and tested on the `vLLM` inference framework.
- The steps below depend on the `Python3` version you plan to use; install the dependencies for that `Python3` version first. Installation must be done inside **docker**:
- Before installing `vLLM`, complete the `TopsRider` software stack installation; see the "TopsRider Software Stack Installation Manual" for the procedure.
- First check whether `vllm` and its related dependencies are already installed:
```shell
python3 -m pip list | grep vllm
python3 -m pip list | grep xformers
python3 -m pip list | grep tops-extension
```
- If they are installed correctly, you should see output like:
```
vllm <version>+gcu
xformers <version>
tops-extension <version>
```
- If `vllm` is not installed, it can be installed via `TopsRider`:
```shell
./Topsrider_xxx.run -y -C vllm
```

Installing dependencies

- Go to the chatglm3-6b root directory and run:

```bash
pip3 install -r requirements.txt
```

## Model Validation Examples

You can test the model by running the relevant scripts in the vllm_utils module with the commands below.

### Batch Offline Inference

Example command

```bash
python3 -m vllm_utils.benchmark_test \
 --model=[path_of_chatglm] \
 --tensor-parallel-size=1 \
 --device gcu \
 --demo=te \
 --dtype=float16 \
 --output-len=256
```

Parameter description

```text
The main parameters used by vllm_utils.benchmark_test for batch offline inference are:
--model: path to load the model from.
--tensor-parallel-size: number of devices used for inference.
--device: device type used for inference. Valid values are "gcu", "cpu", or "cuda".
--demo: the demo input used for inference. Valid values and the input type each one stands for:
    "te": "text-english"
    "tc": "text-chinese"
    "ch": "chat"
    "chc": "character-chat"
    "cc": "code-completion"
    "ci": "code-infilling"
    "cin": "code-instruction"
--dtype: data type of the model weights and activations.
--output-len: output length generated for each prompt.
```

### Performance Testing

Example command

```bash
python3 -m vllm_utils.benchmark_test --perf \
 --model=[path_of_chatglm] \
 --tensor-parallel-size=1 \
 --device gcu \
 --max-model-len=8192 \
 --tokenizer=[path_of_chatglm] \
 --input-len=128 \
 --output-len=128 \
 --num-prompts=64 \
 --block-size=64 \
 --dtype=float16
```

Parameter description

```text
The main parameters used by vllm_utils.benchmark_test for performance testing are:
--perf: enable performance evaluation mode.
--model: path to load the model from.
--tensor-parallel-size: number of devices used for inference.
--device: device type used for inference. Valid values are "gcu", "cpu", or "cuda".
--max-model-len: maximum sequence length of the model (input and output tokens both count toward it).
--tokenizer: path to load the model tokenizer from, usually the same as the model path.
--input-len: length of each input prompt.
--output-len: output length generated for each prompt.
--num-prompts: number of input prompts.
--block-size: block size for Paged Attention.
--dtype: data type of the model weights and activations.
```
Notes:
- The maximum `max-model-len` supported by ChatGLM3-6B is 8192.
- `input-len`, `output-len`, and `num-prompts` can be adjusted as needed.
- When `output-len` is set to 1, the `latency` reported in the output is the time_to_first_token_latency (see the sketch below).
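
For example, to read off the time_to_first_token_latency described in the last note, you could rerun the performance test with `--output-len=1`. This is only a sketch reusing the placeholder `[path_of_chatglm]` and the other values from the command above:

```bash
# Sketch: generate a single output token so the reported latency is the
# time-to-first-token latency. All other flags match the example above.
python3 -m vllm_utils.benchmark_test --perf \
 --model=[path_of_chatglm] \
 --tensor-parallel-size=1 \
 --device gcu \
 --max-model-len=8192 \
 --tokenizer=[path_of_chatglm] \
 --input-len=128 \
 --output-len=1 \
 --num-prompts=64 \
 --block-size=64 \
 --dtype=float16
```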

## Example Validation Results

### Example Batch Offline Inference Results

With --output-len=32:
```text
Prompt: 'Hello, my name is', Generated text: ' [Name], and I am a [Job Title] at [Company Name]. I am excited to be here today to share with you some insights on ['
Prompt: 'The president of the United States is', Generated text: ' elected by the people of the United States, but the president is not chosen by popular vote. Instead, the president is chosen through a process called the Electoral'
Prompt: 'The capital of France is', Generated text: ' Paris. \nThe capital of France is Paris.\n\nParis is the capital of France.\n\nParis is a city in France.'
Prompt: 'The future of AI is', Generated text: ' not just about technology, but also about the ethical and social implications of its use.\nAs AI becomes more advanced, it will become increasingly important to consider the'
```

### Example Performance Test Results

After inference finishes, the performance metrics are printed and automatically saved to a .csv file generated in the current directory; a small command for inspecting that file follows the example below.

```text
***Perf Info***
{
    "latency_num_prompts": "xxx ms",
    "latency_per_token": "xxx ms",
    "request_per_second": "xxx requests/s",
    "token_per_second": "xxx tokens/s",
    "prefill_latency_per_token": "xxx ms",
    "decode_latency_per_token": "xxx ms",
    "decode_throughput": "xxx tokens/s"
}
```
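
The exact name of the generated .csv file is not specified here, so the following is only a minimal sketch that assumes the benchmark output is the most recently written .csv in the current directory:

```bash
# Sketch: pretty-print the newest .csv in the current directory
# (assumed to be the benchmark output).
latest_csv=$(ls -t *.csv | head -n 1)
column -s, -t < "$latest_csv"
```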

1 change: 1 addition & 0 deletions
inference_models/pytorch/aigc/text_generation/chatglm3-6b/requirements.txt

protobuf==3.20.1

Empty file.
142 changes: 142 additions & 0 deletions
inference_models/pytorch/aigc/text_to_image/common/get_meta_info.py

import subprocess


def run_command(cmd):
    # Run a shell command and return its stdout as a stripped string.
    run = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE)
    result = run.stdout
    result_str = result.decode().strip('\n').strip()

    return result_str


def get_device_info(device_id):
    # Query the Enflame GCU via efsmi for device name, driver version and clocks.
    sys_info = {}
    fields = ['Dev Name', 'Ver', 'GCU CLK', 'Mem CLK']

    # Fall back to device 0 when no device id is given.
    device_id = 0 if device_id is None else device_id

    for field in fields:
        col = 3 if field == 'Ver' else 4

        cmd = f"efsmi --q -i {device_id} | grep '{field}' | awk " + "'{print $" + str(col) + "}'"
        result_str = run_command(cmd)

        if 'CLK' in field:
            result_str += 'MHz'

        if field == 'Ver':
            # Store the driver version under a more descriptive key.
            field = 'KMD Ver'

        sys_info[field] = result_str

    return sys_info


def get_deb_pkg_info():
    # Collect versions of the installed Enflame Debian packages.
    deb_info = {}
    deb_names = ['topsruntime', 'tops-sdk', 'tops-inference', 'topsaten']

    for deb in deb_names:
        cmd = f"dpkg -l | grep '{deb}' | awk " + "'{print $3}'"
        result_str = run_command(cmd)
        deb_info[deb] = result_str

    return deb_info


def get_python_pkg_info():
    # Collect the Python interpreter version and versions of relevant pip packages.
    python_info = {}

    result_str = run_command('python3 -V')
    python_info['python'] = result_str

    pkg_names = ['TopsInference', 'torch', 'diffusers', 'transformers', 'torch-gcu', 'Pillow', 'opencv-python']

    for pkg in pkg_names:
        cmd = f"python3 -m pip list | grep '{pkg}' | awk " + "'{print $2}' | head -n 1"
        result_str = run_command(cmd)
        python_info[pkg] = result_str

    return python_info


def get_os_info():
    # Collect the OS distribution name and kernel version.
    os_info = {}

    cmds = ['lsb_release -sd', 'uname -r']
    keys = ['os_distrb_name', 'os_kernel_version']

    for cmd, key in zip(cmds, keys):
        result_str = run_command(cmd)
        os_info[key] = result_str

    return os_info


def get_cpu_info():
    # Collect CPU architecture, model name and vendor from /proc/cpuinfo.
    cpu_info = {}

    cmds = [
        "uname -m",
        'cat /proc/cpuinfo | grep "model name" | uniq | awk \'{str=""; for (i=4; i<=NF; i++) str=str $i " "; print str}\'',
        "cat /proc/cpuinfo | grep \"vendor_id\" | uniq | awk '{print $3}'",
    ]

    keys = ['cpu_arch', 'cpu_model_name', 'cpu_vendor']

    for key, cmd in zip(keys, cmds):
        result_str = run_command(cmd)
        cpu_info[key] = result_str

    return cpu_info


def get_disk_info():
    # Collect the disk model list reported by lsblk.
    disk_info = {}
    cmds = [
        "lsblk -o NAME,MODEL",
    ]

    keys = ['disk_model']

    for key, cmd in zip(keys, cmds):
        result_str = run_command(cmd)
        disk_info[key] = result_str

    return disk_info


def get_host_info():
    # Collect the host name and primary IP address.
    host_info = {}
    cmds = [
        "hostname",
        "hostname -I | awk '{print $1}'"
    ]

    keys = ['host_name', 'host_ip']

    for key, cmd in zip(keys, cmds):
        result_str = run_command(cmd)
        host_info[key] = result_str

    return host_info


def get_meta_info(device_id=None):
    # Gather device, package, OS, CPU, disk and host information into one dict.
    meta_info = {}
    meta_info['device_info'] = get_device_info(device_id)
    meta_info['deb_info'] = get_deb_pkg_info()
    meta_info['python_info'] = get_python_pkg_info()
    meta_info['os_info'] = get_os_info()
    meta_info['cpu_info'] = get_cpu_info()
    meta_info['disk_info'] = get_disk_info()
    meta_info['host_info'] = get_host_info()

    return meta_info


if __name__ == '__main__':
    import json

    meta_info = get_meta_info(device_id=0)
    meta = json.dumps(meta_info, indent=4)
    print(meta)
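
Since the script prints the collected information as JSON when run directly, a minimal usage sketch (the output file name meta_info.json is just an example) is:

```bash
# Sketch: record the benchmark environment to a JSON file.
python3 inference_models/pytorch/aigc/text_to_image/common/get_meta_info.py > meta_info.json
```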