Merge pull request #8 from hhhhhharry/main
add inference models (open sora, sdxl, chatglm)
FrankOpenCom authored Sep 26, 2024
2 parents 895cdb2 + 285ea1d commit 0b58dd8
Showing 83 changed files with 15,565 additions and 0 deletions.
181 changes: 181 additions & 0 deletions inference_models/pytorch/aigc/text_generation/chatglm3-6b/README.md
## README ChatGLM3-6B

# Table of Contents

<!-- TOC -->

- [Table of Contents](#table-of-contents)
- [About ChatGLM3-6B](#about-chatglm3-6b)
- [Model Files](#model-files)
- [Requirements](#requirements)
- [Model Validation Examples](#model-validation-examples)
    - [Batch Offline Inference](#batch-offline-inference)
    - [Performance Testing](#performance-testing)
- [Example Validation Results](#example-validation-results)
    - [Example Batch Offline Inference Results](#example-batch-offline-inference-results)
    - [Example Performance Test Results](#example-performance-test-results)

<!-- /TOC -->

## About ChatGLM3-6B

ChatGLM3-6B is an open-source bilingual text-to-text large language model jointly developed by Zhipu AI and the KEG Lab of Tsinghua University, first released on October 27, 2023. Built on the transformer architecture, ChatGLM3-6B is trained on a more diverse bilingual dataset and adopts a newly designed prompt format, improving performance across semantics, mathematics, reasoning, code, and knowledge. The model supports a maximum context length of 8K, and vLLM already supports inference for it. This document describes how to run ChatGLM3-6B inference and evaluate its performance with vLLM on Enflame GCU.

[Paper: Zhipu AI, Tsinghua University. ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools.](https://arxiv.org/pdf/2406.12793)

## Model Files

The model weights can be downloaded from either of the following sources:
- Via Hugging Face:
    - [ChatGLM3-6B](https://huggingface.co/THUDM/chatglm3-6b/tree/main)
    - branch: main
    - commit id: 91a0561
- Or via ModelScope:
    - [ChatGLM3-6B](https://modelscope.cn/models/zhipuai/chatglm3-6b/files)
    - branch: master
    - commit id: 36fd140f

After downloading, place all of the files in a folder named `chatglm3-6b`.
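
For a scripted download, the snippet below is a minimal sketch using `huggingface_hub` (any download method works, as long as all files end up in `chatglm3-6b`); whether the short commit id resolves as a `revision` on the Hub is an assumption, and the branch name `main` can be used instead:

```python
from huggingface_hub import snapshot_download

# Download every file of the repo into ./chatglm3-6b.
snapshot_download(repo_id="THUDM/chatglm3-6b",
                  revision="91a0561",   # commit id listed above; "main" also works
                  local_dir="chatglm3-6b")
```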

## Requirements

Hardware and software requirements:
- OS: Ubuntu 20.04
- Python: 3.8 - 3.10
- Accelerator: Enflame S60

Inference framework installation
- ChatGLM3-6B is evaluated and tested on the `vLLM` inference framework.
- The steps below depend on the `Python3` version you intend to use; first install the dependencies required for that `Python3` version. Installation must be done inside **docker**:
- Before installing `vLLM`, complete the `TopsRider` software stack installation; refer to the *TopsRider Software Stack Installation Manual* for the procedure.
- First, check whether `vllm` and its dependencies are already installed (a scripted check is also sketched after this list):
```shell
python3 -m pip list | grep vllm
python3 -m pip list | grep xformers
python3 -m pip list | grep tops-extension
```
- If they are installed correctly, output similar to the following is shown:
```
vllm <version>+gcu
xformers <version>
tops-extension <version>
```
- If not, `vllm` can be installed via the `TopsRider` installer:
```shell
./Topsrider_xxx.run -y -C vllm
```
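
The same check can also be scripted; this is a minimal sketch using only the Python standard library, with package names as they appear in `pip list`:

```python
from importlib.metadata import PackageNotFoundError, version

# Report the installed version of each required package, or flag it as missing.
for pkg in ("vllm", "xformers", "tops-extension"):
    try:
        print(f"{pkg} {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg} is NOT installed")
```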

Dependency installation

- Enter the chatglm3-6b root directory and run:

```bash
pip3 install -r requirements.txt
```

## Model Validation Examples

You can run the relevant scripts in the `vllm_utils` module with the commands below to test the model.

### Batch Offline Inference

Example command:

```bash
python3 -m vllm_utils.benchmark_test \
    --model=[path_of_chatglm] \
    --tensor-parallel-size=1 \
    --device gcu \
    --demo=te \
    --dtype=float16 \
    --output-len=256
```

Argument descriptions:

```text
The main arguments of vllm_utils.benchmark_test for batch offline inference are:
--model: path to load the model from.
--tensor-parallel-size: number of devices used for inference.
--device: device type used for inference. One of "gcu", "cpu", "cuda".
--demo: built-in demo input used for inference. Valid values and the input type each stands for:
    "te": "text-english"
    "tc": "text-chinese"
    "ch": "chat"
    "chc": "character-chat"
    "cc": "code-completion"
    "ci": "code-infilling"
    "cin": "code-instruction"
--dtype: data type of the model weights and activations.
--output-len: number of output tokens generated per prompt.
```
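
For reference, the same kind of batch offline run can be reproduced through vLLM's own Python API. This is a minimal sketch, assuming a standard vLLM build with GCU support installed as above; the local model path and the prompts are placeholders:

```python
from vllm import LLM, SamplingParams

# Hypothetical local path: point this at the `chatglm3-6b` folder prepared above.
llm = LLM(model="./chatglm3-6b",
          tensor_parallel_size=1,
          dtype="float16",
          trust_remote_code=True)  # ChatGLM3 ships custom modeling code

prompts = ["Hello, my name is", "The capital of France is"]
sampling = SamplingParams(max_tokens=256)  # mirrors --output-len=256

for out in llm.generate(prompts, sampling):
    print(f"Prompt: {out.prompt!r}, Generated text: {out.outputs[0].text!r}")
```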

### Performance Testing

Example command:

```bash
python3 -m vllm_utils.benchmark_test --perf \
    --model=[path_of_chatglm] \
    --tensor-parallel-size=1 \
    --device gcu \
    --max-model-len=8192 \
    --tokenizer=[path_of_chatglm] \
    --input-len=128 \
    --output-len=128 \
    --num-prompts=64 \
    --block-size=64 \
    --dtype=float16
```

Argument descriptions:

```text
The main arguments of vllm_utils.benchmark_test for performance testing are:
--perf: enable performance-evaluation mode.
--model: path to load the model from.
--tensor-parallel-size: number of devices used for inference.
--device: device type used for inference. One of "gcu", "cpu", "cuda".
--max-model-len: maximum sequence length of the model (input and output tokens both count).
--tokenizer: path to load the tokenizer from, usually the same as the model path.
--input-len: length of each input prompt.
--output-len: number of output tokens generated per prompt.
--num-prompts: number of input prompts.
--block-size: block size used by Paged Attention.
--dtype: data type of the model weights and activations.
```
Notes:
- ChatGLM3-6B supports a maximum `max-model-len` of 8192.
- `input-len`, `output-len`, and `num-prompts` can be adjusted as needed.
- With `output-len` set to 1, the `latency` reported in the output is the time_to_first_token_latency.
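
As a sanity check on the reported metrics, the aggregate numbers can be derived from the flags above. This is a small sketch with assumed semantics (every prompt generates exactly `output-len` tokens) and a hypothetical end-to-end latency:

```python
def summarize_run(num_prompts: int, output_len: int, latency_s: float) -> dict:
    """Derive aggregate throughput for a run, assuming each prompt
    generates exactly `output_len` tokens."""
    total_tokens = num_prompts * output_len
    return {
        "total_output_tokens": total_tokens,
        "request_per_second": num_prompts / latency_s,
        "token_per_second": total_tokens / latency_s,
    }

# Hypothetical 30-second run with the example flags above:
print(summarize_run(num_prompts=64, output_len=128, latency_s=30.0))
```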

## Example Validation Results

### Example Batch Offline Inference Results

With `--output-len=32`, for example:
```text
Prompt: 'Hello, my name is', Generated text: ' [Name], and I am a [Job Title] at [Company Name]. I am excited to be here today to share with you some insights on ['
Prompt: 'The president of the United States is', Generated text: ' elected by the people of the United States, but the president is not chosen by popular vote. Instead, the president is chosen through a process called the Electoral'
Prompt: 'The capital of France is', Generated text: ' Paris. \nThe capital of France is Paris.\n\nParis is the capital of France.\n\nParis is a city in France.'
Prompt: 'The future of AI is', Generated text: ' not just about technology, but also about the ethical and social implications of its use.\nAs AI becomes more advanced, it will become increasingly important to consider the'
```

### Example Performance Test Results

After inference finishes, the performance metrics are printed and automatically saved to a generated .csv file in the current directory.

```text
***Perf Info***
{
    "latency_num_prompts": "xxx ms",
    "latency_per_token": "xxx ms",
    "request_per_second": "xxx requests/s",
    "token_per_second": "xxx tokens/s",
    "prefill_latency_per_token": "xxx ms",
    "decode_latency_per_token": "xxx ms",
    "decode_throughput": "xxx tokens/s"
}
```
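
To post-process the saved metrics, the .csv can be read back with the standard library. The exact filename is generated at run time, so this sketch simply globs for any .csv in the current directory (an assumption):

```python
import csv
import glob

# Print every row of every .csv the benchmark wrote into the current directory.
for path in glob.glob("*.csv"):
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            print(path, row)
```
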
1 change: 1 addition & 0 deletions inference_models/pytorch/aigc/text_generation/chatglm3-6b/requirements.txt
protobuf==3.20.1
142 changes: 142 additions & 0 deletions inference_models/pytorch/aigc/text_to_image/common/get_meta_info.py
import subprocess


def run_command(cmd):
    """Run a shell command and return its stdout, stripped of whitespace."""
    run = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE)
    result = run.stdout
    result_str = result.decode().strip('\n').strip()

    return result_str


def get_device_info(device_id):
    """Query device name, driver version, and clocks for one device via efsmi."""
    sys_info = {}
    fields = ['Dev Name', 'Ver', 'GCU CLK', 'Mem CLK']

    for field in fields:
        # The driver version sits in column 3 of the efsmi output, the rest in column 4.
        col = 3 if field == 'Ver' else 4

        cmd = f"efsmi --q -i {device_id} | grep '{field}' | awk " + "'{print $" + str(col) + "}'"
        result_str = run_command(cmd)

        if 'CLK' in field:
            result_str += 'MHz'

        if field == 'Ver':
            # Store the driver version under a more descriptive key.
            field = 'KMD Ver'

        sys_info[field] = result_str

    return sys_info


def get_deb_pkg_info():
    """Collect installed versions of the Enflame deb packages via dpkg."""
    deb_info = {}
    deb_names = ['topsruntime', 'tops-sdk', 'tops-inference', 'topsaten']

    for deb in deb_names:
        cmd = f"dpkg -l | grep '{deb}' | awk " + "'{print $3}'"
        result_str = run_command(cmd)
        deb_info[deb] = result_str

    return deb_info


def get_python_pkg_info():
    """Collect the Python interpreter version and versions of key pip packages."""
    python_info = {}

    result_str = run_command('python3 -V')
    python_info['python'] = result_str

    # These are pip packages; `head -n 1` keeps only the first match when the
    # grep pattern hits several package names (e.g. torch, torch-gcu, torchvision).
    pkg_names = ['TopsInference', 'torch', 'diffusers', 'transformers', 'torch-gcu', 'Pillow', 'opencv-python']

    for pkg in pkg_names:
        cmd = f"python3 -m pip list | grep '{pkg}' | awk " + "'{print $2}' | head -n 1"
        result_str = run_command(cmd)
        python_info[pkg] = result_str

    return python_info


def get_os_info():
    """Collect the OS distribution name and kernel version."""
    os_info = {}

    cmds = ['lsb_release -sd', 'uname -r']
    keys = ['os_distrb_name', 'os_kernel_version']

    for cmd, key in zip(cmds, keys):
        result_str = run_command(cmd)
        os_info[key] = result_str

    return os_info


def get_cpu_info():
    """Collect CPU architecture, model name, and vendor from /proc/cpuinfo."""
    cpu_info = {}

    cmds = [
        "uname -m",
        'cat /proc/cpuinfo | grep "model name" | uniq | awk \'{str=""; for (i=4; i<=NF; i++) str=str $i " "; print str}\'',
        "cat /proc/cpuinfo | grep \"vendor_id\" | uniq | awk '{print $3}'",
    ]

    keys = ['cpu_arch', 'cpu_model_name', 'cpu_vendor']

    for key, cmd in zip(keys, cmds):
        result_str = run_command(cmd)
        cpu_info[key] = result_str

    return cpu_info


def get_disk_info():
    """List block devices and their models via lsblk."""
    disk_info = {}
    cmds = [
        "lsblk -o NAME,MODEL",
    ]

    keys = ['disk_model']

    for key, cmd in zip(keys, cmds):
        result_str = run_command(cmd)
        disk_info[key] = result_str

    return disk_info


def get_host_info():
    """Collect the hostname and primary IP address."""
    host_info = {}
    cmds = [
        "hostname",
        "hostname -I | awk '{print $1}'"
    ]

    keys = ['host_name', 'host_ip']

    for key, cmd in zip(keys, cmds):
        result_str = run_command(cmd)
        host_info[key] = result_str

    return host_info


def get_meta_info(device_id=0):
    """Aggregate device, package, OS, CPU, disk, and host information into one dict."""
    meta_info = {}
    meta_info['device_info'] = get_device_info(device_id)
    meta_info['deb_info'] = get_deb_pkg_info()
    meta_info['python_info'] = get_python_pkg_info()
    meta_info['os_info'] = get_os_info()
    meta_info['cpu_info'] = get_cpu_info()
    meta_info['disk_info'] = get_disk_info()
    meta_info['host_info'] = get_host_info()

    return meta_info


if __name__ == '__main__':
    import json

    meta_info = get_meta_info(device_id=0)
    meta = json.dumps(meta_info, indent=4)
    print(meta)