How to deploy InternLM-20B-4bit as a service with LMDeploy #451
vansin started this conversation in Show and tell
Replies: 4 comments 7 replies
-
Follow the steps below to quickly deploy InternLM-20B-Chat as a service and chat with the model online. (A quick console sanity check you can run between steps 3 and 4 follows the list.)

Step 1: install lmdeploy

```shell
pip install 'lmdeploy>=0.0.9'
```

Step 2: download the InternLM-20B-4bit model

```shell
git-lfs install
git clone https://huggingface.co/internlm/internlm-chat-20b
```

Step 3: convert the model weight format

```shell
python3 -m lmdeploy.serve.turbomind.deploy internlm-chat \
  --model-path ./internlm-chat-20b
```

Step 4: start the gradio service

```shell
python3 -m lmdeploy.serve.gradio.app ./workspace --server_name {ip_addr} --server_port {port}
```
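Before exposing a web UI, it can help to confirm that the converted weights load and respond. The command below is a sketch assuming the `lmdeploy.turbomind.chat` console entry point from the lmdeploy 0.0.x series; the module path may differ in other releases.

```shell
# Sanity check (assumes the lmdeploy 0.0.x module layout):
# start an interactive console chat against the converted
# weights in ./workspace.
python3 -m lmdeploy.turbomind.chat ./workspace
```

If it responds sensibly, start the gradio app (step 4) and open http://{ip_addr}:{port} in a browser.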
0 replies
-
```shell
cd internlm-chat-20b
```
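If the clone from step 2 left only LFS pointer stubs (a common failure when git-lfs is installed after cloning), the weight files show up as tiny text files. The check below uses standard git-lfs commands; the pytorch_model-*.bin shard naming is an assumption about this repo, not something confirmed in the thread.

```shell
# Fetch any LFS objects that were skipped during the clone.
git lfs pull
# Real weight shards are multi-gigabyte; ~130-byte files are pointer stubs.
# (Shard naming pytorch_model-*.bin is assumed, not confirmed.)
ls -lh pytorch_model-*.bin
```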
0 replies
-
I hit this error during quantized deployment — can anyone tell me why?
6 replies
-
With 24 GB of VRAM it seems I can't run the 4-bit quantized model? Or is there a parameter somewhere that needs to be set?
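Most of the memory beyond the weights goes to the KV cache, which TurboMind sizes from its config. The sketch below is an assumption about the lmdeploy 0.0.x workspace layout — the config.ini path and the cache_max_entry_count / max_batch_size keys are not confirmed by this thread; shrinking those values is one way to try fitting a 24 GB card.

```shell
# Assumed lmdeploy 0.0.x workspace layout; adjust if yours differs.
CFG=./workspace/triton_models/weights/config.ini
# Fewer cached sessions and a smaller batch shrink the KV cache.
sed -i 's/^cache_max_entry_count = .*/cache_max_entry_count = 1/' "$CFG"
sed -i 's/^max_batch_size = .*/max_batch_size = 1/' "$CFG"
```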
1 reply