
Do not raise these problems in Issues; raise them in Discussions #394

Open
zRzRzRzRzRzRzR opened this issue Nov 21, 2023 · 26 comments
Assignees
Labels
duplicate: This issue or pull request already exists
enhancement: New feature or request
help wanted: Extra attention is needed
question: Further information is requested
wontfix: This will not be worked on

Comments

@zRzRzRzRzRzRzR
Member

zRzRzRzRzRzRzR commented Nov 21, 2023

Do not raise the following problems in Issues, because:

  1. They are missing capabilities or bad cases of the model itself
  2. The team cannot fix them for now

If you run into any of the problems below, please post the concrete bad case in a Discussion. These problems are hard to solve in this version of the model, and more bad cases will help us train a better one.

#393 CUDA error: device-side assert raised while running the model
#212 Confused tool calling; some scenarios are trained to trigger tool calls
#335 Tool calls do not work properly in multi-turn conversations
#306 GPU memory usage climbs sharply during long-running conversations
#310 Broken multi-GPU inference and garbled output
#225 Mixed Chinese and English output; responses contain stray English words

For the following topics, if the error does not occur with the official code on official hardware, please also use a Discussion:
#251 Setup and environment preparation on Mac
#253 All kinds of fine-tuning problems

Please do not open Issues for the two categories above; they may get no reply or be closed outright.
Thanks for your understanding.

@zRzRzRzRzRzRzR zRzRzRzRzRzRzR added the wontfix This will not be worked on label Nov 21, 2023
@zRzRzRzRzRzRzR zRzRzRzRzRzRzR pinned this issue Nov 21, 2023
@zRzRzRzRzRzRzR zRzRzRzRzRzRzR self-assigned this Nov 22, 2023
@zRzRzRzRzRzRzR zRzRzRzRzRzRzR added bug Something isn't working duplicate This issue or pull request already exists enhancement New feature or request help wanted Extra attention is needed question Further information is requested and removed bug Something isn't working labels Nov 22, 2023
@jiawei243 jiawei243 unpinned this issue Nov 24, 2023
@zRzRzRzRzRzRzR zRzRzRzRzRzRzR pinned this issue Nov 24, 2023
@zRzRzRzRzRzRzR zRzRzRzRzRzRzR changed the title from "Please check Discussions for the various bad cases" to "Do not raise these problems in Issues; raise them in Discussions" Nov 29, 2023
@youyouge

A few questions:
1. Why does the fine-tuned model answer in plain Q&A style? Is the dataset too small? The format matches the official sample dataset ("type#description"), and training used ./scripts/finetune_pt.sh with only the source, target, and step values changed.
2. After training, launching the UI with "cd ../composite_demo
MODEL_PATH="THUDM/chatglm3-6b" PT_PATH="path to p-tuning checkpoint" streamlit run main.py", or starting it through the API, still runs the original chatglm3-6b rather than the fine-tuned model. Why?
3. With the "type#description" dataset format, if the prompt only contains one or more description or type words, will the model still reply? Is the reply generated by ChatGLM3-6B's own reasoning, an exact copy of the dataset content, or something that follows the format of the dataset answers?

@kokomidaisiki

About using the one-click bundle (I came from a Bilibili video): sorry, I'm not skilled enough to tell where it went wrong.
[screenshot: mmexport1703184293371]

@whisky-12

Will openai_api.py be updated with a v1/embeddings endpoint for handling vectors?

@lostmaniac

Will openai_api.py be updated with a v1/embeddings endpoint for handling vectors?

Just merge in the code from another project.

@zRzRzRzRzRzRzR
Member Author

The openai demo has been updated and now supports embeddings.
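
For anyone looking for it, a minimal sketch of calling an OpenAI-compatible embeddings route; the base URL, port, and model name are assumptions about your deployment, not values from this thread:

import requests

resp = requests.post(
    "http://127.0.0.1:8000/v1/embeddings",
    json={"model": "embedding-model", "input": ["你好,世界"]},
)
vector = resp.json()["data"][0]["embedding"]
print(len(vector))  # embedding dimensionality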

@CNCSMonster

By "Discussion", do you mean a comment here? <--- like this?

@zRzRzRzRzRzRzR
Member Author

No, in GitHub Discussions, under the thread with the corresponding title.

@sunheyang1

About using the one-click bundle (I came from a Bilibili video): sorry, I'm not skilled enough to tell where it went wrong. [screenshot: mmexport1703184293371]

It may be a permissions problem. On the C: drive open Users, find (your username), right-click it, choose Properties, go to the Security tab, and select your own user, like this (see below):
[screenshot: 屏幕截图 2024-01-15 135449]
Click Edit and tick the Full Control box.
Then run it again.

@langshuocheng

ASK:

  • Does ChatGLM3 use BBPE (byte-level BPE) for its tokenizer?

@itlittlekou

Hi, during LoRA fine-tuning the run hangs at:
Total optimization steps = 3,000
Number of trainable parameters = 1,949,696
0% 0/3000 [00:00<?, ?it/s]
What could cause it to stall here? Any advice would be appreciated.

@itlittlekou

Hi, during LoRA fine-tuning I hit RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'. After upgrading PyTorch, the run now hangs at:
Total optimization steps = 3,000
Number of trainable parameters = 1,949,696
0% 0/3000 [00:00<?, ?it/s]
What could cause it to stall here? Any advice would be appreciated.

@LiangYong1216

This error usually occurs when training a deep learning model with PyTorch and trying to run a matrix multiply-add (addmm) in half precision ('Half', i.e. 16-bit floats). Half precision speeds up compute and reduces memory use, but not every operation supports it, and the CPU addmm kernel is one that does not.
To work around it, consider the following:
Use full precision ('Float'): convert the model and data to 32-bit floats (torch.float32 or torch.FloatTensor). All operations are then supported, at the cost of more memory and compute time.

For example, to convert a tensor from half to full precision:

tensor = tensor.to(dtype=torch.float32)
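
Extending the same idea to the whole model: a minimal sketch, assuming the public THUDM/chatglm3-6b checkpoint, that loads in full precision so every CPU kernel exists; if a GPU is available, model.half().cuda() is the usual alternative.

import torch
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "THUDM/chatglm3-6b"  # assumption: replace with your local path

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
# float32 avoids the missing fp16 CPU kernel behind "addmm_impl_cpu_" errors.
model = AutoModel.from_pretrained(
    MODEL_PATH, trust_remote_code=True, torch_dtype=torch.float32
).eval()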

@RexxyWong

How do I fix the following error? I am running the official code with the official dataset.

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
0%| | 0/3000 [00:01<?, ?it/s]
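
One pattern that produces exactly this message is gradient checkpointing over inputs that do not require grad, so the checkpointed blocks never get a grad_fn. A sketch of the commonly suggested workaround, an assumption about this setup rather than a confirmed fix for the official script:

from transformers import AutoModel

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model.gradient_checkpointing_enable()  # recompute activations during backward
# Without this, checkpointed segments receive no-grad inputs and backward()
# fails with "element 0 of tensors does not require grad".
model.enable_input_require_grads()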

@lei124215

Hi, during LoRA fine-tuning I hit RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'. After upgrading PyTorch, the run now hangs at Total optimization steps = 3,000 Number of trainable parameters = 1,949,696 0% 0/3000 [00:00<?, ?it/s]. What could cause it to stall here? Any advice would be appreciated.

Hello, did you ever solve this? I have run into the same problem.

@NENCAO commented Mar 30, 2024

When I use glm3, the embeddings endpoint keeps failing with the error shown in the screenshot. Why?
[screenshot: QQ截图20240330160435]

@markoov commented Apr 1, 2024

When serving the API with uvicorn, why does setting workers above 1 fail with "model is not defined"? Is multi-process serving impossible, and how do I fix this?
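
A likely cause, assuming the server loads the model under if __name__ == "__main__": with workers above 1, uvicorn spawns fresh worker processes that import the app from an import string, so anything created only in the __main__ block, the model included, never exists inside the workers. A hypothetical sketch of the import-time-loading pattern:

# api_server.py (sketch): load the model at import time so every uvicorn
# worker process builds its own copy.
from fastapi import FastAPI
from transformers import AutoModel, AutoTokenizer

app = FastAPI()
# Module scope, not under `if __name__ == "__main__":`, so workers see it too.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).half().cuda()

if __name__ == "__main__":
    import uvicorn

    # workers > 1 requires the "module:app" import string; note that every
    # worker then holds a full copy of the model in GPU memory.
    uvicorn.run("api_server:app", host="0.0.0.0", port=8000, workers=2)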

@FanZhang91

Fine-tuning with the main-branch code, then running inference_hf for prediction, raises an error while parsing the output field of the response. Also, the checkpoint saved by fine-tuning cannot be loaded directly through composite_demo. How do I solve these two problems?

@zainCSU commented Apr 17, 2024

[screenshot: d119bab636a2dcc61284d5662cebfac]
Is there any way to fix this error when running on two GPUs?

@Mouasea commented Apr 18, 2024

What are the chatglm3-6b model hyperparameters: Hidden Size, Num Layers, Num Attention Heads, Vocab Size? I could not find them published anywhere in the community.
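
The numbers ship with the checkpoint itself, so they can be printed straight from its config.json; the attribute names below follow that config (a sketch, not official documentation):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
print("hidden_size:", config.hidden_size)
print("num_layers:", config.num_layers)
print("num_attention_heads:", config.num_attention_heads)
print("padded_vocab_size:", config.padded_vocab_size)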

@jwc19890114

After installing locally, both the streamlit and gradio UIs come up, but submitting a question gets no response. What is going on?

@Bule-dog

[screenshot: QQ截图20240424161143]
Why does resuming fine-tuning from a saved checkpoint fail with this error?

@michaelwind1315

After launching composite_demo, responses to questions are extremely slow; it looks like GPU acceleration is not in use. What do I need to change to enable it?
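
A quick diagnostic before changing anything, assuming the public checkpoint path: confirm CUDA is visible and that the model's parameters actually live on the GPU.

import torch
from transformers import AutoModel

print(torch.cuda.is_available())  # False means everything is running on CPU

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True)
model = model.half().cuda()  # fp16 on GPU, the pattern the official demos use
print(next(model.parameters()).device)  # expect cuda:0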

@thomasyyang commented Jun 21, 2024

First try at the fine-tuning example from the GitHub repo; it did not run successfully and fails with the error below. How can I fix it?

Command:
CUDA_VISIBLE_DEVICES=0 NCCL_P2P_DISABLE="1" NCCL_IB_DISABLE="1" python finetune_hf.py data/AdvertiseGen_fix /home/notebook/toG_RMM/MHRED/chatglm3-6b configs/lora.yaml

Output:
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "/opt/notebook/toG_RMM/MHRED/ChatGLM3-main/finetune_demo/finetune_hf.py", line 11, in <module>
    import torch
  File "/home/conda/envs/python3.10.6/lib/python3.10/site-packages/torch/__init__.py", line 1382, in <module>
    from .functional import *  # noqa: F403
  File "/home/conda/envs/python3.10.6/lib/python3.10/site-packages/torch/functional.py", line 7, in <module>
    import torch.nn.functional as F
  File "/home/conda/envs/python3.10.6/lib/python3.10/site-packages/torch/nn/__init__.py", line 1, in <module>
    from .modules import *  # noqa: F403
  File "/home/conda/envs/python3.10.6/lib/python3.10/site-packages/torch/nn/modules/__init__.py", line 35, in <module>
    from .transformer import TransformerEncoder, TransformerDecoder,
  File "/home/conda/envs/python3.10.6/lib/python3.10/site-packages/torch/nn/modules/transformer.py", line 20, in <module>
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
/home/conda/envs/python3.10.6/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: _ARRAY_API not found (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
  device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/notebook/toG_RMM/MHRED/ChatGLM3-main/finetune_demo/ │
│ finetune_hf.py:458 in main │
│ │
│ 455 │ │ ), │
│ 456 │
│ 457 ): │
│ ❱ 458 │ ft_config = FinetuningConfig.from_file(config_file) │
│ 459 │ tokenizer, model = load_tokenizer_and_model(model_dir, peft_config=ft_config.peft_co │
│ 460 │ data_manager = DataManager(data_dir, ft_config.data_config) │
│ 461 │
│ │
│ /opt/notebook/toG_RMM/MHRED/ChatGLM3-main/finetune_demo/ │
│ finetune_hf.py:209 in from_file │
│ │
│ 206 │ def from_file(cls, path: Union[str, Path]) -> 'FinetuningConfig': │
│ 207 │ │ path = _resolve_path(path) │
│ 208 │ │ kwargs = _get_yaml_parser().load(path) │
│ ❱ 209 │ │ return cls.from_dict(**kwargs) │
│ 210 │
│ 211 │
│ 212 def _load_datasets( │
│ │
│ /opt/notebook/toG_RMM/MHRED/ChatGLM3-main/finetune_demo/ │
│ finetune_hf.py:194 in from_dict │
│ │
│ 191 │ │ │ │ training_args['generation_config'] = GenerationConfig( │
│ 192 │ │ │ │ │ **gen_config │
│ 193 │ │ │ │ ) │
│ ❱ 194 │ │ │ kwargs['training_args'] = Seq2SeqTrainingArguments(**training_args) │
│ 195 │ │ │
│ 196 │ │ data_config = kwargs.get('data_config') │
│ 197 │ │ if not isinstance(data_config, DataConfig): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: Seq2SeqTrainingArguments.__init__() got an unexpected keyword argument 'use_cpu'
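
Two version conflicts are visible in this log: NumPy 2.x loaded against extension modules built for 1.x (the warning at the top, fixable with pip install "numpy<2" as the message itself suggests), and a transformers build too old for the use_cpu argument that finetune_hf.py forwards to Seq2SeqTrainingArguments (use_cpu only exists in newer transformers releases, so upgrading transformers within the finetune_demo requirements should clear it; the exact version floor is an assumption). A minimal check of what is actually installed:

import numpy
import transformers

# finetune_demo expects NumPy 1.x here and a transformers version whose
# Seq2SeqTrainingArguments already accepts `use_cpu`.
print("numpy:", numpy.__version__)
print("transformers:", transformers.__version__)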

@xiherespect

Any interaction immediately fails with UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)

root@dsw-430842-7cc9db4b4d-gl5v6:/mnt/workspace/webcodes/ChatGLM3/basic_demo# python cli_demo.py
Setting eos_token is not supported, use the default one.
Setting pad_token is not supported, use the default one.
Setting unk_token is not supported, use the default one.
Loading checkpoint shards: 100%|█████████████████████████████████████████████| 7/7 [00:03<00:00, 1.89it/s]
欢迎使用 ChatGLM3-6B 模型,输入内容即可进行对话,clear 清空对话历史,stop 终止程序

用户: 你好

ChatGLM:2024-08-21 02:36:40.210853: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-21 02:36:41.289374: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
  File "/mnt/workspace/webcodes/ChatGLM3/basic_demo/cli_demo.py", line 57, in <module>
    main()
  File "/mnt/workspace/webcodes/ChatGLM3/basic_demo/cli_demo.py", line 43, in main
    for response, history, past_key_values in model.stream_chat(tokenizer, query, history=history, top_p=1,
  File "/usr/local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 35, in generator_context
    response = gen.send(None)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm3-6b/modeling_chatglm.py", line 1082, in stream_chat
    response = tokenizer.decode(outputs)
  File "/usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3809, in decode
    token_ids = to_py_obj(token_ids)
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 272, in to_py_obj
    return [to_py_obj(o) for o in obj]
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 272, in <listcomp>
    return [to_py_obj(o) for o in obj]
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 277, in to_py_obj
    if test_func(obj):
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 211, in is_tf_tensor
    return False if not is_tf_available() else _is_tensorflow(x)
  File "/usr/local/lib/python3.10/site-packages/transformers/utils/generic.py", line 202, in _is_tensorflow
    import tensorflow as tf
  File "/usr/local/lib/python3.10/site-packages/tensorflow/__init__.py", line 45, in <module>
    from tensorflow._api.v2 import __internal__
  File "/usr/local/lib/python3.10/site-packages/tensorflow/_api/v2/__internal__/__init__.py", line 13, in <module>
    from tensorflow._api.v2.__internal__ import feature_column
  File "/usr/local/lib/python3.10/site-packages/tensorflow/_api/v2/__internal__/feature_column/__init__.py", line 8, in <module>
    from tensorflow.python.feature_column.feature_column_v2 import DenseColumn  # line: 1777
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column_v2.py", line 38, in <module>
    from tensorflow.python.feature_column import feature_column as fc_old
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column.py", line 41, in <module>
    from tensorflow.python.layers import base
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/layers/base.py", line 16, in <module>
    from tensorflow.python.keras.legacy_tf_layers import base
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/__init__.py", line 25, in <module>
    from tensorflow.python.keras import models
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/models.py", line 25, in <module>
    from tensorflow.python.keras.engine import training_v1
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/engine/training_v1.py", line 46, in <module>
    from tensorflow.python.keras.engine import training_arrays_v1
  File "/usr/local/lib/python3.10/site-packages/tensorflow/python/keras/engine/training_arrays_v1.py", line 37, in <module>
    from scipy.sparse import issparse  # pylint: disable=g-import-not-at-top
  File "/usr/local/lib/python3.10/site-packages/scipy/sparse/__init__.py", line 294, in <module>
    from ._base import *
  File "/usr/local/lib/python3.10/site-packages/scipy/sparse/_base.py", line 5, in <module>
    from scipy._lib._util import VisibleDeprecationWarning
  File "/usr/local/lib/python3.10/site-packages/scipy/_lib/_util.py", line 18, in <module>
    from scipy._lib._array_api import array_namespace
  File "/usr/local/lib/python3.10/site-packages/scipy/_lib/_array_api.py", line 15, in <module>
    from numpy.testing import assert
  File "/usr/local/lib/python3.10/site-packages/numpy/testing/__init__.py", line 11, in <module>
    from ._private.utils import *
  File "/usr/local/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1253, in <module>
    _SUPPORTS_SVE = check_support_sve()
  File "/usr/local/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1247, in check_support_sve
    output = subprocess.run(cmd, capture_output=True, text=True)
  File "/usr/local/lib/python3.10/subprocess.py", line 505, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/usr/local/lib/python3.10/subprocess.py", line 1154, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/local/lib/python3.10/subprocess.py", line 2059, in _communicate
    stdout = self._translate_newlines(stdout,
  File "/usr/local/lib/python3.10/subprocess.py", line 1031, in _translate_newlines
    data = data.decode(encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 0: ordinal not in range(128)
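
The decode failure is not in the model itself: numpy's check_support_sve spawns a subprocess, and subprocess decodes the child's output with the locale's preferred encoding, which in a bare "C"/POSIX container locale is plain ASCII, so the first Chinese byte (0xe6) raises. A small diagnostic, with the usual workaround noted in the comment (exporting a UTF-8 locale is a common fix, assumed rather than verified for this exact image):

import locale

# If this prints an ASCII codec such as 'ANSI_X3.4-1968', the container locale
# is the culprit; running `export LANG=C.UTF-8 LC_ALL=C.UTF-8` before
# `python cli_demo.py` usually clears the UnicodeDecodeError.
print(locale.getpreferredencoding(False))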

@GH-cz0827

It cannot start at all. The OS is a remote Linux host accessed through MobaXterm.

The model was cloned from modelscope, but I deleted every file whose name contains "safetensors". I have checked that the MODEL_PATH in cli_demo.py is correct; running python3 cli_demo.py directly fails as follows:
(glm) s_xyw@ychx:~/TanHC/GLMtest/ChatGLM3/basic_demo$ python3 cli_demo.py
Traceback (most recent call last):
  File "cli_demo.py", line 8, in <module>
    tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_PATH, trust_remote_code=True)
  File "/home/s_xyw/.local/lib/python3.8/site-packages/transformers/models/auto/tokenization_auto.py", line 847, in from_pretrained
    return tokenizer_class.from_pretrained(
  File "/home/s_xyw/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2089, in from_pretrained
    return cls._from_pretrained(
  File "/home/s_xyw/.local/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2311, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/s_xyw/.cache/huggingface/modules/transformers_modules/chatglm3-6b/tokenization_chatglm.py", line 109, in __init__
    self.tokenizer = SPTokenizer(vocab_file)
  File "/home/s_xyw/.cache/huggingface/modules/transformers_modules/chatglm3-6b/tokenization_chatglm.py", line 18, in __init__
    self.sp_model = SentencePieceProcessor(model_file=model_path)
  File "/home/s_xyw/.local/lib/python3.8/site-packages/sentencepiece/__init__.py", line 468, in Init
    self.Load(model_file=model_file, model_proto=model_proto)
  File "/home/s_xyw/.local/lib/python3.8/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/home/s_xyw/.local/lib/python3.8/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from /home/s_xyw/TanHC/GLMtest/chatglm3-6b/tokenizer.model

Is the tokenizer.model file from modelscope broken? Also, this remote host cannot reach huggingface because of network restrictions, so I cannot clone the model from HF.
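
"could not parse ModelProto" usually means the file on disk is not a real SentencePiece model, most often a leftover Git LFS pointer or a truncated download. A quick check, using the path from the traceback (the expected size of roughly 1 MB is an approximation):

import os

path = "/home/s_xyw/TanHC/GLMtest/chatglm3-6b/tokenizer.model"
print(os.path.getsize(path))  # a few hundred bytes suggests an LFS pointer file
with open(path, "rb") as f:
    print(f.read(64))  # LFS pointers start with b"version https://git-lfs..."

If it turns out to be a pointer, running git lfs install and git lfs pull inside the model directory, or re-downloading through modelscope's snapshot_download, should restore the real file.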

@Hoyxxx commented Sep 29, 2024

python inference_hf.py ./output/checkpoint-3000 --prompt "listen to westbam alumb allergic on google music"
Why does calling the fine-tuned model show no effect? The replies still feel like the original model's.
