
VRAM usage keeps increasing during role play #9

Open

allencyhsu opened this issue Jun 16, 2024 · 2 comments

@allencyhsu

Loading the model takes about 5 GB of VRAM, but after a few rounds of back-and-forth conversation it jumps to 6 GB; each additional turn adds roughly 300 MB. Is there any way to work around this?

==============================
python realtime_chat.py --role_name 三三
-----PERFORM NORM HEAD
user: Hello
/home/allen/miniconda3/envs/index/lib/python3.10/site-packages/transformers/generation/utils.py:1417: UserWarning: You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
warnings.warn(
三三: Hello, I'm 三三. Is there anything I can help you with?
user: Tell me about B站
三三: B站 is one of the largest online video platforms in China, offering a wealth of video content across animation, gaming, music, dance and more, along with livestreaming and interactive community features. It is also a diverse community that attracts a large number of young users.
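For context, this per-turn growth is what you would expect when the full dialogue history (and its KV cache) is carried along on every turn, so bounding the history is one mitigation independent of quantization. A minimal sketch, assuming a chat loop that keeps a list of (user, reply) pairs; the names history, trim_history, and MAX_TURNS are hypothetical and not from realtime_chat.py:

# Hypothetical mitigation: keep only the most recent turns so the
# prompt, and with it the KV cache, stops growing without bound.
MAX_TURNS = 8  # illustrative window size; tune for your VRAM budget

def trim_history(history, max_turns=MAX_TURNS):
    """Keep only the most recent (user, reply) pairs."""
    return history[-max_turns:]

history = []
# Inside the chat loop (sketch):
#   history = trim_history(history)
#   ...generate a reply from the trimmed history...
#   history.append((user_input, reply))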

@BitVoyage
Collaborator

You can quantize the model with the quantization script provided in our README, which significantly reduces memory usage.

@lingyun-gao
Collaborator

Multi-turn conversation does grow VRAM usage. You can use the quantization script; in our tests it cuts VRAM usage roughly in half.

Modify realtime_chat.py:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization;
# computation runs in fp16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
)

# Pass the quantization config when loading the model.
model = AutoModelForCausalLM.from_pretrained(
    self.huggingface_local_path,
    trust_remote_code=True,
    config=config,
    quantization_config=quantization_config,
)
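To verify the savings yourself, you can watch GPU memory with PyTorch's built-in counters. A quick sketch using the standard torch.cuda API; the "roughly half" figure above is from the comment, not re-measured here:

import torch

def report_vram(tag: str) -> None:
    # Current and peak allocated GPU memory, in GiB.
    alloc = torch.cuda.memory_allocated() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    print(f"[{tag}] allocated: {alloc:.2f} GiB, peak: {peak:.2f} GiB")

# e.g. call report_vram("after load") once the model is loaded,
# then report_vram(f"after turn {n}") after each generation.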
