Memory and time requirements for Mistral-7B #68
Hi,

I am trying to prune Mistral 7B (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2). While I was able to run the commands for magnitude pruning successfully, I ran into issues with SparseGPT and Wanda.

Commands used:

python main.py --model 'mistralai/Mistral-7B-Instruct-v0.2' --prune_method sparsegpt --sparsity_ratio 0.1 --sparsity_type unstructured --save out/mistral_7b/unstructured/sparsegpt/0.1/ --save_model out/mistral_7b/unstructured/sparsegpt/0.1/

python main.py --model 'mistralai/Mistral-7B-Instruct-v0.2' --prune_method wanda --sparsity_ratio 0.1 --sparsity_type unstructured --save out/mistral_7b/unstructured/wanda/0.1/ --save_model out/mistral_7b/unstructured/wanda/0.1/

Any help here would be greatly appreciated :) Tagging the authors: @liuzhuang13, @Eric-mingjie, and @eltociear.
Update --- The error comes from initializing torch.zeros(); the traceback is below.

Upon debugging further, here are the values for

So Mistral's maximum sequence length is very large, which makes the tensor too big to allocate. Are there any suggestions for overcoming this error? P.S.: I think this issue is similar to #51, i.e., support for Mistral models.
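For reference, a back-of-the-envelope sketch of why the allocation fails. The buffer shape here is an assumption based on this repo's calibration setup (nsamples x seqlen x hidden_size, with seqlen taken from config.max_position_embeddings):

```python
# Rough size of the fp16 calibration buffer (shape assumed from the repo's
# calibration code; 32768 and 4096 are Mistral-7B's max_position_embeddings
# and hidden_size).
nsamples, seqlen, hidden = 128, 32768, 4096
size_gib = nsamples * seqlen * hidden * 2 / 2**30  # 2 bytes per fp16 value
print(f"{size_gib:.0f} GiB")  # -> 32 GiB, far more than a single GPU holds
```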
Maybe you can shorten the input sequence length supported by the model, I guess.

Hi, did you solve this problem?

I ended up shortening the input sequence length and was able to compress the model. Closing the issue since that's a workaround, though I'm not sure it's ideal.
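For anyone who lands here later, a minimal sketch of the workaround, assuming the repo's convention of storing the calibration length on model.seqlen (which I believe main.py sets from config.max_position_embeddings; attribute names may differ in newer versions):

```python
import torch
from transformers import AutoModelForCausalLM

# Cap the calibration sequence length before pruning. model.seqlen is the
# attribute the pruning code reads (an assumption based on main.py);
# 2048 matches the length used for the Llama models.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.float16
)
model.seqlen = min(model.config.max_position_embeddings, 2048)
# The calibration buffer shrinks to 128 * 2048 * 4096 * 2 bytes ≈ 2 GiB.
```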