Memory and time requirements for Mistral-7B #68

Closed
NamburiSrinath opened this issue Sep 12, 2024 · 4 comments

NamburiSrinath commented Sep 12, 2024

Hi,

I am trying to prune Mistral 7B (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2). The magnitude pruning commands ran successfully, but I'm running into issues with SparseGPT and Wanda:

  • SparseGPT: ran for more than an hour and then threw a CUDA OOM error. I'm working on a g5.24x (4 x 24GB GPUs), which I believe should definitely be enough.
  • Wanda: ran for ~2 hours and then also failed with a CUDA OOM error.

Commands used:
python main.py --model 'mistralai/Mistral-7B-Instruct-v0.2' --prune_method sparsegpt --sparsity_ratio 0.1 --sparsity_type unstructured --save out/mistral_7b/unstructured/sparsegpt/0.1/ --save_model out/mistral_7b/unstructured/sparsegpt/0.1/

python main.py --model 'mistralai/Mistral-7B-Instruct-v0.2' --prune_method wanda --sparsity_ratio 0.1 --sparsity_type unstructured --save out/mistral_7b/unstructured/wanda/0.1/ --save_model out/mistral_7b/unstructured/wanda/0.1/

Any help would be greatly appreciated :) Tagging the authors: @liuzhuang13, @Eric-mingjie and @eltociear

NamburiSrinath commented Sep 12, 2024

Update: the error comes from the torch.zeros() allocation. Here is the stack trace:

Traceback (most recent call last):
  File "/home/ubuntu/Compress_Align/wanda/main.py", line 113, in <module>
    main()
  File "/home/ubuntu/Compress_Align/wanda/main.py", line 73, in main
    prune_sparsegpt(args, model, tokenizer, device, prune_n=prune_n, prune_m=prune_m)
  File "/home/ubuntu/anaconda3/envs/compress_align/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/Compress_Align/wanda/lib/prune.py", line 230, in prune_sparsegpt
    inps = torch.zeros(
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 GiB. GPU 1 has a total capacity of 21.99 GiB of which 16.77 GiB is free. Including non-PyTorch memory, this process has 5.21 GiB memory in use. Of the allocated memory 4.88 GiB is allocated by PyTorch, and 89.82 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Debugging further, here are the values of print(args.nsamples, model.seqlen, model.config.hidden_size):

Mistral-7B:  128, 32768, 4096
Llama-2-7B: 128, 4096, 4096

So the sequence length for Mistral (32768, vs. 4096 for Llama-2-7B) is so large that this tensor can't be allocated on a single GPU.
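As a sanity check, the 32 GiB in the traceback follows directly from these numbers: a zeros((nsamples, seqlen, hidden_size)) buffer holds nsamples × seqlen × hidden_size elements. The sketch below assumes the model is loaded in fp16/bf16 (2 bytes per element), which is consistent with the reported allocation; the helper name `calib_buffer_gib` is hypothetical.

```python
def calib_buffer_gib(nsamples: int, seqlen: int, hidden_size: int, bytes_per_elem: int = 2) -> float:
    """Size in GiB of a (nsamples, seqlen, hidden_size) tensor.

    bytes_per_elem=2 assumes fp16/bf16 (an assumption, not read from the repo).
    """
    return nsamples * seqlen * hidden_size * bytes_per_elem / 2**30


# Values printed above for each model
mistral = calib_buffer_gib(128, 32768, 4096)
llama2 = calib_buffer_gib(128, 4096, 4096)
print(f"Mistral-7B: {mistral:.2f} GiB")  # Mistral-7B: 32.00 GiB
print(f"Llama-2-7B: {llama2:.2f} GiB")   # Llama-2-7B: 4.00 GiB
```

The Mistral figure matches the "Tried to allocate 32.00 GiB" in the error and exceeds the 21.99 GiB total capacity reported for a single GPU, while the Llama-2-7B buffer is 8x smaller because only the sequence length differs.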

Are there any suggestions to overcome this error?

P.S. I think this issue is similar to #51, i.e. support for Mistral models.

@nanxue2023

Maybe you could shorten the input sequence length the model uses (model.seqlen), I guess.
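One way to pick a shorter length is to work backwards from the memory you can spare on one GPU. The sketch below is hypothetical: the function name, the `budget_gib` parameter and the halving strategy are illustrative choices, not part of the repo. It assumes the buffer is sized nsamples × seqlen × hidden_size at 2 bytes per element, as the printed values and the 32 GiB allocation suggest, and it ignores model weights, activations and any other buffers, so the budget should be chosen conservatively.

```python
def max_seqlen_for_budget(
    budget_gib: float,
    nsamples: int,
    hidden_size: int,
    bytes_per_elem: int = 2,
    max_seqlen: int = 32768,
) -> int:
    """Halve max_seqlen until a (nsamples, seqlen, hidden_size) buffer fits in budget_gib.

    Hypothetical helper: only counts this one buffer, not weights or activations.
    """
    budget_bytes = budget_gib * 2**30
    bytes_per_token = nsamples * hidden_size * bytes_per_elem
    seqlen = max_seqlen
    # Halve the sequence length until the buffer fits (or we reach 1)
    while seqlen > 1 and seqlen * bytes_per_token > budget_bytes:
        seqlen //= 2
    if seqlen * bytes_per_token > budget_bytes:
        raise ValueError("budget too small even for seqlen=1")
    return seqlen


seqlen = max_seqlen_for_budget(budget_gib=8, nsamples=128, hidden_size=4096)
print(seqlen)  # 8192
# Assumption: override the attribute after loading the model and before pruning,
# so both calibration sampling and the buffers use the shorter length.
# model.seqlen = seqlen
```

Whether overriding `model.seqlen` alone is enough depends on the repo reading that attribute everywhere the calibration length is used; that is an assumption worth checking in main.py and lib/prune.py.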

@yaolu-zjut

Hi, did you solve this problem?

NamburiSrinath commented Jan 15, 2025

I ended up shortening the input sequence length and was able to compress the model.

Closing the issue since that works as a workaround, though I'm not sure whether it's ideal.
