
Make this pip installable #82

Open
wants to merge 14 commits into main

Conversation

winglian
Contributor

this is a pretty big refactor to:

  • allow anyone to use any of the submodules in the repo
  • remove a cyclical dependency
  • drop the hard cuda/gptq requirement; if you prefer the triton backend, you can pick one or the other (pip install .[cuda] or pip install .[triton]); a rough sketch of the extras wiring is below

There is some other cleanup that probably needs to be done, but I figured I should see if you want to go down this path. thanks!
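
A minimal sketch of how the [cuda]/[triton] extras could be wired up in setup.py (package layout and dependency names here are assumptions for illustration, not necessarily what this PR ships):

# setup.py (illustrative sketch only)
from setuptools import setup, find_packages

setup(
    name="alpaca_lora_4bit",
    packages=find_packages("src"),
    package_dir={"": "src"},
    install_requires=["torch", "transformers", "peft"],  # assumed core dependencies
    extras_require={
        "cuda": ["gptq_llama"],   # assumed name for the external GPTQ CUDA kernel package
        "triton": ["triton"],
    },
)

With something like this, pip install .[triton] pulls in triton and pip install .[cuda] pulls in the CUDA kernel dependency, matching the two optional backends described above.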

@johnsmith0031
Owner

Thanks for doing this! I think I would also merge the cuda kernel into this repo so that the external dependency on the GPTQ fork would no longer be needed. I think it would have better compatibility with main GPTQ.

@winglian
Contributor Author

problem is the main gptq doesn't even keep the cuda kernel around anymore; they've hitched their horse to triton.

delete kernel:
qwopqwop200/GPTQ-for-LLaMa@2d3256b
delete quant_cuda.cpp:
qwopqwop200/GPTQ-for-LLaMa@e43c506

@winglian
Contributor Author

alright, I've moved quant_cuda into this repo. Because of the way setuptools works, it's nearly impossible to make the CUDAExtension an extra without it being a separate external package, so it will get installed by default and triton is optional.
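
Roughly, that arrangement could look like the sketch below (the extension name and source paths are assumptions for illustration):

# setup.py sketch: the bundled CUDA kernel is always built, triton stays an extra
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="alpaca_lora_4bit",
    ext_modules=[
        CUDAExtension(
            name="quant_cuda",  # compiled extension later imported by matmul_utils_4bit
            sources=[
                "src/quant_cuda/quant_cuda.cpp",
                "src/quant_cuda/quant_cuda_kernel.cu",
            ],
        ),
    ],
    cmdclass={"build_ext": BuildExtension},
    extras_require={"triton": ["triton"]},  # only the triton backend remains optional
)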

@johnsmith0031
Owner

Thank you for putting everything together! I made a PR to text-generation-webui; once it is merged I'll merge this PR into main. And I think we should adjust the Dockerfile for the pip installable alpaca_lora_4bit as well, for compatibility.

@winglian
Contributor Author

I took a pass at updating the Dockerfile, but I don't have cuda on my local machine so I can't validate that it's totally correct. If someone else has a chance to look at the Dockerfile and build/run it 🙏

Dockerfile (outdated)
@@ -61,14 +61,14 @@ RUN cd text-generation-webui-tmp && python download-model.py --text-only decapod
# Get LoRA
RUN cd text-generation-webui-tmp && python download-model.py samwit/alpaca7b-lora && mv loras/samwit_alpaca7b-lora ../alpaca7b_lora

COPY *.py .
COPY src .

I don't think this is quite right. I tried to build the image and run it to test it for you, but the symlinks below were not pointing to anything.

If they were ln -s ../alpaca_lora_4bit/autograd_4bit.py ./autograd_4bit.py (remove 'src/') then they would have linked. So I recommend either changing the copy or changing the symlinking.

Contributor Author

whoops, COPY src . didn't do what I thought 🤦

Contributor Author

Dockerfile updated!


I won't be able to test that for a bit. I broke my machine pretty badly.

@johnsmith0031
Owner

Thanks for everything done here! I think I'll temporarily keep this in the winglian-setup_pip branch for those who want to use the pip installable version, and keep the old version as the main branch for compatibility with the monkeypatch code in webui. I may merge them if something changes in the future.

@myyk

myyk commented Apr 20, 2023

Still seeing an error when trying to run from Docker. I don't understand what's going on well enough to fix this, but it's not fixed by simply running pip install triton. It seems to me like the "quant_cuda not found." message is coming from matmul_utils_4bit.py not finding the quant_cuda folder.

Well anyway, I got my machine back up so that I can help test this.
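
For context, the "quant_cuda not found" and "Triton not found" lines in the log below are typically produced by an import guard along these lines (a sketch of the pattern, not necessarily the exact code in this PR):

# matmul_utils_4bit.py style backend detection (illustrative sketch)
import logging

try:
    import quant_cuda  # compiled CUDA extension; missing if the kernel wasn't built/installed
except ImportError:
    quant_cuda = None
    print('quant_cuda not found. Please run "pip install alpaca_lora_4bit[cuda]".')

try:
    import triton  # optional triton backend
except ImportError:
    triton = None
    print('Triton not found. Please run "pip install triton".')

if quant_cuda is None and triton is None:
    logging.warning("Neither gptq/cuda or triton backends are available.")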


==========
== CUDA ==
==========

CUDA Version 11.7.0

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

*************************
** DEPRECATION NOTICE! **
*************************
THIS IMAGE IS DEPRECATED and is scheduled for DELETION.
    https://gitlab.com/nvidia/container-images/cuda/blob/master/doc/support-policy.md

quant_cuda not found. Please run "pip install alpaca_lora_4bit[cuda]".
Triton not found. Please run "pip install triton".
WARNING:root:Neither gptq/cuda or triton backends are available.
Traceback (most recent call last):
  File "/alpaca_lora_4bit/text-generation-webui/server.py", line 1, in <module>
    import custom_monkey_patch # apply monkey patch
  File "/alpaca_lora_4bit/text-generation-webui/custom_monkey_patch.py", line 7, in <module>
    from models import Linear4bitLt
  File "/alpaca_lora_4bit/text-generation-webui/models.py", line 6, in <module>
    from peft.tuners.lora import is_bnb_available, Linear, Linear8bitLt, LoraLayer
ImportError: cannot import name 'Linear8bitLt' from 'peft.tuners.lora' (/root/.local/lib/python3.10/site-packages/peft/tuners/lora.py)

@myyk

myyk commented Apr 20, 2023

I think that last change improved it, but there's still something off. I upgraded CUDA to 11.8 because I don't think 11.7 works with my driver, and it's on its way out anyway.

==========
== CUDA ==
==========

CUDA Version 11.8.0

Container image Copyright (c) 2016-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

quant_cuda not found. Please run "pip install alpaca_lora_4bit[cuda]".
Triton not found. Please run "pip install triton".
WARNING:root:Neither gptq/cuda or triton backends are available.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /root/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
/root/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
  warn(msg)
/root/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/root/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:145: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /root/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so...
Traceback (most recent call last):
  File "/alpaca_lora_4bit/text-generation-webui/server.py", line 1, in <module>
    import custom_monkey_patch # apply monkey patch
  File "/alpaca_lora_4bit/text-generation-webui/custom_monkey_patch.py", line 8, in <module>
    replace_peft_model_with_int4_lora_model()
  File "/alpaca_lora_4bit/text-generation-webui/monkeypatch/peft_tuners_lora_monkey_patch.py", line 4, in replace_peft_model_with_int4_lora_model
    from ..models import GPTQLoraModel
ImportError: attempted relative import beyond top-level package
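
For what it's worth, that last traceback is the classic symptom of a relative import run from outside its package: monkeypatch/ is imported as a top-level package from the webui directory, so "from ..models import GPTQLoraModel" has no parent package to resolve ".." against. A sketch of one possible workaround, assuming models.py stays importable from the webui directory as the traceback suggests (not necessarily the fix this PR ends up with):

# monkeypatch/peft_tuners_lora_monkey_patch.py (sketch, not the PR's actual fix)
def replace_peft_model_with_int4_lora_model():
    # from ..models import GPTQLoraModel   # fails: no parent package when run from webui
    from models import GPTQLoraModel       # absolute import of the sibling models.py
    ...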

@nealchandra

nealchandra commented Apr 24, 2023

I believe this branch is missing commit 94851ce, which at least for me causes a breaking error during build.

I'm curious about the vision for this project: is the intent primarily to support folks who just want an easy way to run text-generation-webui with 4bit quantization? That seems like the case to me (for instance, the inference.py code does not actually apply a LoRA; the best example for inference is actually in the webui monkeypatch).

I think it is useful if that is the case, but for me this project would be even more valuable if it moved in the direction of this PR -- e.g. creating a core library which supports running inference and finetunes against multiple model types. This abstraction would then make it easy to plug this into the webui, or an API wrapper, or directly embed in some other python project. It seems hard to accomplish that goal without at least merging this PR back into the trunk.

@tensiondriven

This seems like the case to me

I am using it for a different purpose: to run local training at 4-bit via scripts in an automated and repeatable fashion. It's important to me that I be able to run it separately from text-generation-webui, so I'd hate to lose that functionality.

@tensiondriven

creating a core library which supports running inference and finetunes against multiple model types

I'm sure @johnsmith0031 would know better than me, but I expect that this project's functionality will eventually be exposed in HuggingFace Transformers or other large packages. This project is very cutting-edge and does things that haven't previously been possible. I like where your intention is, and I wouldn't want this project to get formalized to the point where it loses the agility needed to support features that are sometimes only a few days old.

@urbien

urbien commented Apr 27, 2023

@johnsmith0031 have you seen the LocalAI project?
It creates an OpenAI-compatible server / API wrapper and supports multiple models simultaneously. I want to use it with my own open source web/mobile app, so it fits, but it is designed for CPU-based execution around the GGML library, which, while awesome, is too slow and not even possible for 30b models.
So this project, with LoRAs + 4bit + flash-attention optimizations to serve 30b models from a single 3090-level GPU, would be just heaven! But I had trouble getting it running, let alone starting fine tuning on my own data (I have personal datasets I want to create my own loras on and experiment with multiple different loras on top of the base model). I am a newbie in deep learning, so I might be missing things. In any case, thank you so much for putting this together.

@johnsmith0031
Owner

Thanks! Currently the hosting mode is compatible with text generation webui, which has better inference performance. Feel free to give it a try!
