Make this pip installable #82
base: main
Conversation
Thanks for doing this! I think I would also merge the CUDA kernel into this repo so that the external dependency on the GPTQ fork is no longer needed. I think it would have better compatibility with main GPTQ.
Problem is, main GPTQ doesn't even keep the CUDA kernel around anymore; they've hitched their horse to Triton. Delete kernel:
Alright, I've moved quant_cuda into this repo. Because of the way setuptools works, it's nearly impossible to make the CUDAExtension an extra without it being a separate external package, so it will get installed by default and triton is optional.
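For reference, here is a minimal sketch of what a setup.py along these lines could look like, with the CUDA extension built by default and triton exposed as an optional extra. The package name, module layout, and source paths are assumptions for illustration, not necessarily what this PR contains.

```python
# Hypothetical setup.py sketch: quant_cuda is built unconditionally,
# triton is only pulled in via the [triton] extra.
from setuptools import setup, find_packages
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="alpaca_lora_4bit",
    packages=find_packages("src"),           # assumes a src/ layout
    package_dir={"": "src"},
    ext_modules=[
        CUDAExtension(
            name="quant_cuda",
            # illustrative paths; the real kernel sources may live elsewhere
            sources=[
                "src/quant_cuda/quant_cuda.cpp",
                "src/quant_cuda/quant_cuda_kernel.cu",
            ],
        )
    ],
    cmdclass={"build_ext": BuildExtension},
    extras_require={"triton": ["triton"]},    # `pip install .[triton]`
)
```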
Thank you for putting everything together! I made a PR to text-generation-webui; once it is merged, I'll merge this PR into main. I also think we should adjust the Dockerfile for the pip-installable alpaca_lora_4bit for compatibility.
I took a pass at updating the Dockerfile, but I don't have CUDA on my local machine so I can't validate that it's totally correct. If someone else has a chance to look at the Dockerfile and build/run it 🙏
Dockerfile (outdated):

```diff
@@ -61,14 +61,14 @@ RUN cd text-generation-webui-tmp && python download-model.py --text-only decapod
 # Get LoRA
 RUN cd text-generation-webui-tmp && python download-model.py samwit/alpaca7b-lora && mv loras/samwit_alpaca7b-lora ../alpaca7b_lora

-COPY *.py .
+COPY src .
```
I don't think this is quite right. I tried to build the image and run it to test it for you, but the symlinks below were not pointing to anything. If they were `ln -s ../alpaca_lora_4bit/autograd_4bit.py ./autograd_4bit.py` (remove 'src/'), then they would have linked. So I recommend either changing the copy or changing the symlinking.
Whoops, `COPY src .` didn't do what I thought 🤦 (it copies the contents of src/ into the destination, not the src directory itself).
Dockerfile updated!
I won't be able to test that for a bit. I broke my machine pretty badly.
Thanks for everything done here! I think I'll temporarily keep this in the winglian-setup_pip branch for those who want to use the pip-installable version, and keep the old version as the main branch for compatibility with the monkeypatch code in webui. I may merge them if something changes in the future.
Still seeing an error when trying to run from Docker. I don't understand what's going on well enough to fix this, but it's not working by simply running it. Well anyway, I got my machine back up so that I can help test this.
I think that last change improved it, but there's still something off. I upgraded CUDA to 11.8 because I don't think 11.7 is working with my driver, and it's on its way out anyway.
I believe this branch is missing commit 94851ce, which at least for me causes a breaking error during build.

I'm curious about the vision for this project: is the intent primarily to support folks who just want an easy way to run it? I think it is useful if that is the case, but for me this project would be even more valuable if it moved in the direction of this PR, e.g. creating a core library which supports running inference and finetunes against multiple model types. That abstraction would then make it easy to plug this into the webui, or an API wrapper, or to embed it directly in some other Python project. It seems hard to accomplish that goal without at least merging this PR back into the trunk.
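For what it's worth, here is a rough, hypothetical sketch of what embedding the core library in some other Python project could look like once it installs cleanly via pip. The module and function names below are illustrative assumptions, not this project's actual API.

```python
# Hypothetical usage sketch; names and paths are placeholders.
from alpaca_lora_4bit import autograd_4bit  # assumed module layout

# Load a 4-bit quantized LLaMA checkpoint (illustrative call).
model, tokenizer = autograd_4bit.load_llama_model_4bit_low_ram(
    config_path="./llama-7b-4bit/",   # placeholder paths
    model_path="./llama-7b-4bit.pt",
    groupsize=-1,
)

prompt = "Explain LoRA in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Something like this, plus an equivalent training entry point, would make it easy to wrap the same core in the webui, an API server, or a standalone script.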
I am using it for a different purpose: to run local training at 4-bit via scripts in an automated and repeatable fashion. It's important to me that I be able to run it separately from text-generation-webui, so I'd hate to lose that functionality.
I'm sure @johnsmith0031 would know better than me, but I expect that this project's functionality will eventually be exposed in HuggingFace Transformers or other large packages. This project is very cutting-edge and does things that haven't previously been possible. I like where your intention is, but I wouldn't want this project to get formalized to the point where it loses the agility needed to support features that are sometimes only a few days old.
@johnsmith0031 have you seen the LocalAI project?
Thanks! Currently the hosting mode is compatible with text-generation-webui, which has better inference performance. Feel free to give it a try!
This is a pretty big refactor to make the project pip installable, with either
`pip install .[cuda]`
or
`pip install .[triton]`
There is some other cleanup that probably needs to be done, but I figured I should see if you want to go down this path. Thanks!
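As a rough illustration of how the two install flavours could be told apart at runtime, the package could fall back to the bundled CUDA extension when triton isn't installed. The module names here are assumptions, not necessarily what this PR does.

```python
# Sketch of optional-backend selection; `quant_cuda` is assumed to be the
# name of the bundled CUDA extension, `triton` comes from the [triton] extra.
try:
    import triton  # only present after `pip install .[triton]`
    BACKEND = "triton"
except ImportError:
    import quant_cuda  # built and installed by default
    BACKEND = "cuda"

print(f"Using the {BACKEND} matmul backend")
```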