Hi there,
I've been making LoRAs for GPTQ models in ooba-webui for a while, using the Transformers model loader to load the GPTQ model rather than the ExLlamaV2 loader, since the Training tab isn't compatible with models loaded by ExLlamaV2. I need to use a 4-bit GPTQ model, as the full-sized model doesn't fit on my 24 GB GPU.
Now, EXL2 is supposed to quantize better than GPTQ, I have an EXL2-quantized version of the model, and I've read elsewhere here that EXL2 supports LoRAs. But how do I actually train (not just apply) a LoRA on my GPU using the EXL2 model? I can't load the EXL2 model in ooba with the Transformers model loader, as that produces an error, and as far as I can tell axolotl doesn't support training on a model that is already in EXL2 format.
Should I just train a LoRA on the GPTQ version of the model and then apply the result to the EXL2 model, even though the two were quantized differently? Or have I misunderstood this process entirely?
Thanks for any advice.
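For reference, here is a minimal sketch of what that Transformers-loader training path amounts to outside the webui, using transformers + peft on a GPTQ checkpoint. This is just my rough understanding, not the webui's actual code; the model path, the toy dataset, and the hyperparameters are all placeholders.

```python
# Rough sketch: LoRA training on a GPTQ checkpoint with transformers + peft.
# Assumes optimum plus an auto-gptq/gptqmodel backend are installed so that
# from_pretrained can load the quantized weights. All names are placeholders.
from datasets import Dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "path/to/4bit-gptq-model"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama-style tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Freeze the quantized base weights and prepare the model for k-bit training.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Llama-style names; adjust per architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only the LoRA adapters receive gradients
model.print_trainable_parameters()

# Placeholder dataset: replace with your real training text.
texts = ["### Instruction: ...\n### Response: ..."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora-out",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal-LM labels
)
trainer.train()
model.save_pretrained("lora-out")  # writes adapter_config.json + adapter weights
```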
-
Okay, so I tried this out in practice. I trained a LoRA on a GPTQ 4-bit quantized version of the model (loaded with the Transformers model loader), then made an 8-bit EXL2 quant of the original model and applied the LoRA to it in ooba-webui (loaded with the ExLlamaV2 loader). It did have some effect, as the Deterministic output of the chat changed, but the personality of the result was very different from what the 4-bit GPTQ model has when the same LoRA is applied. Seeing as this technique didn't seem to work, I'd still like to know how to train a LoRA directly on an EXL2 model.
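In case anyone wants to reproduce the comparison: as far as I understand, applying a LoRA in ooba with the ExLlamaV2 loader goes through ExLlamaV2's own LoRA support. Below is a rough sketch of doing the same thing directly with the exllamav2 Python API, based on its bundled examples; the paths are placeholders and the exact API may differ between exllamav2 versions.

```python
# Apply a (GPTQ-trained) LoRA adapter to an EXL2-quantized model and compare outputs.
# Paths are placeholders; the 8-bit EXL2 quant itself was made beforehand with
# exllamav2's convert.py, e.g.:
#   python convert.py -i <original_fp16_model_dir> -o <work_dir> -cf <exl2_8bpw_dir> -b 8.0
from exllamav2 import (ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config,
                       ExLlamaV2Lora, ExLlamaV2Tokenizer)
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "<exl2_8bpw_dir>"  # placeholder
config.prepare()

model = ExLlamaV2(config)
model.load()
tokenizer = ExLlamaV2Tokenizer(config)
cache = ExLlamaV2Cache(model)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

# Load the adapter produced by the training run (adapter_config.json + weights).
lora = ExLlamaV2Lora.from_directory(model, "<lora_dir>")  # placeholder

settings = ExLlamaV2Sampler.Settings()
settings.top_k = 1  # greedy decoding, so the with/without comparison is deterministic

prompt = "..."  # placeholder
with_lora = generator.generate_simple(prompt, settings, 200, loras=[lora])
without_lora = generator.generate_simple(prompt, settings, 200)
print(with_lora)
print(without_lora)
```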