
support for trust_remote_code / 8k context #410

Open
BlairSadewitz opened this issue Jul 19, 2023 · 10 comments

Comments

@BlairSadewitz

Hello,

There are a number of models I'd like to try that require this. I know I asked you about this in the past, and IIRC you mentioned that you removed it because you wanted to implement it properly.
In the interim, would you kindly tell me what I have to change in order to pass this flag to the appropriate call(s)? You don't have to cover every conceivable situation or type of model, just hf or hf_torch or whichever backend is needed to load, e.g., llama-based models in 16-bit (don't worry about 8-bit or 4-bit loading), maybe falcon, etc. I'd just as happily patch transformers itself; whatever gets it to work. I'm mostly trying to load models with an increased context size.

Thanks.

@BlairSadewitz
Author

Being able to use a monkey patch would be cool, too, but I assume that's even more work.

@henk717
Owner

henk717 commented Jul 19, 2023

This is planned as a separate addon, but it is currently unfinished.

@BlairSadewitz
Author

BlairSadewitz commented Jul 20, 2023

Oh, OK, fair enough. Whenever you have a spare moment, would you kindly tell me where in the code the call that loads a 16-bit llama-based model (you know, one I'd download from HF) is, so I could just rig it myself? Whenever I have the time, I'll figure out how to get Python to tell me the line number. If that happens before you get around to replying, I'll close this out. It could be either the code in KoboldAI or the code in transformers itself; I don't care which.

@henk717
Owner

henk717 commented Jul 20, 2023

The easiest way to do it is with our Basic HF backend, since there it will be in the from_pretrained lines; in the main backend it's quite complicated. The hold-up is that the Basic HF backend is unfinished and unstable, so your mileage may vary.
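Roughly, the change just amounts to adding the flag to those calls. As a sketch (the surrounding names here are illustrative, not the exact backend code):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch only -- the real Basic HF backend wiring differs,
# but the essential change is passing trust_remote_code to from_pretrained.
def load_basic_hf(model_name):  # hypothetical helper name
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        trust_remote_code=True,
        torch_dtype=torch.float16,  # 16-bit load
    )
    return tokenizer, model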

@BlairSadewitz
Author

Hmm, yeah, I'm having some issues with it. :(

Check this out, though:
RoPE scaling got merged into transformers. Models don't have to be pretrained with it to use it, though apparently you lose some accuracy if they aren't. Maybe you'd want to add support for this at some point? It works for GPT-NeoX too, according to the chatter online.

huggingface/transformers@34d9409#diff-9ba75cc28be7924a2fc43de1d2c8c7779ad597129d33d1af39153951463cd0bc

Also, there's this:

https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/

The patch is three lines. That code mitigates the perplexity degradation you'd otherwise see from extending the context. Here's a colab:

https://colab.research.google.com/drive/1VI2nhlyKvd5cw4-zHvAIk00cAVj2lCCC#scrollTo=b80b3f37
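As I understand it, the gist of the NTK-aware trick is to scale the RoPE base instead of interpolating positions, so the frequencies stretch to cover the longer context. A minimal sketch of just that formula (not the exact patch from the post):

import torch

def ntk_scaled_inv_freq(dim, base=10000.0, alpha=2.0, device=None):
    # NTK-aware scaling: raise the base by alpha ** (dim / (dim - 2)),
    # then compute the usual rotary inverse frequencies from the scaled base.
    base = base * alpha ** (dim / (dim - 2))
    return 1.0 / (base ** (torch.arange(0, dim, 2, device=device).float() / dim))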

@BlairSadewitz
Author

I just noticed everything you merged. Thanks! I'd been hopping between forks, and this makes my life a lot easier.

@BlairSadewitz
Author

In case you aren't aware, transformers now has support for rope scaling.

https://huggingface.co/docs/transformers/main/model_doc/llama#transformers.LlamaConfig
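e.g. something along these lines should work via the rope_scaling field documented there (the checkpoint name is just a placeholder):

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "some/llama-checkpoint",                         # placeholder repo id
    rope_scaling={"type": "linear", "factor": 2.0},  # 2x the pretrained context length
    torch_dtype="auto",
)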

@henk717
Owner

henk717 commented Aug 1, 2023

We automatically use rope scaling if it's present in a model's config. Manual control for it is planned.
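For anyone wondering, a quick way to check whether a checkpoint already ships with that setting (repo id below is just an example):

from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("some/8k-llama-finetune")  # placeholder repo id
print(cfg.rope_scaling)  # e.g. {"type": "linear", "factor": 2.0}, or None if absent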

@BlairSadewitz
Copy link
Author

Ooh, nice. That makes my life a lot easier.

Incidentally, I stumbled upon this:

https://github.com/jquesnelle/scaled-rope

Basically, it builds a wheel with the necessary code to support all these different scaling methods along with patch functions, e.g.

def patch_llama_for_linear_scaled_rotary_embeddings(model, scale):
    from .LlamaLinearScaledRotaryEmbedding import LlamaLinearScaledRotaryEmbedding
    for each in model.model.layers:
        each.self_attn.rotary_emb = LlamaLinearScaledRotaryEmbedding(
            each.self_attn.head_dim, scale=scale, device=each.self_attn.rotary_emb.inv_freq.device)

I found it because I was having problems loading some models because of the layers, which this takes care of.
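Usage looks roughly like this, as far as I can tell (the import path and checkpoint name are my guesses, not taken from the repo's docs):

from transformers import AutoModelForCausalLM
from scaled_rope.patch import patch_llama_for_linear_scaled_rotary_embeddings  # module path is a guess

model = AutoModelForCausalLM.from_pretrained("some/llama-checkpoint", torch_dtype="auto")
patch_llama_for_linear_scaled_rotary_embeddings(model, scale=2.0)  # 2x context via linear position scaling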
