LLaVa-NeXT needs tweaking for v4.47 #106

Open
jrp2014 opened this issue Oct 25, 2024 · 3 comments

Comments


jrp2014 commented Oct 25, 2024

I know that you think that LLaVA has been superseded, but I think that it's still pretty good for captioning.

When I use your example script on mlx-community/llava-v1.6-34b-8bit, it warns that:

Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.

I have no idea what this means, but it'd be great if mlx-vlm could be tweaked to give the model what it wants.

Owner

Blaizzy commented Nov 16, 2024

Hey,

This is not an issue yet; for now it is only a deprecation warning.

Hugging Face is changing the processor setup in the upcoming v4.47 release of transformers.

All you need to do is set `patch_size` and `vision_feature_select_strategy` on the processor config.
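A minimal sketch of that patch, assuming the two values can be read from the model config as `vision_config.patch_size` and `vision_feature_select_strategy`. The helper name `patch_processor` is hypothetical, and the `SimpleNamespace` objects are stand-ins so the snippet runs without downloading a model. With real objects you would pass the processor and config loaded for your checkpoint instead.

```python
from types import SimpleNamespace


def patch_processor(processor, model_config):
    """Copy the two attributes named in the warning onto the processor.

    Hypothetical helper: values already set on the processor are kept.
    """
    if getattr(processor, "patch_size", None) is None:
        processor.patch_size = model_config.vision_config.patch_size
    if getattr(processor, "vision_feature_select_strategy", None) is None:
        processor.vision_feature_select_strategy = (
            model_config.vision_feature_select_strategy
        )
    return processor


# Stand-in objects (assumption): the values 14 and "default" are
# illustrative; read the real ones from your model's config.
model_config = SimpleNamespace(
    vision_config=SimpleNamespace(patch_size=14),
    vision_feature_select_strategy="default",
)
processor = SimpleNamespace(patch_size=None, vision_feature_select_strategy=None)

patch_processor(processor, model_config)
print(processor.patch_size, processor.vision_feature_select_strategy)
```

Reading the values from the model config rather than hard-coding them keeps the processor consistent with whatever the checkpoint was trained with.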

Blaizzy closed this as completed Nov 16, 2024
Blaizzy reopened this Nov 16, 2024
Author

jrp2014 commented Nov 23, 2024

Is there some documentation for what the default / full vision_feature_select_strategy settings do?

What is the strategy for determining patch_size?

Owner

Blaizzy commented Dec 21, 2024

Here you go: https://huggingface.co/docs/transformers/en/model_doc/llava#usage-tips

> What is the strategy for determining patch_size?

It's fixed and defined during model pretraining. If you change it, the model might perform poorly or fail to run, because it determines the number of image tokens.
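To make the link between `patch_size` and the token count concrete, here is a rough sketch of the arithmetic for a single square image tile. The function name and the 336-pixel input size are illustrative assumptions. Per the linked docs, the `"default"` strategy drops the CLS token and `"full"` keeps it. LLaVA-NeXT also splits high-resolution images into several tiles, so the real total is larger than this per-tile figure.

```python
def image_tokens_per_tile(image_size, patch_size, strategy="default"):
    """Count vision tokens for one square tile (illustrative helper)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    # One token per non-overlapping patch_size x patch_size patch.
    tokens = (image_size // patch_size) ** 2
    if strategy == "full":
        tokens += 1  # "full" keeps the CLS token
    elif strategy != "default":
        raise ValueError(f"unknown strategy: {strategy!r}")
    return tokens


print(image_tokens_per_tile(336, 14))          # 576
print(image_tokens_per_tile(336, 14, "full"))  # 577
print(image_tokens_per_tile(336, 16))          # 441
```

Changing `patch_size` from 14 to 16 changes the count from 576 to 441, so the number of image placeholders the processor expands would no longer match the features the vision tower produces.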
