I know you think that LLaVA has been superseded, but I think it's still pretty good for captioning.
When I use your example script on mlx-community/llava-v1.6-34b-8bit, it warns that:
Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and `processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.47.
I have no idea what this means, but it'd be great if mlx-vlm could be tweaked to give the model what it wants.
These values are fixed and defined during model pretraining. If you change them, the model might perform poorly or not run at all, because they affect the number of image tokens.
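For what it's worth, the warning itself points at the fix: the processor just needs `patch_size` and `vision_feature_select_strategy` set to the same values the model was pretrained with. Here is a minimal sketch in plain transformers (not mlx-vlm's actual code, and assuming the mlx-community checkpoint ships a standard LLaVA-NeXT `config.json` exposing `vision_config.patch_size` and `vision_feature_select_strategy`):

```python
from transformers import AutoConfig, AutoProcessor

model_id = "mlx-community/llava-v1.6-34b-8bit"  # assumed to carry a standard LLaVA-NeXT config

processor = AutoProcessor.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# Copy the pretraining values onto the processor, but only if the
# processing config doesn't already provide them.
if getattr(processor, "patch_size", None) is None:
    processor.patch_size = config.vision_config.patch_size
if getattr(processor, "vision_feature_select_strategy", None) is None:
    processor.vision_feature_select_strategy = config.vision_feature_select_strategy
```

With those attributes set, the image-token expansion happens in the processor as the deprecation notice asks, and the warning should go away without changing the model's behavior.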