Motivation
A ViT that handles images at their native, variable size seems to work better, according to the comparisons in "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution". Has introducing a NaViT vision encoder into InternVL been considered?
Related resources
https://huggingface.co/HuggingFaceM4/siglip-so400m-14-980-flash-attn2-navit
Additional context
No response