colpali v1.3 by AndrewOgn #427
base: main
Conversation
To check the values for the tests, I used code examples from here.
Resolved (outdated) review threads on:
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding_base.py
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding_base.py
fastembed/late_interaction_multimodal/late_interaction_multimodal_embedding.py
PAD_TOKEN = "<pad>"
QUERY_MARKER_TOKEN_ID = [2, 9413]
IMAGE_PLACEHOLDER_SIZE = (3, 448, 448)
EMPTY_TEXT_PLACEHOLDER = np.array([257152] * 1024 + [2, 50721, 573, 2416, 235265, 108])
These are actually the token ids of the string '<image>' * 1024 + '<bos>Describe the image.\n'.
Could we make it nicer? It's not really readable at the moment.
EVEN_ATTENTION_MASK is also not really readable; maybe instead of having this even_attention_mask
we could assign 1030 to a constant, which seems a bit more reasonable.
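For illustration, something along these lines might be more readable. The names IMAGE_TOKEN_ID, NUM_IMAGE_TOKENS, and EVEN_ATTENTION_MASK_LEN are made up here, and the exact shape of the attention mask in the PR may differ:

```python
import numpy as np

# Hypothetical names, shown only to illustrate the suggestion above.
IMAGE_TOKEN_ID = 257152                                # id of '<image>'
NUM_IMAGE_TOKENS = 1024                                # number of image placeholder tokens
TEXT_SUFFIX_IDS = [2, 50721, 573, 2416, 235265, 108]   # '<bos>Describe the image.\n'

EMPTY_TEXT_PLACEHOLDER = np.array([IMAGE_TOKEN_ID] * NUM_IMAGE_TOKENS + TEXT_SUFFIX_IDS)

# 1024 image tokens + 6 text tokens = 1030 positions to attend to.
EVEN_ATTENTION_MASK_LEN = len(EMPTY_TEXT_PLACEHOLDER)  # 1030
EVEN_ATTENTION_MASK = np.ones(EVEN_ATTENTION_MASK_LEN, dtype=np.int64)
```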
Those are the tokens, right. But there are some slight differences between the real text input and this placeholder.
As you can see here, the original preprocessor does not add the '\n' token there, while it should be added everywhere else. So we would need to refactor the tokenize logic and trigger the tokenizer each time (with constant output).
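Roughly something like this, assuming a tokenizers.Tokenizer is already loaded; the helper name build_empty_text_placeholder and the special-token handling are only a sketch and would need to match the original preprocessor:

```python
import numpy as np
from tokenizers import Tokenizer

def build_empty_text_placeholder(tokenizer: Tokenizer, num_image_tokens: int = 1024) -> np.ndarray:
    # Let the tokenizer produce the ids instead of hardcoding them, so the
    # placeholder stays in sync with however real text inputs are tokenized
    # (including whether the trailing '\n' token ends up being added or not).
    text = "<image>" * num_image_tokens + "<bos>Describe the image.\n"
    encoding = tokenizer.encode(text, add_special_tokens=False)
    return np.array(encoding.ids)
```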
descriptions, docs, black
This is a draft of the second iteration of work on colpali #394.